Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-20 Thread James Clarke
I have now got a patch that drains down the queue, and it does successfully 
stop sblock from being dereferenced etc when we actually reload. However, 
sometimes thread 5 (the same one that would dereference sblock) seems to get 
stuck in vm_fault_continue (at least according to the kernel debugger), so I 
need to do some more debugging to see why.

James

 On 19 Jul 2015, at 15:00, Richard Braun rbr...@sceen.net wrote:
 
 On Sun, Jul 19, 2015 at 02:25:14PM +0100, James Clarke wrote:
 Yeah, I tried inhibiting both buckets, but the paging RPCs still got 
 through, so my guess was that libports's inhibit/resume methods weren't able 
 to deal with libpager's own threads. The thing is I don't think we currently 
 keep track of any reference to the main/worker threads, as 
 pager_start_workers just takes a bucket and returns void. Is there a way we 
 can instead make the main thread and/or workers able to block 
 ports_inhibit_X_rpcs like normal RPC handlers and be cancelled etc? If 
 possible I think that would be a cleaner solution.
 
 To continue our discussion on IRC:
 
 No, it would definitely not be a cleaner solution, just an ugly hack.
 Since paging doesn't occur as part of an RPC, you just can't use RPC
 stuff to manage it. I suggest building rwlock-based synchronization
 functions specific to the pager workers.
 
 -- 
 Richard Braun



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-19 Thread Justus Winter
Hello James :)

Quoting James Clarke (2015-07-15 22:20:57)
 I had a look today at what's happening, and it's that the *file*
 pager is trying to read from disk. Any thoughts?

There is another thing I forgot.  libpager is special, it has its own
demuxer (see libpager/demuxer.c) that writes requests into a queue,
and a pool of workers that process requests from said queue.

The thing is, when we inhibit the pager RPCs, we merely prevent new
ones from being enqueued, but we don't prevent the workers from
processing already enqueued requests.  So we indeed need to add
functions to inhibit and restart paging to libpager that know about
the queue.
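
A minimal sketch of what such queue-aware inhibit and resume functions
could look like, using plain pthreads. Everything here is an assumption
made for illustration: struct pager_queue and the queue_* functions are
hypothetical names, not libpager's actual data structures or API, and
the requests are reduced to counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pager request queue (not libpager's type). */
struct pager_queue
{
  pthread_mutex_t lock;
  pthread_cond_t changed;   /* broadcast on every state change */
  size_t queued;            /* requests waiting to be claimed */
  size_t in_flight;         /* requests a worker is processing right now */
  bool inhibited;           /* while set, workers must not claim requests */
};

void
queue_init (struct pager_queue *q)
{
  pthread_mutex_init (&q->lock, NULL);
  pthread_cond_init (&q->changed, NULL);
  q->queued = 0;
  q->in_flight = 0;
  q->inhibited = false;
}

/* Demuxer side: record one more pending request. */
void
queue_enqueue (struct pager_queue *q)
{
  pthread_mutex_lock (&q->lock);
  q->queued++;
  pthread_cond_broadcast (&q->changed);
  pthread_mutex_unlock (&q->lock);
}

/* Worker side: block until a request may be claimed, then claim it. */
void
queue_begin_request (struct pager_queue *q)
{
  pthread_mutex_lock (&q->lock);
  while (q->inhibited || q->queued == 0)
    pthread_cond_wait (&q->changed, &q->lock);
  q->queued--;
  q->in_flight++;
  pthread_mutex_unlock (&q->lock);
}

/* Worker side: the claimed request has been fully processed. */
void
queue_end_request (struct pager_queue *q)
{
  pthread_mutex_lock (&q->lock);
  assert (q->in_flight > 0);
  q->in_flight--;
  pthread_cond_broadcast (&q->changed);
  pthread_mutex_unlock (&q->lock);
}

/* Stop workers from claiming queued requests and wait until the
   requests already being processed have finished. */
void
queue_inhibit (struct pager_queue *q)
{
  pthread_mutex_lock (&q->lock);
  q->inhibited = true;
  while (q->in_flight > 0)
    pthread_cond_wait (&q->changed, &q->lock);
  pthread_mutex_unlock (&q->lock);
}

/* Let workers claim queued requests again. */
void
queue_resume (struct pager_queue *q)
{
  pthread_mutex_lock (&q->lock);
  q->inhibited = false;
  pthread_cond_broadcast (&q->changed);
  pthread_mutex_unlock (&q->lock);
}
```

In this sketch a worker loop would call queue_begin_request, handle the
request, then call queue_end_request. Requests that are still queued
when queue_inhibit returns simply stay queued until queue_resume. A
variant that drains the queue instead would also wait for queued to
reach zero.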

Justus



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-19 Thread James Clarke
Yeah, I tried inhibiting both buckets, but the paging RPCs still got through, 
so my guess was that libports's inhibit/resume methods weren't able to deal 
with libpager's own threads. The thing is I don't think we currently keep track 
of any reference to the main/worker threads, as pager_start_workers just takes 
a bucket and returns void. Is there a way we can instead make the main thread 
and/or workers able to block ports_inhibit_X_rpcs like normal RPC handlers and 
be cancelled etc? If possible I think that would be a cleaner solution.

James

 On 19 Jul 2015, at 13:50, Justus Winter 4win...@informatik.uni-hamburg.de 
 wrote:
 
 Hello James :)
 
 Quoting James Clarke (2015-07-15 22:20:57)
 I had a look today at what's happening, and it's that the *file*
 pager is trying to read from disk. Any thoughts?
 
 There is another thing I forgot.  libpager is special, it has its own
 demuxer (see libpager/demuxer.c) that writes requests into a queue,
 and a pool of workers that process requests from said queue.
 
 The thing is, when we inhibit the pager RPCs, we merely prevent new
 ones from being enqueued, but we don't prevent the workers from
 processing already enqueued requests.  So we indeed need to add
 functions to inhibit and restart paging to libpager that know about
 the queue.
 
 Justus



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-19 Thread Richard Braun
On Sun, Jul 19, 2015 at 02:25:14PM +0100, James Clarke wrote:
 Yeah, I tried inhibiting both buckets, but the paging RPCs still got through, 
 so my guess was that libports's inhibit/resume methods weren't able to deal 
 with libpager's own threads. The thing is I don't think we currently keep 
 track of any reference to the main/worker threads, as pager_start_workers 
 just takes a bucket and returns void. Is there a way we can instead make the 
 main thread and/or workers able to block ports_inhibit_X_rpcs like normal RPC 
 handlers and be cancelled etc? If possible I think that would be a cleaner 
 solution.

To continue our discussion on IRC:

No, it would definitely not be a cleaner solution, just an ugly hack.
Since paging doesn't occur as part of an RPC, you just can't use RPC
stuff to manage it. I suggest building rwlock-based synchronization
functions specific to the pager workers.

-- 
Richard Braun



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-15 Thread James Clarke
As discussed on IRC, this successfully stopped the disk pager from 
dereferencing sblock. However, it was still hanging at boot a lot of the time 
(seemingly if and only if I booted in normal mode, i.e. not recovery mode, but 
that's probably just a timing thing).

I had a look today at what's happening, and it's that the *file* pager is 
trying to read from disk. Any thoughts?

James

 On 14 Jul 2015, at 20:54, Justus Winter 4win...@informatik.uni-hamburg.de 
 wrote:
 
 Hi James :)
 
 you found a long-standing bug in ext2fs.  Fixing it allows us to get
 rid of the ugly workaround in daemons/runsystem.sh (look for `XXX').
 
 Quoting Richard Braun (2015-07-13 10:16:14)
 On Sun, Jul 12, 2015 at 12:56:31PM +0100, James Clarke wrote:
 That doesn’t seem to boot at all. I had tried changing it to inhibiting all 
 RPCs (it looks like you’ve inhibited an extra class?), but it seems that 
 paging is needed? Perhaps part of ext2fs gets paged out, and it needs to be 
 paged in when remounting?
 
 Remounting can require paging out, yes.
 
 See diskfs_reload_global_state in ext2fs:
 
 diskfs_reload_global_state ()
 {
  pokel_flush (global_pokel);
  pager_flush (diskfs_disk_pager, 1);
 
 So I guess we need to inhibit the RPCs here, not before calling
 diskfs_reload_global_state, then do:
 
  get_hypermetadata ();
  map_hypermetadata ();
 
 And reenable them here.
 
  return 0;
 }
 
 I guess that means changing the diskfs API.  James, do you want to
 give it a shot?
 
 In the meantime, enjoy my hacky workaround:
 http://nonmonolithic.org/ext2fs.static
 
 Cheers,
 Justus



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-14 Thread Justus Winter
Quoting Richard Braun (2015-07-13 10:16:14)
 On Sun, Jul 12, 2015 at 12:56:31PM +0100, James Clarke wrote:
  That doesn’t seem to boot at all. I had tried changing it to inhibiting all 
  RPCs (it looks like you’ve inhibited an extra class?), but it seems that 
  paging is needed? Perhaps part of ext2fs gets paged out, and it needs to be 
  paged in when remounting?
 
 Remounting can require paging out, yes.
 
 See diskfs_reload_global_state in ext2fs:
 
 diskfs_reload_global_state ()
 {
   pokel_flush (global_pokel);
   pager_flush (diskfs_disk_pager, 1);
 ...

Aha, but this is the disk pager, not the file pager which needs
sblock.

Justus



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-14 Thread Justus Winter
Hi James :)

you found a long-standing bug in ext2fs.  Fixing it allows us to get
rid of the ugly workaround in daemons/runsystem.sh (look for `XXX').

Quoting Richard Braun (2015-07-13 10:16:14)
 On Sun, Jul 12, 2015 at 12:56:31PM +0100, James Clarke wrote:
  That doesn’t seem to boot at all. I had tried changing it to inhibiting all 
  RPCs (it looks like you’ve inhibited an extra class?), but it seems that 
  paging is needed? Perhaps part of ext2fs gets paged out, and it needs to be 
  paged in when remounting?
 
 Remounting can require paging out, yes.
 
 See diskfs_reload_global_state in ext2fs:
 
 diskfs_reload_global_state ()
 {
   pokel_flush (global_pokel);
   pager_flush (diskfs_disk_pager, 1);

So I guess we need to inhibit the RPCs here, not before calling
diskfs_reload_global_state, then do:

  get_hypermetadata ();
  map_hypermetadata ();

And reenable them here.

  return 0;
}
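
The ordering proposed above can be sketched as follows. The sketch is
self-contained and only records the order of the steps: every *_stub
function is a hypothetical placeholder for the real call of the same
name, and inhibit_paging_stub / resume_paging_stub stand for inhibit
and resume functions that do not exist yet.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the steps run. */
static char trace[160];

static void
record (const char *step)
{
  assert (strlen (trace) + strlen (step) + 2 <= sizeof trace);
  strcat (trace, step);
  strcat (trace, ";");
}

/* Hypothetical placeholders; none of these touch a real filesystem. */
static void pokel_flush_stub (void) { record ("pokel_flush"); }
static void pager_flush_stub (void) { record ("pager_flush"); }
static void inhibit_paging_stub (void) { record ("inhibit_paging"); }
static void get_hypermetadata_stub (void) { record ("get_hypermetadata"); }
static void map_hypermetadata_stub (void) { record ("map_hypermetadata"); }
static void resume_paging_stub (void) { record ("resume_paging"); }

int
reload_global_state_sketch (void)
{
  /* Flushing may need paging, so it runs before anything is inhibited. */
  pokel_flush_stub ();
  pager_flush_stub ();

  /* Only the window in which the hypermetadata is replaced is protected. */
  inhibit_paging_stub ();
  get_hypermetadata_stub ();
  map_hypermetadata_stub ();
  resume_paging_stub ();

  return 0;
}
```

The point of the ordering is that the flushes stay outside the
inhibited region, while the reload of the hypermetadata sits inside it.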

I guess that means changing the diskfs API.  James, do you want to
give it a shot?

In the meantime, enjoy my hacky workaround:
http://nonmonolithic.org/ext2fs.static

Cheers,
Justus



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-13 Thread Richard Braun
On Sun, Jul 12, 2015 at 12:56:31PM +0100, James Clarke wrote:
 That doesn’t seem to boot at all. I had tried changing it to inhibiting all 
 RPCs (it looks like you’ve inhibited an extra class?), but it seems that 
 paging is needed? Perhaps part of ext2fs gets paged out, and it needs to be 
 paged in when remounting?

Remounting can require paging out, yes.

See diskfs_reload_global_state in ext2fs:

diskfs_reload_global_state ()
{
  pokel_flush (global_pokel);
  pager_flush (diskfs_disk_pager, 1);
...

-- 
Richard Braun



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-12 Thread James Clarke
That doesn’t seem to boot at all. I had tried changing it to inhibiting all 
RPCs (it looks like you’ve inhibited an extra class?), but it seems that paging 
is needed? Perhaps part of ext2fs gets paged out, and it needs to be paged in 
when remounting?

James

 On 12 Jul 2015, at 00:27, Justus Winter 4win...@informatik.uni-hamburg.de 
 wrote:
 
 Quoting James Clarke (2015-07-11 22:33:44)
 I did some more digging around today. I think what’s happening is
 that ext2fs tries to handle a pager RPC while the disk is being
 remounted
 
 Sounds plausible.  Could you try:
 
 http://darnassus.sceen.net/~teythoon/ext2fs.static
 
 I'll send the patch as follow-up.
 
 Justus




Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-12 Thread Justus Winter
Quoting James Clarke (2015-07-12 13:56:31)
 That doesn’t seem to boot at all.

Indeed :/

db show all tasks
ID TASK NAME [THREADS]
0 f2745f00 gnumach [8]
1 f2745e40 ext2fs [12]
2 f2745d80 exec [5]
3 f2745cc0 (ext2fs) [1]
4 f2745c00 /hurd/proc [4]
5 f2745b40 /hurd/auth [5]
6 f2745a80 /bin/sh(1) [2]
7 f27459c0 /hurd/term(8) [5]
8 f2745900 /hurd/pflocal(9) [7]
9 f2745780 (/hurd/mach-defpager(10)) [6]
10 f2745840 fsysopts(13) [2]

 I had tried changing it to inhibiting all RPCs (it looks like you’ve
 inhibited an extra class?), but it seems that paging is needed?
 Perhaps part of ext2fs gets paged out, and it needs to be paged in
 when remounting?

Perhaps.

Justus



Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-11 Thread James Clarke
I did some more digging around today. I think what’s happening is that ext2fs 
tries to handle a pager RPC while the disk is being remounted.

We do call ports_inhibit_class_rpcs, which will wait until all RPCs for that 
class have finished. However, we call this with diskfs_protoid_class, which 
does *not* include the pager ports. These are added to _pager_class 
(libpager/priv.h) in pager_create (libpager/pager-create.c:32) and 
disk_pager_bucket (ext2fs/pager.c) in create_disk_pager (ext2fs/pager.c), and 
so as a result I believe we can get pager RPCs while remounting, leading to the 
call to ext2_getblk. Below is the stack for the call to ext2_getblk that leads 
to dereferencing sblock when it is NULL:

 0  ext2fs/getblk.c:253 (ext2_getblk)
 1  ext2fs/pager.c:147 (find_block)
 2  ext2fs/pager.c:244 (file_pager_read_page)
 3  ext2fs/pager.c:550 (pager_read_page)
 4  libpager/data-request.c:113 (_pager_S_memory_object_data_request)
 5  libpager/memory_objectServer.c:443 (_Xmemory_object_data_request)
 6  libpager/demuxer.c:215 (worker_func)
 7  libpthread/pthread/pt-create.c:64 (entry_point)

James Clarke

 On 27 Jun 2015, at 20:34, Richard Braun rbr...@sceen.net wrote:
 
 On Sat, Jun 27, 2015 at 03:39:58PM +0100, James Clarke wrote:
 I have been suffering a lot from my Hurd system (running in VirtualBox) 
 hanging at startup, just after Hurd server bootstrap... but before INIT: 
 version 2.88 booting.
 
 I have been able to trace it back to getblk.c:248 (unsigned long 
 addr_per_block = EXT2_ADDR_PER_BLOCK (sblock);) in ext2_getblk. It faults 
 because sblock is NULL.
 
 I have traced the execution with debugging statements, and what seems to 
 happen is as follows:
 
 1. diskfs_remount is called (because root is remounted as rw)
 2. RPCs are inhibited
 3. diskfs_reload_global_state is called
 4. sblock is set to NULL
 5. While this is happening, ext2_getblk is called
 
 If you’re lucky, the superblock is read and sblock is set to point to this 
 data before 5 (or at least before it gets to dereferencing sblock). If not, 
 sblock is still NULL and thus a page fault is raised, causing the system to 
 be stuck.
 
 Does anyone have an idea how this situation could be occurring?
 
 My initial thought would be: how could it not happen?
 
 Despite diskfs_remount calling ports_inhibit_class_rpcs, other threads
 can very well be running to process previously received messages. There
 seems to be no other form of access synchronization such as locks in
 diskfs_reload_global_state.
 
 Can you get the call trace leading to ext2_getblk? I'm not sure about
 backtrace(3) in static executables but it might be worth trying.
 
 -- 
 Richard Braun




Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-07-11 Thread Justus Winter
Quoting James Clarke (2015-07-11 22:33:44)
 I did some more digging around today. I think what’s happening is
 that ext2fs tries to handle a pager RPC while the disk is being
 remounted

Sounds plausible.  Could you try:

http://darnassus.sceen.net/~teythoon/ext2fs.static

I'll send the patch as follow-up.

Justus



VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-06-27 Thread James Clarke
Hi,
I have been suffering a lot from my Hurd system (running in VirtualBox) hanging 
at startup, just after Hurd server bootstrap... but before INIT: version 
2.88 booting.

I have been able to trace it back to getblk.c:248 (unsigned long addr_per_block 
= EXT2_ADDR_PER_BLOCK (sblock);) in ext2_getblk. It faults because sblock is 
NULL.

I have traced the execution with debugging statements, and what seems to happen 
is as follows:

1. diskfs_remount is called (because root is remounted as rw)
2. RPCs are inhibited
3. diskfs_reload_global_state is called
4. sblock is set to NULL
5. While this is happening, ext2_getblk is called

If you’re lucky, the superblock is read and sblock is set to point to this data 
before 5 (or at least before it gets to dereferencing sblock). If not, sblock 
is still NULL and thus a page fault is raised, causing the system to be stuck.

Does anyone have an idea how this situation could be occurring?

James Clarke




Re: VirtualBox Hangs Pre-Init Due To Ext2FS Fault

2015-06-27 Thread Richard Braun
On Sat, Jun 27, 2015 at 03:39:58PM +0100, James Clarke wrote:
 I have been suffering a lot from my Hurd system (running in VirtualBox) 
 hanging at startup, just after Hurd server bootstrap... but before INIT: 
 version 2.88 booting.
 
 I have been able to trace it back to getblk.c:248 (unsigned long 
 addr_per_block = EXT2_ADDR_PER_BLOCK (sblock);) in ext2_getblk. It faults 
 because sblock is NULL.
 
 I have traced the execution with debugging statements, and what seems to 
 happen is as follows:
 
 1. diskfs_remount is called (because root is remounted as rw)
 2. RPCs are inhibited
 3. diskfs_reload_global_state is called
 4. sblock is set to NULL
 5. While this is happening, ext2_getblk is called
 
 If you’re lucky, the superblock is read and sblock is set to point to this 
 data before 5 (or at least before it gets to dereferencing sblock). If not, 
 sblock is still NULL and thus a page fault is raised, causing the system to 
 be stuck.
 
 Does anyone have an idea how this situation could be occurring?

My initial thought would be: how could it not happen?

Despite diskfs_remount calling ports_inhibit_class_rpcs, other threads
can very well be running to process previously received messages. There
seems to be no other form of access synchronization such as locks in
diskfs_reload_global_state.

Can you get the call trace leading to ext2_getblk? I'm not sure about
backtrace(3) in static executables but it might be worth trying.

-- 
Richard Braun