Fwd: Hurd shutdown problems

2016-08-08 Thread Brent W. Baccala
Further progress trying to track this down:

I don't have to shut down the system to have problems.  "swapoff /dev/hd0s5"
is enough to cause problems, once enough swap is in use.  After a failed
swapoff, I have an extra 98 storeio processes running!
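The failure seems to track how much swap is in use, so it can help to record that figure right before trying swapoff. Below is a minimal Python sketch. It assumes /proc/meminfo exists with Linux-style SwapTotal/SwapFree fields. Debian GNU/Hurd provides a procfs translator, but whether it reports exactly these fields is an assumption here. swap_used_kib is a hypothetical helper name:

```python
import os


def swap_used_kib(meminfo_text):
    """Return swap in use (KiB) from the text of a Linux-style /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Lines look like "SwapTotal:      177152 kB".
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    # Assumption: both fields are present (KeyError otherwise).
    return fields["SwapTotal"] - fields["SwapFree"]


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            print("swap in use: %d KiB" % swap_used_kib(f.read()))
    else:
        print("no %s on this system" % path)
```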

I don't have to swapoff to have "symptoms".  The kernel debugger normally
shows symbolic names, e.g.:

Stopped  at  machine_idle+0xe:   leave
machine_idle(0,81a2c630,3806f64,0,9b448b38)+0xe
idle_thread_continue(9fcbdde0,81028b50,9c0c7fe4,0,9c3d5548)+0x2a

Once I've got enough swap in use, though, it stops doing this.  Now I see:

Stopped   at  0x81be: leave
0x81be(0,0,9fcc5990,0,9fb90b30)
0x810293fa(9fcbdde0,81028b50,99526fe4,0,9c3d5548)

When I see a kernel page fault, it's always in strcmp().

It doesn't matter if an ssh session is open or not (Riccardo Mottola's
suggestion).

I can't task_terminate the auth server, as this typically does nothing once
I've started having symptoms, but I can kill the auth server from the
command line (just "kill 7") and that triggers a reboot that leaves the
disk in a clean state.

I'm just learning Hurd.  Any ideas?

agape
brent


Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Samuel Thibault
Hello,

Christian Seiler, on Fri 05 Aug 2016 21:09:21 +0200, wrote:
> I've attached a patch that fixes this specific issue for me. I
> probably won't have time to look at the other issue I reported
> here, but with that I'd at least be able to have open-isns
> working on Hurd. (And the patch will likely also fix problems
> in other software.)
> 
> It would be great if you could apply that patch in git.

Applied, thanks!

Samuel
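For reference, the patch title points to the usual semantics. A recv() with MSG_DONTWAIT on a socket with nothing queued should fail at once with EAGAIN/EWOULDBLOCK instead of blocking, and it should simply return data that is already there. The Python sketch below checks that behaviour on a local socket pair. It does not exercise the patch itself, and it assumes Python exposes socket.MSG_DONTWAIT on the platform. The function names are made up for illustration:

```python
import errno
import socket


def dontwait_recv_would_block():
    """True if recv(MSG_DONTWAIT) on an empty socket fails at once with EAGAIN."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        a.recv(16, socket.MSG_DONTWAIT)
    except BlockingIOError as e:
        # Nothing is queued, so the call must fail rather than block.
        return e.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
    finally:
        a.close()
        b.close()
    return False


def dontwait_roundtrip(payload):
    """Send, then receive with MSG_DONTWAIT once data is already queued."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        b.send(payload, socket.MSG_DONTWAIT)
        return a.recv(len(payload), socket.MSG_DONTWAIT)
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("recv would block ->", dontwait_recv_would_block())
    print("roundtrip ->", dontwait_roundtrip(b"ping"))
```

On a pflocal that ignores the flag, the first check would be expected to hang instead of returning.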



Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Justus Winter
Richard Braun  writes:

> On Mon, Aug 08, 2016 at 04:54:47PM +0200, Justus Winter wrote:
>> Richard Braun  writes:
>> > Why not start the translator from the remapped environment too ?
>> 
>> No reason, but this has to be implemented.  I started working on a
>> library for writing such chrooting translators, then got side-tracked by
>> the complexity of the dir_lookup operations.  Currently, remap has a
>> very naive lookup function, fakeroot's is better, but still not
>> sufficient.  I made some patches towards unifying and refactoring the
>> logic used in libdiskfs and libnetfs, but these functions are still huge
>> :/
>
> No, i mean, here, in such a specific case, if the parent translator is
> itself running from the remap env, it should use the custom pflocal
> instance, right ?

No, that doesn't help, because binding a unix socket involves setting a
passive translator, and that is still started by the filesystem
"outside" the chrooted environment:

teythoon@hurdbox /tmp % touch 1
teythoon@hurdbox /tmp % remap /servers/socket/1 /tmp/1 -- /bin/bash
bash: cannot make pipe for command substitution: (ipc/mig) bad request message ID
teythoon@hurdbox:/tmp$ exit
/bin/settrans: fsys_goaway: (ipc/mig) server died

(Eh, it is also tricky to set up; you cannot use bash right away.)

teythoon@hurdbox /tmp % remap /servers/socket/1 /tmp/1 -- /bin/sh
$ settrans -a 1 /hurd/pflocal
teythoon@hurdbox:/tmp$ python3
Python 3.5.2+ (default, Aug  5 2016, 08:07:14) 
[GCC 6.1.1 20160705] on gnu0
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
>>> s.bind('/tmp/test.sock')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 1073741873] Cannot assign requested address
>>> 
teythoon@hurdbox:/tmp$ showtrans test.sock
/hurd/ifsock

I firmly believe that the way to proceed is to teach such chrooting
translators to detect that a node has a passive translator record, and
instead of letting the filesystem start it, it must start the translator
on its own.  Not only does this give much stronger isolation, it is also
necessary for correctness.
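A toy Python model of this proposal makes the difference concrete. Everything in it is hypothetical: Node, Env, fs_lookup and remap_lookup are not Hurd APIs. It only models which pflocal a passive translator such as /hurd/ifsock ends up talking to, depending on who starts it:

```python
from dataclasses import dataclass
from typing import Dict, Optional

SOCKET_SERVER = "/servers/socket/1"  # the pflocal node that remap redirects


@dataclass
class Node:
    """A filesystem node, possibly carrying a passive translator record."""
    passive: Optional[str] = None  # e.g. "/hurd/ifsock" for a bound socket


@dataclass
class Env:
    """Toy environment: which server a process sees for a well-known path."""
    servers: Dict[str, str]


def start_translator(record: str, env: Env) -> str:
    # Stand-in for spawning the translator: report which pflocal it
    # would end up talking to when started in `env`.
    return "%s using %s" % (record, env.servers[SOCKET_SERVER])


def fs_lookup(node: Node, fs_env: Env) -> Optional[str]:
    """Current behaviour: the filesystem starts the passive translator
    in its own (un-remapped) environment."""
    if node.passive is None:
        return None
    return start_translator(node.passive, fs_env)


def remap_lookup(node: Node, fs_env: Env, remap_env: Env) -> Optional[str]:
    """Proposed behaviour: the chrooting translator notices the passive
    record and starts the translator itself, inside the remapped
    environment, instead of letting the filesystem do it."""
    if node.passive is not None:
        return start_translator(node.passive, remap_env)
    return fs_lookup(node, fs_env)
```

With `Env({SOCKET_SERVER: "system pflocal"})` as the filesystem's environment and `Env({SOCKET_SERVER: "/tmp/1 pflocal"})` as the remapped one, the same ifsock node resolves to different pflocal instances depending on which lookup path starts it.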

Justus


signature.asc
Description: PGP signature


Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Richard Braun
On Mon, Aug 08, 2016 at 04:54:47PM +0200, Justus Winter wrote:
> Richard Braun  writes:
> > Why not start the translator from the remapped environment too ?
> 
> No reason, but this has to be implemented.  I started working on a
> library for writing such chrooting translators, then got side-tracked by
> the complexity of the dir_lookup operations.  Currently, remap has a
> very naive lookup function, fakeroot's is better, but still not
> sufficient.  I made some patches towards unifying and refactoring the
> logic used in libdiskfs and libnetfs, but these functions are still huge
> :/

No, i mean, here, in such a specific case, if the parent translator is
itself running from the remap env, it should use the custom pflocal
instance, right ?

-- 
Richard Braun



Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Justus Winter
Richard Braun  writes:

> On Mon, Aug 08, 2016 at 12:55:24PM +0200, Justus Winter wrote:
>> Right, I can see how this is a problem.  The thing is, remap doesn't
>> quite do the job: 1/ it fails to remap relative paths, 2/ if one sets a
>> translator record on a node, and that translator is then started by the
>> filesystem, it is started "outside" of the remap environment.  I believe
>> 2/ is what happens here.
>
> Why not start the translator from the remapped environment too ?

No reason, but this has to be implemented.  I started working on a
library for writing such chrooting translators, then got side-tracked by
the complexity of the dir_lookup operations.  Currently, remap has a
very naive lookup function, fakeroot's is better, but still not
sufficient.  I made some patches towards unifying and refactoring the
logic used in libdiskfs and libnetfs, but these functions are still huge
:/

Justus




Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Richard Braun
On Mon, Aug 08, 2016 at 12:55:24PM +0200, Justus Winter wrote:
> Right, I can see how this is a problem.  The thing is, remap doesn't
> quite do the job: 1/ it fails to remap relative paths, 2/ if one sets a
> translator record on a node, and that translator is then started by the
> filesystem, it is started "outside" of the remap environment.  I believe
> 2/ is what happens here.

Why not start the translator from the remapped environment too ?

-- 
Richard Braun



Re: Hurd shutdown problems

2016-08-08 Thread Riccardo Mottola

Hi,

Justus Winter wrote:

>> Have you tried using halt-hurd instead of shutdown? As far as I can
>> remember, halt-hurd has never caused file system corruption for me,
>> but I'm pretty sure shutdown did way back when I was still trying
>> to use it.
>
> That is correct.  halt-hurd is basically halt -f, which is safe on the
> Hurd, but skips the sysvinit shutdown.  However, we need to figure out
> why this hangs every now and then.


In my personal experience, I had "hangs" when I had a telnet session
open (I think also with ssh; I shall try again).

Usually all connected clients should get disconnected.
If I power on the Hurd, log in from the console and shut it down, it
works reliably.


Riccardo



Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Justus Winter
Christian Seiler  writes:

>>>> Use the remap translator instead, which is one of the things the Hurd
>>>> design allows you to do easily.
>>>>
>>>> See /bin/remap to easily set one.
>>>
>>> remap doesn't work at all here, programs then complain
>>> that they can't assign requested address when doing any
>>> socket operation.
>> 
>> Seems to work fine here:
>> 
>> teythoon@hurdbox ~ % cd /tmp
>> teythoon@hurdbox /tmp % settrans -ac 1 /hurd/pflocal
>> teythoon@hurdbox /tmp % remap /servers/socket/1 /tmp/1 -- /bin/bash -c 'echo huhu world | wc'
>>   1   2  11
>
> For pipes yes, for named sockets (which is what open-isns
> uses): no.
>
> $ cd /tmp
> $ settrans -ac 1 /hurd/pflocal
> $ remap /servers/socket/1 /tmp/1 -- python3
> Python 3.5.2+ (default, Aug  5 2016, 08:07:14) 
> [GCC 6.1.1 20160705] on gnu0
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import socket
> >>> s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> >>> s.bind('/tmp/test.sock')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OSError: [Errno 1073741873] Cannot assign requested address
>
> (Same also from C programs, Python is just easier to test.)
>
> The same python code works if you run it without remap.

Right, I can see how this is a problem.  The thing is, remap doesn't
quite do the job: 1/ it fails to remap relative paths, 2/ if one sets a
translator record on a node, and that translator is then started by the
filesystem, it is started "outside" of the remap environment.  I believe
2/ is what happens here.
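Problem 1/ can be illustrated at the level of path strings. Below is a minimal Python sketch that assumes a remapper which only matches literal absolute prefixes. naive_remap and resolving_remap are hypothetical. The real remap translator works on dir_lookup requests, not strings. normpath here is purely lexical and ignores symlinks:

```python
import posixpath


def naive_remap(path, mapping):
    """Rewrite only paths that literally start with a mapped prefix,
    so relative paths slip through unchanged."""
    for src, dst in mapping.items():
        if path == src or path.startswith(src + "/"):
            return dst + path[len(src):]
    return path


def resolving_remap(path, cwd, mapping):
    """Resolve the path against the working directory first, then remap.
    (Lexical only: '..' is collapsed without following symlinks.)"""
    absolute = posixpath.normpath(posixpath.join(cwd, path))
    return naive_remap(absolute, mapping)


if __name__ == "__main__":
    mapping = {"/servers/socket/1": "/tmp/1"}
    print(naive_remap("socket/1", mapping))                  # socket/1
    print(resolving_remap("socket/1", "/servers", mapping))  # /tmp/1
```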

fakeroot has the same problem.  For me, the lack of robust lightweight
virtualization is the most pressing shortcoming of the Hurd, and I
did some work to address this.  Aiui, remap/fakeroot must prevent the
filesystem from starting the translator and do it themselves to make the
translation more correct.

> Anyway, not terribly important to me, rebooting did work fine
> anyway, and I now have a working patch for open-isns that will
> make it work on Hurd once my other patch against pflocal's
> socket.c is merged.

Cool!

Cheers,
Justus




Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Christian Seiler
On 08/08/2016 12:18 PM, Justus Winter wrote:
>> [settrans -ck stuff]
> All in all this was just bad advice.

Ok, good to know. :)

>>> Use the remap translator instead, which is one of the things the Hurd
>>> design allows you to do easily.
>>>
>>> See /bin/remap to easily set one.
>>
>> remap doesn't work at all here, programs then complain
>> that they can't assign requested address when doing any
>> socket operation.
> 
> Seems to work fine here:
> 
> teythoon@hurdbox ~ % cd /tmp
> teythoon@hurdbox /tmp % settrans -ac 1 /hurd/pflocal
> teythoon@hurdbox /tmp % remap /servers/socket/1 /tmp/1 -- /bin/bash -c 'echo huhu world | wc'
>   1   2  11

For pipes yes, for named sockets (which is what open-isns
uses): no.

$ cd /tmp
$ settrans -ac 1 /hurd/pflocal
$ remap /servers/socket/1 /tmp/1 -- python3
Python 3.5.2+ (default, Aug  5 2016, 08:07:14) 
[GCC 6.1.1 20160705] on gnu0
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
>>> s.bind('/tmp/test.sock')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 1073741873] Cannot assign requested address

(Same also from C programs, Python is just easier to test.)

The same python code works if you run it without remap.
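The failure also shows up from C programs, so a standalone script may be easier than the interactive session for comparing runs with and without remap. For example, `remap /servers/socket/1 /tmp/1 -- python3 bindtest.py`, where bindtest.py is just an assumed file name. This is a sketch. It binds in a fresh temporary directory rather than at /tmp/test.sock:

```python
import os
import socket
import tempfile


def try_bind(path):
    """Try to bind an AF_UNIX stream socket to `path`; return (ok, message)."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(path)
    except OSError as e:
        # Under remap this was reported to fail with
        # "Cannot assign requested address".
        return False, "bind failed: [Errno %s] %s" % (e.errno, e.strerror)
    finally:
        s.close()
    os.unlink(path)  # bind succeeded, so the socket node exists; clean it up
    return True, "bind succeeded"


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    ok, msg = try_bind(os.path.join(tmpdir, "test.sock"))
    print(msg)
    os.rmdir(tmpdir)
```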

Anyway, not terribly important to me, rebooting did work fine
anyway, and I now have a working patch for open-isns that will
make it work on Hurd once my other patch against pflocal's
socket.c is merged.

Thanks,
Christian



Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Justus Winter
Christian Seiler  writes:

> (The following is not really important, rebooting does
> work, so it's not a showstopper.)
>
> On 08/07/2016 09:13 PM, Richard Braun wrote:
>> On Sun, Aug 07, 2016 at 08:44:56PM +0300, Esa Peuha wrote:
>>>> PS: Is there any way to sanely restart /hurd/pflocal without
>>>> rebooting?
>>>
>>> Yes, the commands to do that are
>>>
>>> settrans -ck /servers/socket/1
>>> settrans -ck /servers/socket/1 /hurd/pflocal
>
> FYI: that's really weird: the translator appears to be
> replaced on my system (up to date Debian sid), but from
> the response of programs, the old one still appears to
> be used.

Yes, that's what the -k is for: it keeps the old translator running.
Also, without specifying -a, settrans only stores the translator record,
which does not change.  -c creates the node, which already exists.  All
in all, this was just bad advice.

>> Use the remap translator instead, which is one of the things the Hurd
>> design allows you to do easily.
>> 
>> See /bin/remap to easily set one.
>
> remap doesn't work at all here, programs then complain
> that they can't assign requested address when doing any
> socket operation.

Seems to work fine here:

teythoon@hurdbox ~ % cd /tmp
teythoon@hurdbox /tmp % settrans -ac 1 /hurd/pflocal
teythoon@hurdbox /tmp % remap /servers/socket/1 /tmp/1 -- /bin/bash -c 'echo huhu world | wc'
  1   2  11


Cheers,
Justus




Re: [PATCH] [hurd] pflocal/socket.c: Support MSG_DONTWAIT in pflocal send/recv

2016-08-08 Thread Christian Seiler
(The following is not really important, rebooting does
work, so it's not a showstopper.)

On 08/07/2016 09:13 PM, Richard Braun wrote:
> On Sun, Aug 07, 2016 at 08:44:56PM +0300, Esa Peuha wrote:
>>> PS: Is there any way to sanely restart /hurd/pflocal without
>>> rebooting?
>>
>> Yes, the commands to do that are
>>
>> settrans -ck /servers/socket/1
>> settrans -ck /servers/socket/1 /hurd/pflocal

FYI: that's really weird: the translator appears to be
replaced on my system (up to date Debian sid), but from
the response of programs, the old one still appears to
be used.

> Use the remap translator instead, which is one of the things the Hurd
> design allows you to do easily.
> 
> See /bin/remap to easily set one.

remap doesn't work at all here, programs then complain
that they can't assign requested address when doing any
socket operation.

Regards,
Christian



Re: Hurd shutdown problems

2016-08-08 Thread Justus Winter
"Brent W. Baccala"  writes:

> On Sat, Aug 6, 2016 at 7:59 AM, Justus Winter  wrote:
>
>>
>> To prevent filesystem damage, try the following.  Break into the kernel
>> debugger, and kill the auth server using:
>>
>> !task_terminate($task5)
>>
>> Then continue using "c", and /hurd/startup should cleanly shut down the
>> system.
>>
>>
> The problem seems to be caused by a failure to swapoff the swap space.
> Since I've started paying attention to the swap space usage, I've always
> been able to cleanly shut down if no swap is in use.  Once, when a small
> amount of swap was in use (7 MB), I was able to shut down cleanly.  After a
> decent sized compile, however, with 100 MB or so of swap in use, I always
> get this:
>
> Deactivating swap...swapoff: /dev/hd0s5: 177152k swap space
> swapoff: /dev/hd0s5: (os/kern) failure
> failed.
> Unmounting weak filesystems...umount: /etc/mtab: Warning: duplicate entry for device /dev/hd0s1 (/dev/cons)
> done.
> mount: cannot remount /: Device or resource busy
> Will now halt.
>
> Now everything stops.

Interesting.  There is a utility in the Hurd tree called 'vmallocate'
that can be used to allocate and dirty large amounts of memory to
trigger such issues.  Unfortunately it isn't shipped with Debian iirc.
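Without vmallocate, a rough Python stand-in can allocate a buffer and write to every page, so that each page is dirty and must be backed by RAM or swap. This is a sketch, not a reimplementation of vmallocate, whose options are not reproduced here. dirty_memory is a hypothetical name:

```python
import mmap


def dirty_memory(mib):
    """Allocate `mib` MiB and write one byte per page so every page is dirty.

    Keep the returned buffer alive to hold on to the memory; with enough
    of it, the pager has to push other pages out to swap.
    """
    buf = bytearray(mib * 1024 * 1024)
    page = mmap.PAGESIZE
    for offset in range(0, len(buf), page):
        # Write to each page so it is dirty regardless of how the
        # allocator maps zero-filled memory.
        buf[offset] = 1
    return buf
```

For example, run `hold = dirty_memory(300)` in an interactive python3 session, choosing a size larger than free RAM. Keep the session open, then try swapoff from another terminal.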

> What happens if I now try Justus's advice?
>
> Stopped at  0x81be: leave
> Kernel Page fault trap, eip 0x81029b4e
> Caught Page fault (14),code = 0, pc = 81029b4e

Well, your system already seems to be in bad shape when you enter the
debugger: a kernel fault occurred.  You cannot reasonably expect anything
to work at this point.

But yes, it fails from time to time.  Usually when it fails, I see the
kernel rebooting as soon as I call task_terminate.  I guess this is
because one can, by chance, break into the debugger while the system is
in an inconsistent state.

Cheers,
Justus




Re: Hurd shutdown problems

2016-08-08 Thread Brent W. Baccala
On Sat, Aug 6, 2016 at 7:59 AM, Justus Winter  wrote:

>
> To prevent filesystem damage, try the following.  Break into the kernel
> debugger, and kill the auth server using:
>
> !task_terminate($task5)
>
> Then continue using "c", and /hurd/startup should cleanly shut down the
> system.
>
>
The problem seems to be caused by a failure to swapoff the swap space.
Since I've started paying attention to the swap space usage, I've always
been able to cleanly shut down if no swap is in use.  Once, when a small
amount of swap was in use (7 MB), I was able to shut down cleanly.  After a
decent sized compile, however, with 100 MB or so of swap in use, I always
get this:

Deactivating swap...swapoff: /dev/hd0s5: 177152k swap space
swapoff: /dev/hd0s5: (os/kern) failure
failed.
Unmounting weak filesystems...umount: /etc/mtab: Warning: duplicate entry for device /dev/hd0s1 (/dev/cons)
done.
mount: cannot remount /: Device or resource busy
Will now halt.

Now everything stops.  What happens if I now try Justus's advice?

Stopped at  0x81be: leave
Kernel Page fault trap, eip 0x81029b4e
Caught Page fault (14),code = 0, pc = 81029b4e
db> !task_terminate($task5)
Kernel Page fault trap, eip 0x81029b4e
Caught Page fault (14),code = 0, pc = 81029b4e
db> c

...and nothing.  Break back into the debugger and nothing has changed.
"show all tasks" still shows /hurd/auth running as ID 5.

agape
brent