[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-12-10 Thread Christopher Head
https://bugs.kde.org/show_bug.cgi?id=384732

Christopher Head  changed:

   What|Removed |Added

 CC||kdeb...@chead.ca

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-11-26 Thread Daniel
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #14 from Daniel  ---
Created attachment 109075
  --> https://bugs.kde.org/attachment.cgi?id=109075=edit
Patch with own replacement function for the provided test code

This patch is a diff of the provided test code and a new test code which
includes an own posix_spawnp replacement function of glibc 2.24

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-11-26 Thread Daniel
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #13 from Daniel  ---
Created attachment 109074
  --> https://bugs.kde.org/attachment.cgi?id=109074=edit
Try to add a replacement function of posix_spawnp to vg_preloaded.c

The patch is a diff of the current vg_preloaded.c git code.
It was a quick try, not very nice.

It compiled but didn't work, although I didn't try to find out why the
replacement function wasn't called.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-11-25 Thread Philippe Waroquiers
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #12 from Philippe Waroquiers  ---
(In reply to Daniel from comment #11)
> (In reply to Philippe Waroquiers from comment #10)
> > What you might do is to write a replacement procedure for
> > posix_spawn/posis_spawnp, using the technique/code of glibc 2.24.
> 
> Good suggestion! I'm not familiar with the valgrind code, but I tried adding
> a replacement function to vg_preloaded.c
> It compiled but as I ran my binary, the replacement function was not used.
> 
> I then added the glibc 2.24 code to my own program and replaced my
> posix_spawnp call with my own glibc implementation and this works.
> I just run this implementation if valgrind is running, otherwise the
> standard posix_spawnp function.
> 
> So I can live with the issue now, but of course a real fix in valgrind would
> be perferable.
> 
> Thanks for all your work on this great tool!

Thanks for the feedback.

Can you attach to this bug the 2 different patches (the 'in application' patch,
and the 'in valgrind' patch) ?

Others might use the first as a temporary bypass, and we could see why the
second one is not working, fix and integrate it;

Thanks

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-11-24 Thread Daniel
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #11 from Daniel  ---
(In reply to Philippe Waroquiers from comment #10)
> What you might do is to write a replacement procedure for
> posix_spawn/posis_spawnp, using the technique/code of glibc 2.24.

Good suggestion! I'm not familiar with the valgrind code, but I tried adding a
replacement function to vg_preloaded.c
It compiled but as I ran my binary, the replacement function was not used.

I then added the glibc 2.24 code to my own program and replaced my posix_spawnp
call with my own glibc implementation and this works.
I just run this implementation if valgrind is running, otherwise the standard
posix_spawnp function.

So I can live with the issue now, but of course a real fix in valgrind would be
perferable.

Thanks for all your work on this great tool!

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-11-21 Thread Philippe Waroquiers
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #10 from Philippe Waroquiers  ---
(In reply to Daniel from comment #9)
> Do you have any idea if this will/can be fixed?
Nobody seems to work on this for the moment, so time to fix is unknown.
It looks like the solaris specific code should/could be made more
general, so as to work on linux also. As I understand, the solaris code
is intercepting a subset of the (solaris) libc functions in the child,
so as to replace the direct assignment of the child status code in the parent
address space by a procedure that gives back the child code to the parent
via a pipe. I am wondering if it would not be simpler to just replace fully
the posix_spawn in vg_preload, as the solaris solution seems quite dependent
on internal implementation of libc (e.g. the replacement of get_error and
set_error).


> Or do I have any possibility to work around this issue in my code?
What you might do is to write a replacement procedure for
posix_spawn/posis_spawnp, using the technique/code of glibc 2.24.
See
http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.wrapping

(of course, do not call the original posix_spawn, just implement it fully
in the wrapping functionà.

Basically, this suggestion consists in implementing the alternative solution
discussed above, but outside of valgrind.
So, you might maybe envisage to just write it inside vg_prelaoded.c 
and provide a patch, if ever what I suggest makes sense :)

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-11-21 Thread Daniel
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #9 from Daniel  ---
Do you have any idea if this will/can be fixed?
Or do I have any possibility to work around this issue in my code?

Greetings,
Daniel

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-09-19 Thread Ivo Raisr
https://bugs.kde.org/show_bug.cgi?id=384732

Ivo Raisr  changed:

   What|Removed |Added

 CC||iv...@ivosh.net

--- Comment #8 from Ivo Raisr  ---
Solaris port had to support "proper" vfork semantics right from the beginning
because Solaris libc relies on its proper functioning.

So there are some clever workarounds how to "preserve" vfork semantics but
translate it to fork() at the same time.
Have a look at coregrind/m_syswrap/syswrap-solaris.c and
coregrind/vg_preloaded.c, search there for "vfork".
Apart from other stuff which goes on there, a pipe is created to communicate
error code from child to parent. This in fact emulates shared address space.

If I understand it correctly, pipe was used previously for the very same reason
in glibc <= 2.24. Perhaps this mechanism can be leveraged in Valgrind for glibc
>= 2.25?

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-09-17 Thread Philippe Waroquiers
https://bugs.kde.org/show_bug.cgi?id=384732

Philippe Waroquiers  changed:

   What|Removed |Added

 CC||philippe.waroquiers@skynet.
   ||be

--- Comment #7 from Philippe Waroquiers  ---
(In reply to Mark Wielaard from comment #6)
> (In reply to Tom Hughes from comment #5)
> > We already detect CLONE_VFORK|CLONE_VM as a special case and deliberately
> > drop CLONE_VM because I imagine anything else would be a disaster as
> > valgrind's shadow memory etc would be shared.
> 
> Sharing shadow memory should be fine since we do share the actual memory.
> But sharing the thread state is probably a problem.
> 
> And indeed just not dropping VM_CLONE when we see VM_VFORK doesn't work.
> It does successfully create the child and execs the new process, but
> returning to the parent seg faults fairly soon.
Yes, that looks expected. For example, if the scheduler in the child process
has to regain control e.g. to translate a block in what follows the clone
syscall, then the parent host stack will be thrashed by the code executed by
the host child.
In other words, i highly suspect that the constraints of vfork syscall
apply for VM_CLONE+VM_VFORK (i.e. no function call, no return, ...).
What is tricky is that such constraints have to be respected both for
the guest and the host parts of the child;

> 
> > The comments in the source explicitly mention preserving CLONE_VFORK to
> > preserve the wait-for-exec semantics and that does indeed seem to work.
> 
> Earlier we did drop both CLONE_VFORK and CLONE_VM.
> And I can definitely see why CLONE_VM is bad without CLONE_VFORK.
> 
> I am probable missing some bad sharing which messes things up.
> But I still wonder if we could handle CLONE_VFORK | CLONE_VM specially as if
> it was a simple thread creation (do_clone vs do_fork_clone) up till the
> actual exec/exit call (at which point we might have to do some magic to do
> the actual fork). Maybe too much bad voodoo though...
To fully respect the vfork semantic, all the threads have to be suspended
while the new thread has not yet done the exec, and nothing can happen
in these threds (e.g. signal handling or whatever).
It does look like a nice challenge to achieve.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-09-15 Thread Mark Wielaard
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #6 from Mark Wielaard  ---
(In reply to Tom Hughes from comment #5)
> We already detect CLONE_VFORK|CLONE_VM as a special case and deliberately
> drop CLONE_VM because I imagine anything else would be a disaster as
> valgrind's shadow memory etc would be shared.

Sharing shadow memory should be fine since we do share the actual memory.
But sharing the thread state is probably a problem.

And indeed just not dropping VM_CLONE when we see VM_VFORK doesn't work.
It does successfully create the child and execs the new process, but returning
to the parent seg faults fairly soon.

> The comments in the source explicitly mention preserving CLONE_VFORK to
> preserve the wait-for-exec semantics and that does indeed seem to work.

Earlier we did drop both CLONE_VFORK and CLONE_VM.
And I can definitely see why CLONE_VM is bad without CLONE_VFORK.

I am probable missing some bad sharing which messes things up.
But I still wonder if we could handle CLONE_VFORK | CLONE_VM specially as if it
was a simple thread creation (do_clone vs do_fork_clone) up till the actual
exec/exit call (at which point we might have to do some magic to do the actual
fork). Maybe too much bad voodoo though...

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-09-15 Thread Tom Hughes
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #5 from Tom Hughes  ---
We already detect CLONE_VFORK|CLONE_VM as a special case and deliberately drop
CLONE_VM because I imagine anything else would be a disaster as valgrind's
shadow memory etc would be shared.

The comments in the source explicitly mention preserving CLONE_VFORK to
preserve the wait-for-exec semantics and that does indeed seem to work.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-09-15 Thread Mark Wielaard
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #4 from Mark Wielaard  ---
I wonder if we can also rely on the CLONE_VFORK mechanism.

So if we see a clone with CLONE_VFORK | CLONE_VM we don't drop the CLONE_VM,
but do freeze all other threads till we see the exec/exit syscall.

Since till the exit/exec the host and child memory are the same and only that
child thread is running it should keep the valgrind datastructures sane.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-09-15 Thread Mark Wielaard
https://bugs.kde.org/show_bug.cgi?id=384732

Mark Wielaard  changed:

   What|Removed |Added

 CC||m...@klomp.org

--- Comment #3 from Mark Wielaard  ---
This glibc commit seems to have broken how valgrind handles posix_spawn.
It now relies on CLONE_VM semantics.

commit 4b4d4056bb154603f36c6f8845757c1012758158
Author: Rasmus Villemoes 
Date:   Tue Sep 27 17:52:17 2016 -0700

linux: spawni.c: simplify error reporting to parent

Using CLONE_VFORK already ensures that the parent does not run until the
child has either exec'ed succesfully or called _exit. Hence we don't
need to read from a CLOEXEC pipe to ensure proper synchronization - we
just make explicit use of the fact the the child and parent run in the
same VM, so the child can write an error code to a field of the
posix_spawn_args struct instead of sending it through a pipe.

To ensure that this mechanism really works, the parent initializes the
field to -1 and the child writes 0 before execing.

This eliminates some annoying bookkeeping that is necessary to avoid
the file actions from clobbering the write end of the pipe, and
getting rid of the pipe creation in the first place means fewer system
calls (four in the parent, usually one in the child) and fewer
chanches for the spawn to fail (e.g. if we're close to EMFILE).

Checked on x86_64 and i686.

* sysdeps/unix/sysv/linux/spawni.c (posix_spawn_args): Remove pipe
field, add err field.
(__spawni_child): Report error through err member instead of pipe.
(__spawnix): Likewise.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-09-15 Thread Tom Hughes
https://bugs.kde.org/show_bug.cgi?id=384732

Tom Hughes  changed:

   What|Removed |Added

 CC||t...@compton.nu

--- Comment #2 from Tom Hughes  ---
This is because glibc is using clone with CLONE_VFORK and CLONE_VM and is
relying on the fact the CLONE_VM means the two processes will be sharing the
same address space until the vfork completes by execing the child or exiting.

But valgrind drops the CLONE_VM flag which means they're not sharing the same
address space and although the parent does wait for the child to exec (because
of CLONE_VFORK) it never sees the error status the child has written because
the child wrote to a separate version of the memory.

Basically the parent sets arg.err to -1 then does the clone and expects arg.err
to be >= 0 afterwards because the child will have written to it but the fact
that valgrind can't support VM cloning means that doesn't happen.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 384732] posix_spawn with glibc 2.25 causes an assertion

2017-09-15 Thread Daniel
https://bugs.kde.org/show_bug.cgi?id=384732

--- Comment #1 from Daniel  ---
Edit: we're compiling with GCC 7.1.0, no idea if this makes any difference.

-- 
You are receiving this mail because:
You are watching all bug changes.