Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/17/2012 10:56 AM, Pavel Emelyanov wrote:
> On 12/17/2012 07:21 PM, H. Peter Anvin wrote:
>> Because it is almost impossible to do right?
>
> In the generic case -- I tend to agree. But it's possible to describe
> how a library should communicate to crtools to make it possible.
>
> Anyway, what I wanted to say -- we didn't have this scenario in our
> plans, but criu project is open, and if someone comes with sane idea,
> we will not object merging it.
>

I doubt it is possible using existing compiler toolchains.

    -hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/17/2012 07:21 PM, H. Peter Anvin wrote:
> Because it is almost impossible to do right?

In the generic case -- I tend to agree. But it's possible to describe
how a library should communicate to crtools to make it possible.

Anyway, what I wanted to say -- we didn't have this scenario in our
plans, but criu project is open, and if someone comes with sane idea,
we will not object merging it.

> Pavel Emelyanov wrote:
>
>> On 12/14/2012 10:44 PM, Andy Lutomirski wrote:
>>> [...]
>>> Does criu support checkpointing with one version of a shared library
>>> and restoring with another?
>>
>> No, neither we have this in plans.
>> However, if somebody needs this and implements -- why not?!
>>
>> Thanks,
>> Pavel
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
Because it is almost impossible to do right?

Pavel Emelyanov wrote:

>On 12/14/2012 10:44 PM, Andy Lutomirski wrote:
>> [...]
>> Does criu support checkpointing with one version of a shared library
>> and restoring with another?
>
>No, neither we have this in plans.
>However, if somebody needs this and implements -- why not?!
>
>Thanks,
>Pavel
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 10:44 PM, Andy Lutomirski wrote:
> On Fri, Dec 14, 2012 at 10:35 AM, H. Peter Anvin wrote:
>> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
>>> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>>>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin wrote:
>>>>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
>>>>> really need more control I'd almost push for a device/filesystem node
>>>>> that could be mmapped the usual way.
>>>>
>>>> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
>>>> criu is stable enough yet that we should care.  Criu people?
>>>
>>> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>>>
>>>> (In brief summary: how annoying would it be if the vdso was no longer
>>>> just a bunch of constant bytes that lived somewhere?)
>>>
>>> It depends on what vdso is going to be. In the perfect case it should
>>> a) be mremap-able to any address (or be at fixed address _forever_, but
>>>    I assume this is not feasible);
>>> b) have entry points at fixed (or somehow movable) places.
>>>
>>> I admit that I didn't understand your question properly, if I did,
>>> please correct me.
>>>
>>
>> mremap() should work.  At the same time, the code itself is not going to
>> have any stability guarantees between kernel versions -- it obviously
>> cannot.
>
> We could guarantee that the symbols in the vdso resolve to particular
> offsets within the vdso.  (Yes, this is ugly.)
>
> Does criu support checkpointing with one version of a shared library
> and restoring with another?

No, neither we have this in plans.
However, if somebody needs this and implements -- why not?!

Thanks,
Pavel
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 03:48 PM, John Stultz wrote:
> On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
>> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>>
>>> This won't help in case of scenario you've been pointing in
>>> previous email (where c/r happens in a middle of vdso),
>>> would it? Because we still need somehow to be sure we're not
>>> checkpointing in a middle of signal handler which will return
>>> to some vdso place.
>> It is okay if and only if those vdso places never change... which I
>> think is doable if they only contain trivial system call wrappers, i.e.
>> something like:
>>
>>     movl $__SYS_gettimeofday, %eax
>>     syscall
>>     ret
>
> Though doesn't this make it easier for exploits (somewhat undoing ASLR)?
> I know Andi always wanted to avoid having syscall instructions at a
> fixed location for the old vsyscall code (though I know we had it
> none-the-less for awhile). But maybe I'm confusing issues here?
>

They aren't in fixed addresses across processes... the vdso location can
still be randomized.  It just has to be the same across the
checkpoint/restart operation, just like all the other instructions.

    -hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>
>> This won't help in case of scenario you've been pointing in
>> previous email (where c/r happens in a middle of vdso),
>> would it? Because we still need somehow to be sure we're not
>> checkpointing in a middle of signal handler which will return
>> to some vdso place.
> It is okay if and only if those vdso places never change... which I
> think is doable if they only contain trivial system call wrappers, i.e.
> something like:
>
>     movl $__SYS_gettimeofday, %eax
>     syscall
>     ret

Though doesn't this make it easier for exploits (somewhat undoing ASLR)?
I know Andi always wanted to avoid having syscall instructions at a
fixed location for the old vsyscall code (though I know we had it
none-the-less for awhile). But maybe I'm confusing issues here?

thanks
-john
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 03:09 PM, Stefani Seibold wrote:
>
> Sorry for not following the discussion, but I am currently trying to
> compile the vclocktime.c as a 32 bit object. Most of the (clever) work
> is done.
>
> After this the next step is to map the needed fixmaps into the 32 bit
> address space. Maybe this can be done with install_special_mapping().
>

install_special_mapping() is indeed how it is done.  The suggestion is
to make the vvar page an actual section inside the vdso, and then just
substitute the vvar page into the mapping array when installing the vdso
into the process user space.

    -hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Friday, 14.12.2012, at 14:46 -0800, H. Peter Anvin wrote:
> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
> > [...]
> > It depends on what vdso is going to be. In the perfect case it should
> > a) be mremap-able to any address (or be at fixed address _forever_, but
> >    I assume this is not feasible);
> > b) have entry points at fixed (or somehow movable) places.
> >
> > I admit that I didn't understand your question properly, if I did,
> > please correct me.
> >
>
> Either way... criu on the side, we should proceed with this vdso
> redesign and get support for the 32-bit entry points including compat
> mode on x86-64.
>
>     -hpa
>

Sorry for not following the discussion, but I am currently trying to
compile the vclocktime.c as a 32 bit object. Most of the (clever) work
is done.

After this the next step is to map the needed fixmaps into the 32 bit
address space. Maybe this can be done with install_special_mapping().

I think I will do this job in the next days.

- Stefani
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>> On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
>>>
>>> this would allow us to defer checkpoint until task finish vdso code. Peter,
>>> if I understand you correctly you propose we provide some own proxy-vdso
>>> which would redirect calls to real ones, right? But the main problem
>>> is that is exactly the idea to be able to c/r existing programs without
>>> recompiling and such (or I miss something here?).
>>
>> No, I'm proposing that you use a proxy-vdso which does nothing but
>> system calls, and therefore can be stable indefinitely.
>
> This won't help in case of scenario you've been pointing in
> previous email (where c/r happens in a middle of vdso),
> would it? Because we still need somehow to be sure we're not
> checkpointing in a middle of signal handler which will return
> to some vdso place.

It is okay if and only if those vdso places never change... which I
think is doable if they only contain trivial system call wrappers, i.e.
something like:

    movl $__SYS_gettimeofday, %eax
    syscall
    ret

These kinds of wrappers don't rely on live data provided by the kernel,
and so can be checkpointed together with the rest of the process.

    -hpa
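For readers who prefer C to assembly, the wrapper idea can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the kernel or from criu: `proxy_gettimeofday` is a made-up name, and the libc `syscall()` helper stands in for the raw `syscall` instruction. The point it tries to show is that such a function reads no kernel-provided data page, so its bytes have no reason to differ between kernels.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for the syscall() declaration in glibc */
#endif
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical proxy entry point: no fast path, no vvar/clock data,
 * just a trap into the kernel. Returns 0 on success, -1 on error
 * (with errno set), as syscall() does.
 */
static int proxy_gettimeofday(struct timeval *tv, struct timezone *tz)
{
    return (int)syscall(SYS_gettimeofday, tv, tz);
}
```

A caller would use it exactly like gettimeofday(), e.g. `proxy_gettimeofday(&tv, NULL)`. The cost is a real system call on every invocation, which is the trade-off being discussed in the thread.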
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin wrote:
>>> Wouldn't the vdso get mapped already and could be mremap()'d.  If we
>>> really need more control I'd almost push for a device/filesystem node
>>> that could be mmapped the usual way.
>>
>> Hmm.  That may work, but it'll still break ABI.  I'm not sure that
>> criu is stable enough yet that we should care.  Criu people?
>
> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>
>> (In brief summary: how annoying would it be if the vdso was no longer
>> just a bunch of constant bytes that lived somewhere?)
>
> It depends on what vdso is going to be. In the perfect case it should
> a) be mremap-able to any address (or be at fixed address _forever_, but
>    I assume this is not feasible);
> b) have entry points at fixed (or somehow movable) places.
>
> I admit that I didn't understand your question properly, if I did,
> please correct me.
>

Either way... criu on the side, we should proceed with this vdso
redesign and get support for the 32-bit entry points including compat
mode on x86-64.

    -hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
> >
> > this would allow us to defer checkpoint until task finish vdso code. Peter,
> > if I understand you correctly you propose we provide some own proxy-vdso
> > which would redirect calls to real ones, right? But the main problem
> > is that is exactly the idea to be able to c/r existing programs without
> > recompiling and such (or I miss something here?).
>
> No, I'm proposing that you use a proxy-vdso which does nothing but
> system calls, and therefore can be stable indefinitely.

This won't help in case of scenario you've been pointing in
previous email (where c/r happens in a middle of vdso),
would it? Because we still need somehow to be sure we're not
checkpointing in a middle of signal handler which will return
to some vdso place.
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
>
> this would allow us to defer checkpoint until task finish vdso code. Peter,
> if I understand you correctly you propose we provide some own proxy-vdso
> which would redirect calls to real ones, right? But the main problem
> is that is exactly the idea to be able to c/r existing programs without
> recompiling and such (or I miss something here?).
>

No, I'm proposing that you use a proxy-vdso which does nothing but
system calls, and therefore can be stable indefinitely.

    -hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Fri, Dec 14, 2012 at 02:00:17PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 01:27 PM, Andy Lutomirski wrote:
> >
> > I don't know all that much about the linux vm.  Can we create a
> > special vdso address_space or struct inode or something so that a
> > single vma can contain pages with different flags?
> >
>
> No, that is still different vmas, but it probably isn't a big deal.
>
> The advantage of having an inode/namespace is that it lets you use
> mmap() as opposed to mremap() with it, which might be useful, I don't know.
>
> One option for the checkpoint people might actually be to not use the
> vdso for a process that needs to be checkpointed and restarted on a
> different machine or different kernel version.  Instead they can install
> a pseudo-vdso which just calls normal system calls, and is simply a
> static piece of code that makes normal system calls ... since the
> internals of the kernel are hidden from userspace it is "clean" that way.
>
> With any actual vdso you risk something like:
>

Is there a chance to make it something like that (assuming the dumpee is ptraced)

> -> vdso entry

mark task as vdso-entered

> -> signal received, transfer to signal handler
> -> signal handler exit

before task leave vdso the task mark vdso-entered get cleaned and if ptraced,
the ptracing task is notified

> ... and now you return to the address in the old vdso, but the internals
> of the vdso may have changed.

this would allow us to defer checkpoint until task finish vdso code. Peter,
if I understand you correctly you propose we provide some own proxy-vdso
which would redirect calls to real ones, right? But the main problem
is that is exactly the idea to be able to c/r existing programs without
recompiling and such (or I miss something here?).

    Cyrill
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 01:27 PM, Andy Lutomirski wrote:
>
> I don't know all that much about the linux vm.  Can we create a
> special vdso address_space or struct inode or something so that a
> single vma can contain pages with different flags?
>

No, that is still different vmas, but it probably isn't a big deal.

The advantage of having an inode/namespace is that it lets you use
mmap() as opposed to mremap() with it, which might be useful, I don't know.

One option for the checkpoint people might actually be to not use the
vdso for a process that needs to be checkpointed and restarted on a
different machine or different kernel version.  Instead they can install
a pseudo-vdso which just calls normal system calls, and is simply a
static piece of code that makes normal system calls ... since the
internals of the kernel are hidden from userspace it is "clean" that way.

With any actual vdso you risk something like:

-> vdso entry
-> signal received, transfer to signal handler
-> checkpoint
-> restart
-> signal handler exit

... and now you return to the address in the old vdso, but the internals
of the vdso may have changed.

    -hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Fri, Dec 14, 2012 at 1:08 PM, H. Peter Anvin wrote:
> On 12/14/2012 12:12 PM, Cyrill Gorcunov wrote:
>>> The real issue is what happens if the process is checkpointed while
>>> inside the vdso and now eip/rip or a stack frame points into the vdso.
>>> This is not impossible or even unlikely, especially on 32 bits it is
>>> downright likely.
>>
>> I fear if there are stacked ip which point to vdso -- we simply won't
>> be able to restore properly if vdso internal format changed significantly
>> between kernel versions. (At moment we restore vdso exactly at same position
>> it was on checkpoint stage with same content, iirc).
>>
>
> I don't think there is a way around that.  It is completely unreasonable
> to say that the vdso cannot change between kernel versions, for obvious
> reasons.  It's worse than "significantly"... changing even one
> instruction makes it plausible your eip/rip will point into the middle
> of an instruction.

It's not just kernel versions -- different toolchains may generate
different code.  Heck, building from a different directory can
sometimes generate different output.

The ABI of each vdso function is stable, though -- a sufficiently
clever tool could (maybe) use that knowledge along with unwind data in
the vdso to fix everything up.  This would be interesting, perhaps,
but certainly not easy.

I say we declare "if you want a working vdso in a weird location, mremap it".

But how does userspace figure out what size to pass to mremap?  If
it's one vma, it's easy.

I don't know all that much about the linux vm.  Can we create a
special vdso address_space or struct inode or something so that a
single vma can contain pages with different flags?

--Andy
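To make the "if it's one vma, it's easy" case concrete, here is a small C sketch of how a userspace tool might find the size to hand to mremap() by reading the `[vdso]` line of /proc/self/maps. The function names are hypothetical, and the sketch assumes the vdso really is a single vma labelled `[vdso]`; it says nothing about a separate data mapping, which is exactly the complication raised in the message above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line in /proc/<pid>/maps format. Returns 1 and fills
 * *start / *end if the line describes the [vdso] mapping, else 0.
 */
static int vdso_range_from_maps_line(const char *line,
                                     unsigned long *start,
                                     unsigned long *end)
{
    unsigned long s, e;

    if (strstr(line, "[vdso]") == NULL)
        return 0;
    if (sscanf(line, "%lx-%lx", &s, &e) != 2 || e <= s)
        return 0;
    *start = s;
    *end = e;
    return 1;
}

/*
 * Scan /proc/self/maps for the vdso. Returns its size in bytes and
 * stores its base in *start, or returns 0 if no [vdso] line is found.
 */
static unsigned long vdso_size_self(unsigned long *start)
{
    char line[512];
    unsigned long s, e, size = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return 0;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (vdso_range_from_maps_line(line, &s, &e)) {
            *start = s;
            size = e - s;
            break;
        }
    }
    fclose(f);
    return size;
}
```

The base and size returned by `vdso_size_self()` are what one would pass as the old address and old size to mremap() with MREMAP_MAYMOVE | MREMAP_FIXED; whether that is sufficient on a given kernel is the open question in this thread.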
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 01:20 PM, Cyrill Gorcunov wrote:
> On Fri, Dec 14, 2012 at 01:08:35PM -0800, H. Peter Anvin wrote:
>> On 12/14/2012 12:12 PM, Cyrill Gorcunov wrote:
>>>> The real issue is what happens if the process is checkpointed while
>>>> inside the vdso and now eip/rip or a stack frame points into the vdso.
>>>> This is not impossible or even unlikely, especially on 32 bits it is
>>>> downright likely.
>>>
>>> I fear if there are stacked ip which point to vdso -- we simply won't
>>> be able to restore properly if vdso internal format changed significantly
>>> between kernel versions. (At moment we restore vdso exactly at same position
>>> it was on checkpoint stage with same content, iirc).
>>>
>>
>> I don't think there is a way around that.  It is completely unreasonable
>> to say that the vdso cannot change between kernel versions, for obvious
>> reasons.  It's worse than "significantly"... changing even one
>> instruction makes it plausible your eip/rip will point into the middle
>> of an instruction.
>
> Well, one idea was to try to escape dumping when a dumpee inside vdso area
> and wait until it leaves this zone, then proceed dumping. Then, if vdso is
> changed (say some new instructions were added) we zap original prologues
> with jmp to new symbols from fresh vdso provided us by a kernel. I'm not
> really sure if this would help us much but just saying (I must admit I
> haven't looked yet into vdso implementation details, so sorry if it sounds
> stupid).
>

Well, if the vdso contains a system call you may be waiting indefinitely.

    -hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Fri, Dec 14, 2012 at 01:08:35PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 12:12 PM, Cyrill Gorcunov wrote:
> >> The real issue is what happens if the process is checkpointed while
> >> inside the vdso and now eip/rip or a stack frame points into the vdso.
> >> This is not impossible or even unlikely, especially on 32 bits it is
> >> downright likely.
> >
> > I fear if there are stacked ip which point to vdso -- we simply won't
> > be able to restore properly if vdso internal format changed significantly
> > between kernel versions. (At moment we restore vdso exactly at same position
> > it was on checkpoint stage with same content, iirc).
> >
>
> I don't think there is a way around that. It is completely unreasonable
> to say that the vdso cannot change between kernel versions, for obvious
> reasons. It's worse than "significantly"... changing even one
> instruction makes it plausible your eip/rip will point into the middle
> of an instruction.

Well, one idea was to try to escape dumping when a dumpee inside vdso area
and wait until it leaves this zone, then proceed dumping. Then, if vdso is
changed (say some new instructions were added) we zap original prologues
with jmp to new symbols from fresh vdso provided us by a kernel. I'm not
really sure if this would help us much but just saying (I must admit I
didn't look yet into vdso implementation details, so sorry if it sounds
stupid).

	Cyrill
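The "wait until the dumpee leaves the vdso" idea needs some test for whether a task still refers to the vdso. A rough sketch of such a test follows, assuming the dumper already has the task's instruction pointer, a copy of its stack, and the vdso address range. All names here (struct addr_range, must_defer_dump) are made up for illustration and are not criu code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct addr_range {
    uintptr_t start;
    uintptr_t end;
};

bool addr_in_range(const struct addr_range *r, uintptr_t addr)
{
    return addr >= r->start && addr < r->end;
}

/*
 * Returns true if the dump should be deferred: either the instruction
 * pointer is inside the vdso, or some word of the captured stack looks
 * like an address inside it (a conservative scan, so a plain integer
 * that happens to fall in the range is also reported).
 */
bool must_defer_dump(const struct addr_range *vdso, uintptr_t ip,
                     const uintptr_t *stack, size_t nwords)
{
    if (addr_in_range(vdso, ip))
        return true;

    for (size_t i = 0; i < nwords; i++) {
        if (addr_in_range(vdso, stack[i]))
            return true;
    }
    return false;
}
```

The scan is deliberately pessimistic, and it does nothing about a task that sits in the vdso for an unbounded time.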
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 12:12 PM, Cyrill Gorcunov wrote:
>> The real issue is what happens if the process is checkpointed while
>> inside the vdso and now eip/rip or a stack frame points into the vdso.
>> This is not impossible or even unlikely, especially on 32 bits it is
>> downright likely.
>
> I fear if there are stacked ip which point to vdso -- we simply won't
> be able to restore properly if vdso internal format changed significantly
> between kernel versions. (At moment we restore vdso exactly at same position
> it was on checkpoint stage with same content, iirc).
>

I don't think there is a way around that. It is completely unreasonable
to say that the vdso cannot change between kernel versions, for obvious
reasons. It's worse than "significantly"... changing even one
instruction makes it plausible your eip/rip will point into the middle
of an instruction.

-hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Fri, Dec 14, 2012 at 10:47:53AM -0800, H. Peter Anvin wrote:
> On 12/14/2012 10:44 AM, Andy Lutomirski wrote:
> >> mremap() should work. At the same time, the code itself is not going to
> >> have any stability guarantees between kernel versions -- it obviously
> >> cannot.
> >
> > We could guarantee that the symbols in the vdso resolve to particular
> > offsets within the vdso. (Yes, this is ugly.)
> >
> > Does criu support checkpointing with one version of a shared library
> > and restoring with another? If there are no textrels (or whatever the
> > relocation type that actually modifies text as opposed to just the plt
> > or got) then, in principle, it should be doable. Otherwise some
> > kernel help will be needed to checkpoint reliably on one kernel and
> > restore somewhere else.
> >
> > (This isn't a regression -- it's already broken.)
> >
>
> The real issue is what happens if the process is checkpointed while
> inside the vdso and now eip/rip or a stack frame points into the vdso.
> This is not impossible or even unlikely, especially on 32 bits it is
> downright likely.

I fear if there are stacked ip which point to vdso -- we simply won't
be able to restore properly if vdso internal format changed significantly
between kernel versions. (At moment we restore vdso exactly at same position
it was on checkpoint stage with same content, iirc).

	Cyrill
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 10:44 AM, Andy Lutomirski wrote:
>> mremap() should work. At the same time, the code itself is not going to
>> have any stability guarantees between kernel versions -- it obviously
>> cannot.
>
> We could guarantee that the symbols in the vdso resolve to particular
> offsets within the vdso. (Yes, this is ugly.)
>
> Does criu support checkpointing with one version of a shared library
> and restoring with another? If there are no textrels (or whatever the
> relocation type that actually modifies text as opposed to just the plt
> or got) then, in principle, it should be doable. Otherwise some
> kernel help will be needed to checkpoint reliably on one kernel and
> restore somewhere else.
>
> (This isn't a regression -- it's already broken.)
>

The real issue is what happens if the process is checkpointed while
inside the vdso and now eip/rip or a stack frame points into the vdso.
This is not impossible or even unlikely, especially on 32 bits it is
downright likely.

-hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Fri, Dec 14, 2012 at 10:35 AM, H. Peter Anvin wrote:
> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
>> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin wrote:
>>>> Wouldn't the vdso get mapped already and could be mremap()'d. If we
>>>> really need more control I'd almost push for a device/filesystem node
>>>> that could be mmapped the usual way.
>>>
>>> Hmm. That may work, but it'll still break ABI. I'm not sure that
>>> criu is stable enough yet that we should care. Criu people?
>>
>> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>>
>>> (In brief summary: how annoying would it be if the vdso was no longer
>>> just a bunch of constant bytes that lived somewhere?)
>>
>> It depends on what vdso is going to be. In the perfect case it should
>> a) be mremap-able to any address (or be at fixed address _forever_, but
>>    I assume this is not feasible);
>> b) have entry points at fixed (or somehow movable) places.
>>
>> I admit that I didn't understand your question properly, if I did,
>> please correct me.
>>
>
> mremap() should work. At the same time, the code itself is not going to
> have any stability guarantees between kernel versions -- it obviously
> cannot.

We could guarantee that the symbols in the vdso resolve to particular
offsets within the vdso. (Yes, this is ugly.)

Does criu support checkpointing with one version of a shared library
and restoring with another? If there are no textrels (or whatever the
relocation type that actually modifies text as opposed to just the plt
or got) then, in principle, it should be doable. Otherwise some
kernel help will be needed to checkpoint reliably on one kernel and
restore somewhere else.

(This isn't a regression -- it's already broken.)

>
> Incidentally, the MAYWRITE bit which is there to allow breakpoints is
> obviously problematic for the vvar page. We could mark the vvar page
> differently, meaning more vmas, or we could decide it just doesn't
> matter and that if you mprotect() the vvar page and write to it you get
> exactly what you asked for...

I have no strong preference here.

--Andy
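To make "symbols resolve to particular offsets within the vdso" concrete, here is a sketch of how a tool could read those offsets today from the ELF image the kernel advertises through AT_SYSINFO_EHDR. The function name vdso_symbol_offset() is hypothetical. The sketch also assumes the image carries a DT_HASH table with 32-bit entries, which does not hold on every architecture or build; in that case it simply reports -1.

```c
#include <elf.h>
#include <link.h>       /* ElfW() */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval(), AT_SYSINFO_EHDR */

/*
 * Hypothetical helper: offset of a vdso symbol from the start of the
 * vdso image, or -1 if there is no vdso, no DT_HASH table, or no such
 * symbol.
 */
long vdso_symbol_offset(const char *name)
{
    uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
    if (!base)
        return -1;

    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;

    const ElfW(Phdr) *ph = (const ElfW(Phdr) *)(base + eh->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;
    int have_load = 0;

    /* The first PT_LOAD gives the load bias; PT_DYNAMIC gives the tags. */
    for (unsigned i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type == PT_LOAD && !have_load) {
            bias = base + ph[i].p_offset - ph[i].p_vaddr;
            have_load = 1;
        } else if (ph[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *)(base + ph[i].p_offset);
        }
    }
    if (!have_load || !dyn)
        return -1;

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const ElfW(Word) *hash = NULL;

    for (; dyn->d_tag != DT_NULL; dyn++) {
        const void *p = (const void *)(bias + dyn->d_un.d_ptr);

        if (dyn->d_tag == DT_SYMTAB)
            symtab = p;
        else if (dyn->d_tag == DT_STRTAB)
            strtab = p;
        else if (dyn->d_tag == DT_HASH)
            hash = p;
    }
    if (!symtab || !strtab || !hash)
        return -1;

    /* hash[1] is nchain, which equals the number of symbol entries. */
    ElfW(Word) nsyms = hash[1];
    for (ElfW(Word) i = 0; i < nsyms; i++) {
        if (symtab[i].st_shndx == SHN_UNDEF)
            continue;
        if (strcmp(strtab + symtab[i].st_name, name) == 0)
            return (long)(bias + symtab[i].st_value - base);
    }
    return -1;
}
```

Comparing the offsets recorded at checkpoint time with those seen at restore time would show whether the entry points moved. It says nothing about whether the code between them changed.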
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin wrote:
>>> Wouldn't the vdso get mapped already and could be mremap()'d. If we
>>> really need more control I'd almost push for a device/filesystem node
>>> that could be mmapped the usual way.
>>
>> Hmm. That may work, but it'll still break ABI. I'm not sure that
>> criu is stable enough yet that we should care. Criu people?
>
> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>
>> (In brief summary: how annoying would it be if the vdso was no longer
>> just a bunch of constant bytes that lived somewhere?)
>
> It depends on what vdso is going to be. In the perfect case it should
> a) be mremap-able to any address (or be at fixed address _forever_, but
>    I assume this is not feasible);
> b) have entry points at fixed (or somehow movable) places.
>
> I admit that I didn't understand your question properly, if I did,
> please correct me.
>

mremap() should work. At the same time, the code itself is not going to
have any stability guarantees between kernel versions -- it obviously
cannot.

Incidentally, the MAYWRITE bit which is there to allow breakpoints is
obviously problematic for the vvar page. We could mark the vvar page
differently, meaning more vmas, or we could decide it just doesn't
matter and that if you mprotect() the vvar page and write to it you get
exactly what you asked for...

-hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin wrote:
>> Wouldn't the vdso get mapped already and could be mremap()'d. If we
>> really need more control I'd almost push for a device/filesystem node
>> that could be mmapped the usual way.
>
> Hmm. That may work, but it'll still break ABI. I'm not sure that
> criu is stable enough yet that we should care. Criu people?

It's not yet, but we'd still appreciate the criu-friendly vdso redesign.

> (In brief summary: how annoying would it be if the vdso was no longer
> just a bunch of constant bytes that lived somewhere?)

It depends on what vdso is going to be. In the perfect case it should
a) be mremap-able to any address (or be at fixed address _forever_, but
   I assume this is not feasible);
b) have entry points at fixed (or somehow movable) places.

I admit that I didn't understand your question properly, if I did,
please correct me.

Thanks,
Pavel
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 01:27 PM, Andy Lutomirski wrote:
>
> I don't know all that much about the linux vm. Can we create a
> special vdso address_space or struct inode or something so that a
> single vma can contain pages with different flags?
>

No, that is still different vmas, but it probably isn't a big deal.

The advantage of having an inode/namespace is that it lets you use
mmap() as opposed to mremap() with it, which might be useful, I don't
know.

One option for the checkpoint people might actually be to not use the
vdso for a process that needs to be checkpointed and restarted on a
different machine or different kernel version. Instead they can install
a pseudo-vdso which just calls normal system calls, and is simply a
static piece of code that makes normal system calls ... since the
internals of the kernel are hidden from userspace it is clean that way.

With any actual vdso you risk something like:

- vdso entry
- signal received, transfer to signal handler
- checkpoint
- restart
- signal handler exit

... and now you return to the address in the old vdso, but the internals
of the vdso may have changed.

-hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Fri, Dec 14, 2012 at 02:00:17PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 01:27 PM, Andy Lutomirski wrote:
> >
> > I don't know all that much about the linux vm. Can we create a
> > special vdso address_space or struct inode or something so that a
> > single vma can contain pages with different flags?
> >
>
> No, that is still different vmas, but it probably isn't a big deal.
>
> The advantage of having an inode/namespace is that it lets you use
> mmap() as opposed to mremap() with it, which might be useful, I don't
> know.
>
> One option for the checkpoint people might actually be to not use the
> vdso for a process that needs to be checkpointed and restarted on a
> different machine or different kernel version. Instead they can install
> a pseudo-vdso which just calls normal system calls, and is simply a
> static piece of code that makes normal system calls ... since the
> internals of the kernel are hidden from userspace it is clean that way.
>
> With any actual vdso you risk something like:

Is there a chance to make it something like that (assuming the dumpee
is ptraced)

> - vdso entry

	mark task as vdso-entered

> - signal received, transfer to signal handler
> - signal handler exit

	before task leave vdso the task mark vdso-entered get cleaned
	and if ptraced, the ptracing task is notified

> ... and now you return to the address in the old vdso, but the internals
> of the vdso may have changed.

this would allow us to defer checkpoint until task finish vdso code.

Peter, if I understand you correctly you propose we provide some own
proxy-vdso which would redirect calls to real ones, right? But the main
problem is that is exactly the idea to be able to c/r existing programs
without recompiling and such (or I miss something here?).

	Cyrill
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
>
> this would allow us to defer checkpoint until task finish vdso code.
>
> Peter, if I understand you correctly you propose we provide some own
> proxy-vdso which would redirect calls to real ones, right? But the main
> problem is that is exactly the idea to be able to c/r existing programs
> without recompiling and such (or I miss something here?).
>

No, I'm proposing that you use a proxy-vdso which does nothing but
system calls, and therefore can be stable indefinitely.

-hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
> On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
> >
> > this would allow us to defer checkpoint until task finish vdso code.
> >
> > Peter, if I understand you correctly you propose we provide some own
> > proxy-vdso which would redirect calls to real ones, right? But the main
> > problem is that is exactly the idea to be able to c/r existing programs
> > without recompiling and such (or I miss something here?).
> >
>
> No, I'm proposing that you use a proxy-vdso which does nothing but
> system calls, and therefore can be stable indefinitely.

This won't help in case of scenario you've been pointing in previous email
(where c/r happens in a middle of vdso), would it? Because we still need
somehow to be sure we're not checkpointing in a middle of signal handler
which will return to some vdso place.
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin h...@zytor.com wrote:
>>> Wouldn't the vdso get mapped already and could be mremap()'d. If we
>>> really need more control I'd almost push for a device/filesystem node
>>> that could be mmapped the usual way.
>>
>> Hmm. That may work, but it'll still break ABI. I'm not sure that
>> criu is stable enough yet that we should care. Criu people?
>
> It's not yet, but we'd still appreciate the criu-friendly vdso redesign.
>
>> (In brief summary: how annoying would it be if the vdso was no longer
>> just a bunch of constant bytes that lived somewhere?)
>
> It depends on what vdso is going to be. In the perfect case it should
> a) be mremap-able to any address (or be at fixed address _forever_, but
>    I assume this is not feasible);
> b) have entry points at fixed (or somehow movable) places.
>
> I admit that I didn't understand your question properly, if I did,
> please correct me.
>

Either way... criu on the side, we should proceed with this vdso
redesign and get support for the 32-bit entry points including compat
mode on x86-64.

-hpa
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>> On 12/14/2012 02:25 PM, Cyrill Gorcunov wrote:
>>> this would allow us to defer the checkpoint until the task finishes
>>> the vdso code.
>>>
>>> Peter, if I understand you correctly, you propose that we provide
>>> our own proxy-vdso which would redirect calls to the real one,
>>> right? But the whole idea is to be able to c/r existing programs
>>> without recompiling and such (or am I missing something here?).
>>
>> No, I'm proposing that you use a proxy-vdso which does nothing but
>> system calls, and therefore can be stable indefinitely.
>
> This won't help in the scenario you pointed out in your previous email
> (where c/r happens in the middle of the vdso), would it? We still need
> some way to be sure we're not checkpointing in the middle of a signal
> handler that will return to some place in the vdso.

It is okay if and only if those vdso places never change... which I
think is doable if they only contain trivial system call wrappers, i.e.
something like:

	movl $__SYS_gettimeofday, %eax
	syscall
	ret

These kinds of wrappers don't rely on live data provided by the kernel,
and so can be checkpointed together with the rest of the process.

	-hpa
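For readers who prefer C to assembly, the wrapper idea above can be approximated in user space roughly as follows. This is only a sketch: `proxy_gettimeofday()` is a hypothetical name, and going through the libc `syscall()` helper is an assumption made for illustration, not what a real proxy-vdso would contain. The point it shows is that the function reads no kernel-provided data page and simply traps into the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a proxy-vdso entry point: no fast path and
 * no shared kernel data, just the plain gettimeofday system call.
 */
static int proxy_gettimeofday(struct timeval *tv)
{
    assert(tv != NULL);
    /* syscall(2) enters the kernel directly instead of using the vdso. */
    return (int)syscall(SYS_gettimeofday, tv, NULL);
}
```

Because the body depends only on the system call number, the same bytes stay valid across kernel versions, which is the property the checkpoint needs.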
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On Friday, 14.12.2012 at 14:46 -0800, H. Peter Anvin wrote:
> On 12/14/2012 12:34 AM, Pavel Emelyanov wrote:
>> On 12/14/2012 06:20 AM, Andy Lutomirski wrote:
>>> On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin h...@zytor.com wrote:
>>>> Wouldn't the vdso get mapped already and could be mremap()'d. If
>>>> we really need more control I'd almost push for a
>>>> device/filesystem node that could be mmapped the usual way.
>>>
>>> Hmm. That may work, but it'll still break ABI. I'm not sure that
>>> criu is stable enough yet that we should care. Criu people?
>>
>> It's not yet, but we'd still appreciate the criu-friendly vdso
>> redesign.
>>
>>> (In brief summary: how annoying would it be if the vdso was no
>>> longer just a bunch of constant bytes that lived somewhere?)
>>
>> It depends on what vdso is going to be. In the perfect case it should
>> a) be mremap-able to any address (or be at fixed address _forever_,
>>    but I assume this is not feasible);
>> b) have entry points at fixed (or somehow movable) places.
>>
>> I admit I may not have understood your question properly; if so,
>> please correct me.
>
> Either way... criu on the side, we should proceed with this vdso
> redesign and get support for the 32-bit entry points including compat
> mode on x86-64.
>
> 	-hpa

Sorry for not following the discussion, but I am currently trying to
compile vclocktime.c as a 32-bit object. Most of the (clever) work is
done.

After this, the next step is to map the needed fixmaps into the 32-bit
address space. Maybe this can be done with install_special_mapping().

I think I will do this job in the next few days.

- Stefani
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 03:09 PM, Stefani Seibold wrote:
> Sorry for not following the discussion, but I am currently trying to
> compile vclocktime.c as a 32-bit object. Most of the (clever) work is
> done.
>
> After this, the next step is to map the needed fixmaps into the 32-bit
> address space. Maybe this can be done with install_special_mapping().

install_special_mapping() is indeed how it is done. The suggestion is
to make the vvar page an actual section inside the vdso, and then just
substitute the vvar page into the mapping array when installing the
vdso into the process user space.

	-hpa
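The substitution step described above can be modelled in user space roughly like this. It is a sketch under stated assumptions, not kernel code: `struct page` here is a stand-in for the kernel type, and `VDSO_PAGES`, `VVAR_INDEX` and `build_vdso_pages()` are hypothetical names chosen for illustration. In the kernel, the resulting array is the kind of thing that would be handed to install_special_mapping() as its page array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VDSO_PAGES 3  /* hypothetical size of the vdso image, in pages */
#define VVAR_INDEX 2  /* hypothetical slot reserved for the vvar section */

/* Stand-in for the kernel's struct page; only identity matters here. */
struct page {
    const char *name;
};

/*
 * Copy the image's page array and replace the placeholder vvar slot
 * with the live vvar page. The image array itself is left untouched.
 */
static void build_vdso_pages(struct page **dst, struct page *const *image,
                             struct page *vvar)
{
    assert(dst != NULL && image != NULL && vvar != NULL);
    memcpy(dst, image, VDSO_PAGES * sizeof(*dst));
    dst[VVAR_INDEX] = vvar;
}
```

Keeping the placeholder inside the image means the vvar data sits at a fixed offset from the vdso code, so the whole thing can be placed, or moved, as one unit.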
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>
>> This won't help in the scenario you pointed out in your previous
>> email (where c/r happens in the middle of the vdso), would it? We
>> still need some way to be sure we're not checkpointing in the middle
>> of a signal handler that will return to some place in the vdso.
>
> It is okay if and only if those vdso places never change... which I
> think is doable if they only contain trivial system call wrappers,
> i.e. something like:
>
> 	movl $__SYS_gettimeofday, %eax
> 	syscall
> 	ret

Though doesn't this make it easier for exploits (somewhat undoing
ASLR)? I know Andi always wanted to avoid having syscall instructions
at a fixed location for the old vsyscall code (though I know we had it
nonetheless for a while).

But maybe I'm confusing issues here?

thanks
-john
Re: [CRIU] [PATCH] Add VDSO time function support for x86 32-bit kernel
On 12/14/2012 03:48 PM, John Stultz wrote:
> On 12/14/2012 02:48 PM, H. Peter Anvin wrote:
>> On 12/14/2012 02:43 PM, Cyrill Gorcunov wrote:
>>> On Fri, Dec 14, 2012 at 02:27:08PM -0800, H. Peter Anvin wrote:
>>>
>>> This won't help in the scenario you pointed out in your previous
>>> email (where c/r happens in the middle of the vdso), would it? We
>>> still need some way to be sure we're not checkpointing in the
>>> middle of a signal handler that will return to some place in the
>>> vdso.
>>
>> It is okay if and only if those vdso places never change... which I
>> think is doable if they only contain trivial system call wrappers,
>> i.e. something like:
>>
>> 	movl $__SYS_gettimeofday, %eax
>> 	syscall
>> 	ret
>
> Though doesn't this make it easier for exploits (somewhat undoing
> ASLR)? I know Andi always wanted to avoid having syscall instructions
> at a fixed location for the old vsyscall code (though I know we had it
> nonetheless for a while).
>
> But maybe I'm confusing issues here?

They aren't at fixed addresses across processes... the vdso location
can still be randomized. It just has to be the same across the
checkpoint/restart operation, just like all the other instructions.

	-hpa
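To make the "randomized per process, but stable across checkpoint/restart" point concrete, here is a small sketch of how a process can find where its own vdso currently lives. It assumes glibc's `getauxval()` and the `AT_SYSINFO_EHDR` auxiliary vector entry; `vdso_base()` and `looks_like_elf()` are hypothetical helper names, and nothing here is taken from crtools itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Address of this process's vdso ELF header, or 0 if none is mapped. */
static unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Check that the mapping at base starts with the ELF magic bytes. */
static int looks_like_elf(unsigned long base)
{
    assert(base != 0);
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

With ASLR enabled the value returned by `vdso_base()` will normally differ from one process to the next. A checkpoint tool would presumably record it at dump time and arrange for the vdso to sit at the same address on restore, so that return addresses saved on the stack remain valid.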