Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-05-01 Thread Larry Cable
On Wed, 1 May 2024 20:07:00 GMT, Larry Cable  wrote:

>> c.f: /proc//ns/pid 
>> 
>> every (Linux) namespace has a unique id, if 2 (or more) processes occupy the 
>> same pid namespace (or any other for that matter) then their ../ns/pid 
>> namespace ids will be the same.
>
> **`Files.readSymbolicLink(Path.of("/proc/self/ns/pid"))`**

h'mmm ignore my ramblings for now, I need to spend some more time looking into 
this before wading into the fray with random opinions etc!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17628#discussion_r1586791713


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-05-01 Thread Larry Cable
On Wed, 1 May 2024 18:35:29 GMT, Larry Cable  wrote:

>> Do you mean that it should compare the input PID against the outermost 
>> (leftmost) PID in the `NSpid` list from `/proc//status` and not 
>> innermost (rightmost) as is done right now? What would be the benefit of 
>> that? Or did you mean something else?
>> 
>> I'm working on a fix for https://bugs.openjdk.org/browse/JDK-8327114 right 
>> now, and it occurred to me that there is a tiny risk of `pid != ns_pid` not 
>> evaluating to `true` even though the processes are in different PID 
>> namespaces (because two different PID namespaces can have the same PIDs). I 
>> think it could be mitigated by always trying `/proc//root/tmp` first, 
>> and if it cannot be read, fall back to `/tmp`.
>
> c.f: /proc//ns/pid 
> 
> every (Linux) namespace has a unique id, if 2 (or more) processes occupy the 
> same pid namespace (or any other for that matter) then their ../ns/pid 
> namespace ids will be the same.

**`Files.readSymbolicLink(Path.of("/proc/self/ns/pid"))`**

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17628#discussion_r1586769059


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-05-01 Thread Larry Cable
On Wed, 1 May 2024 17:47:47 GMT, Sebastian Lövdahl  wrote:

>> should it not be comparing pid namespace ids and not pids?
>
> Do you mean that it should compare the input PID against the outermost 
> (leftmost) PID in the `NSpid` list from `/proc//status` and not 
> innermost (rightmost) as is done right now? What would be the benefit of 
> that? Or did you mean something else?
> 
> I'm working on a fix for https://bugs.openjdk.org/browse/JDK-8327114 right 
> now, and it occurred to me that there is a tiny risk of `pid != ns_pid` not 
> evaluating to `true` even though the processes are in different PID 
> namespaces (because two different PID namespaces can have the same PIDs). I 
> think it could be mitigated by always trying `/proc//root/tmp` first, 
> and if it cannot be read, fall back to `/tmp`.

c.f: /proc//ns/pid 

every (Linux) namespace has a unique id, if 2 (or more) processes occupy the 
same pid namespace (or any other for that matter) then their ../ns/pid 
namespace ids will be the same.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17628#discussion_r1586623838


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-05-01 Thread Sebastian Lövdahl
On Wed, 1 May 2024 17:30:05 GMT, Larry Cable  wrote:

>> src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java line 
>> 217:
>> 
>>> 215: // Instead, attach relative to the target root filesystem as 
>>> exposed by
>>> 216: // procfs regardless of namespaces.
>>> 217: String root = "/proc/" + pid + "/root/" + tmpdir;
>> 
>> Helping myself and other future readers understand this: the problem with 
>> the previous implementation is that the code _assumed_ that the tmpdir could 
>> be accessed this way (`/proc//root/`). In other words:
>> 
>> * The code for creating the socket would correctly check if `pid != ns_pid` 
>> and then act accordingly (`/proc//root/` or just plain 
>> ``)
>> * The code for reading the socket would not have the check the above. It 
>> would resort to always use `/proc//root/`.
>> * For certain scenarios (`CAP_NET_BIND_SERVICE`-processes, as described in 
>> https://github.com/openjdk/jdk/pull/17628#issuecomment-1916589081), we would 
>> get a `Permission denied` when trying to access the temporary directory like 
>> this.
>> 
>> What this PR does is to ensure that the same `pid != ns_pid` check is used 
>> both when creating and reading the socket, and fall back to `` when 
>> no namespacing is being used. This seems to work better for these processes 
>> with elevated permissions.
>
> should it not be comparing pid namespace ids and not pids?

Do you mean that it should compare the input PID against the outermost 
(leftmost) PID in the `NSpid` list from `/proc//status` and not innermost 
(rightmost) as is done right now? What would be the benefit of that? Or did you 
mean something else?

I'm working on a fix for https://bugs.openjdk.org/browse/JDK-8327114 right now, 
and it occurred to me that there is a tiny risk of `pid != ns_pid` not 
evaluating to `true` even though the processes are in different PID namespaces 
(because two different PID namespaces can have the same PIDs). I think it could 
be mitigated by always trying `/proc//root/tmp` first, and if it cannot be 
read, fall back to `/tmp`.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17628#discussion_r1586563442


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-05-01 Thread Larry Cable
On Tue, 30 Jan 2024 12:13:05 GMT, Per Lundberg  wrote:

>> 8226919: attach in linux hangs due to permission denied accessing 
>> /proc/pid/root
>
> src/jdk.attach/linux/classes/sun/tools/attach/VirtualMachineImpl.java line 
> 217:
> 
>> 215: // Instead, attach relative to the target root filesystem as 
>> exposed by
>> 216: // procfs regardless of namespaces.
>> 217: String root = "/proc/" + pid + "/root/" + tmpdir;
> 
> Helping myself and other future readers understand this: the problem with the 
> previous implementation is that the code _assumed_ that the tmpdir could be 
> accessed this way (`/proc//root/`). In other words:
> 
> * The code for creating the socket would correctly check if `pid != ns_pid` 
> and then act accordingly (`/proc//root/` or just plain 
> ``)
> * The code for reading the socket would not have the check the above. It 
> would resort to always use `/proc//root/`.
> * For certain scenarios (`CAP_NET_BIND_SERVICE`-processes, as described in 
> https://github.com/openjdk/jdk/pull/17628#issuecomment-1916589081), we would 
> get a `Permission denied` when trying to access the temporary directory like 
> this.
> 
> What this PR does is to ensure that the same `pid != ns_pid` check is used 
> both when creating and reading the socket, and fall back to `` when 
> no namespacing is being used. This seems to work better for these processes 
> with elevated permissions.

should it not be comparing pid namespace ids and not pids?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17628#discussion_r1586542476


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-04-28 Thread Sebastian Lövdahl
On Fri, 1 Mar 2024 15:22:51 GMT, jdoylei  wrote:

>> Logged https://bugs.openjdk.org/browse/JDK-8327114 for investigation.  
>> Thanks @jdoylei  !
>
> @kevinjwalls - Perfect, thank you for opening the JBS bug!

Thanks for the detailed write-up, @jdoylei! I'm sorry to have introduced a 
regression here. Good that the backporting was held off a bit at least :) Let's 
continue the discussion at 
https://mail.openjdk.org/pipermail/serviceability-dev/2024-April/055317.html.

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-2081618227


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-03-01 Thread jdoylei
On Fri, 1 Mar 2024 09:31:47 GMT, Kevin Walls  wrote:

>> @slovdahl - Apologies for adding a comment to a closed Pull Request, but I 
>> happened on https://bugs.openjdk.org/browse/JDK-8307977 via the earlier 
>> https://bugs.openjdk.org/browse/JDK-8179498 after researching 
>> "AttachNotSupportedException: Unable to open socket file" and 
>> troubleshooting our own OpenJDK 17 jcmd setup on top of containers and 
>> Kubernetes.  Reading the code changes and discussion here, I'm concerned 
>> that this change, which I understand is not yet in OpenJDK 17, might cause a 
>> regression with our setup.
>> 
>> We're running jcmd (OpenJDK build 17.0.10+7-LTS) and the target JVM in two 
>> separate containers in a Kubernetes pod.  The target JVM container is 
>> already running, and then we use `kubectl debug --target=...` to start a 
>> Kubernetes debug container with jcmd that targets the first container.  
>> Given the `--target` option, they share the same Linux process namespace 
>> (both think the target JVM is PID 1).  But since they are separate 
>> containers, they see different root filesystems (jcmd container sees the 
>> target JVM tmpdir under /proc/1/root/tmp but has its own distinct /tmp 
>> directory).  
>> 
>> I believe the attach file and socket file paths then work like this in 
>> OpenJDK 17:
>> * jcmd creates the .attach_pid1 attach file without issues using /proc/1/cwd
>> * Target JVM finds the .attach_pid1 attach file in its cwd.
>> * Target JVM creates the .java_pid1 socket file in its tmpdir /tmp
>> * jcmd finds the .java_pid1 socket file in /proc/1/root/tmp
>> 
>> I think this scenario with a Kubernetes debug container may be a little 
>> different from other Docker container scenarios because these are two 
>> different containers with _different root filesystems_ but _the same Linux 
>> process namespace_.  So jcmd using `/proc//root` is necessary to find 
>> the socket file, even though jcmd and the target JVM both agree the PID is 
>> the same (1).  A similar scenario with just Docker Engine is described at 
>> [docker container run - Example, join another container's PID 
>> namespace](https://docs.docker.com/reference/cli/docker/container/run/#example-join-another-containers-pid-namespace).
>> 
>> If I understand the code change for this PR, I think it will change the 
>> behavior in this scenario, because `findSocketFile` will have `pid == 
>> ns_pid`, and now will use /tmp instead of `/proc//root/tmp`, based on 
>> `findTargetProcessTmpDirectory`.
>> 
>> We are lucky currently that the only place the current OpenJDK 17 code 
>> checks `pid == ns_pid` is the `createAttachFile` catch block that ru...
>
> Logged https://bugs.openjdk.org/browse/JDK-8327114 for investigation.  Thanks 
> @jdoylei  !

@kevinjwalls - Perfect, thank you for opening the JBS bug!

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1973380664


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-03-01 Thread Kevin Walls
On Wed, 28 Feb 2024 20:02:39 GMT, jdoylei  wrote:

>> Thank you @kevinjwalls and @jerboaa for reviewing and guiding me through 
>> this process, this was a great as a first-time JDK contributor!
>> 
>> One more question, can I do anything to help getting this backported to e.g. 
>> 21 and 17?
>
> @slovdahl - Apologies for adding a comment to a closed Pull Request, but I 
> happened on https://bugs.openjdk.org/browse/JDK-8307977 via the earlier 
> https://bugs.openjdk.org/browse/JDK-8179498 after researching 
> "AttachNotSupportedException: Unable to open socket file" and troubleshooting 
> our own OpenJDK 17 jcmd setup on top of containers and Kubernetes.  Reading 
> the code changes and discussion here, I'm concerned that this change, which I 
> understand is not yet in OpenJDK 17, might cause a regression with our setup.
> 
> We're running jcmd (OpenJDK build 17.0.10+7-LTS) and the target JVM in two 
> separate containers in a Kubernetes pod.  The target JVM container is already 
> running, and then we use `kubectl debug --target=...` to start a Kubernetes 
> debug container with jcmd that targets the first container.  Given the 
> `--target` option, they share the same Linux process namespace (both think 
> the target JVM is PID 1).  But since they are separate containers, they see 
> different root filesystems (jcmd container sees the target JVM tmpdir under 
> /proc/1/root/tmp but has its own distinct /tmp directory).  
> 
> I believe the attach file and socket file paths then work like this in 
> OpenJDK 17:
> * jcmd creates the .attach_pid1 attach file without issues using /proc/1/cwd
> * Target JVM finds the .attach_pid1 attach file in its cwd.
> * Target JVM creates the .java_pid1 socket file in its tmpdir /tmp
> * jcmd finds the .java_pid1 socket file in /proc/1/root/tmp
> 
> I think this scenario with a Kubernetes debug container may be a little 
> different from other Docker container scenarios because these are two 
> different containers with _different root filesystems_ but _the same Linux 
> process namespace_.  So jcmd using `/proc//root` is necessary to find 
> the socket file, even though jcmd and the target JVM both agree the PID is 
> the same (1).  A similar scenario with just Docker Engine is described at 
> [docker container run - Example, join another container's PID 
> namespace](https://docs.docker.com/reference/cli/docker/container/run/#example-join-another-containers-pid-namespace).
> 
> If I understand the code change for this PR, I think it will change the 
> behavior in this scenario, because `findSocketFile` will have `pid == 
> ns_pid`, and now will use /tmp instead of `/proc//root/tmp`, based on 
> `findTargetProcessTmpDirectory`.
> 
> We are lucky currently that the only place the current OpenJDK 17 code checks 
> `pid == ns_pid` is the `createAttachFile` catch block that runs if 
> `/proc//cwd/.attach...

Logged https://bugs.openjdk.org/browse/JDK-8327114 for investigation.  Thanks 
@jdoylei  !

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1972833655


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-02-29 Thread Kevin Walls
On Wed, 28 Feb 2024 20:02:39 GMT, jdoylei  wrote:

> If I understand the code change for this PR, I think it will change the 
> behavior in this scenario, because `findSocketFile` will have `pid == 
> ns_pid`, and now will use /tmp instead of `/proc//root/tmp`, based on 
> `findTargetProcessTmpDirectory`.
> 
> We are lucky currently that the only place the current OpenJDK 17 code checks 
> `pid == ns_pid` is the `createAttachFile` catch block that runs if 
> `/proc//cwd/.attach_pid` can't be created, since as long as 
> `/proc//cwd` works, we are fine. But the `pid != ns_pid` check there 
> makes an assumption that if the process namespace is the same, the root 
> filesystem is the same and /tmp can be used, so this catch block wouldn't 
> work if we were to hit it in our scenario. I think propagating this catch 
> block logic into `findSocketFile` will break our scenario - it will force 
> using `/tmp/.java_pid` and that won't work.
> 
> Could the `findSocketFile` logic be made more robust to the different 
> namespace/filesystem scenarios? E.g. attempt `/proc//root` first? Or 
> perhaps there is a way (not `pid != ns_pid`) to more accurately determine 
> whether / and `/proc//root` are the same filesystem and /tmp is OK?

That is certainly worth capturing in a new JBS bug for investigating a further 
change.  If you can't log one, I'll use the information here to do that, thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1971364928


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-02-28 Thread jdoylei
On Fri, 9 Feb 2024 18:40:04 GMT, Sebastian Lövdahl  wrote:

>>> I'll still fix this. So, I should change the PR title to match JDK-8226919, 
>>> and issue an `/issue remove` command for JDK-8307977, is that correct?
>> 
>> Yes exactly, thanks.
>
> Thank you @kevinjwalls and @jerboaa for reviewing and guiding me through this 
> process, this was a great as a first-time JDK contributor!
> 
> One more question, can I do anything to help getting this backported to e.g. 
> 21 and 17?

@slovdahl - Apologies for adding a comment to a closed Pull Request, but I 
happened on https://bugs.openjdk.org/browse/JDK-8307977 via the earlier 
https://bugs.openjdk.org/browse/JDK-8179498 after researching 
"AttachNotSupportedException: Unable to open socket file" and troubleshooting 
our own OpenJDK 17 jcmd setup on top of containers and Kubernetes.  Reading the 
code changes and discussion here, I'm concerned that this change, which I 
understand is not yet in OpenJDK 17, might cause a regression with our setup.

We're running jcmd (OpenJDK build 17.0.10+7-LTS) and the target JVM in two 
separate containers in a Kubernetes pod.  The target JVM container is already 
running, and then we use `kubectl debug --target=...` to start a Kubernetes 
debug container with jcmd that targets the first container.  Given the 
`--target` option, they share the same Linux process namespace (both think the 
target JVM is PID 1).  But since they are separate containers, they see 
different root filesystems (jcmd container sees the target JVM tmpdir under 
/proc/1/root/tmp but has its own distinct /tmp directory).  

I believe the attach file and socket file paths then work like this in OpenJDK 
17:
* jcmd creates the .attach_pid1 attach file without issues using /proc/1/cwd
* Target JVM finds the .attach_pid1 attach file in its cwd.
* Target JVM creates the .java_pid1 socket file in its tmpdir /tmp
* jcmd finds the .java_pid1 socket file in /proc/1/root/tmp

I think this scenario with a Kubernetes debug container may be a little 
different from other Docker container scenarios because these are two different 
containers with _different root filesystems_ but _the same Linux process 
namespace_.  So jcmd using `/proc//root` is necessary to find the socket 
file, even though jcmd and the target JVM both agree the PID is the same (1).  
A similar scenario with just Docker Engine is described at [docker container 
run - Example, join another container's PID 
namespace](https://docs.docker.com/reference/cli/docker/container/run/#example-join-another-containers-pid-namespace).

If I understand the code change for this PR, I think it will change the 
behavior in this scenario, because `findSocketFile` will have `pid == ns_pid`, 
and now will use /tmp instead of `/proc//root/tmp`, based on 
`findTargetProcessTmpDirectory`.

We are lucky currently that the only place the current OpenJDK 17 code checks 
`pid == ns_pid` is the `createAttachFile` catch block that runs if 
`/proc//cwd/.attach_pid` can't be created, since as long as 
`/proc//cwd` works, we are fine.  But the `pid != ns_pid` check there 
makes an assumption that if the process namespace is the same, the root 
filesystem is the same and /tmp can be used, so this catch block wouldn't work 
if we were to hit it in our scenario.  I think propagating this catch block 
logic into `findSocketFile` will break our scenario - it will force using 
`/tmp/.java_pid` and that won't work.

Could the `findSocketFile` logic be made more robust to the different 
namespace/filesystem scenarios?  E.g. attempt `/proc//root` first?  Or 
perhaps there is a way (not `pid != ns_pid`) to more accurately determine 
whether / and `/proc//root` are the same filesystem and /tmp is OK?

Thanks for your time!

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1969769654


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-02-12 Thread Severin Gehwolf
On Fri, 9 Feb 2024 18:40:04 GMT, Sebastian Lövdahl  wrote:

> One more question, can I do anything to help getting this backported to e.g. 
> 21 and 17?

First, I suggest to wait a few weeks in order to see if there are any follow-up 
bugs which show up in testing in mainline. Then start backporting it to 22u, 
then 21u, then 17u (in that order). A few references:

https://openjdk.org/guide/#backporting
https://wiki.openjdk.org/display/JDKUpdates/JDK+21u

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1938303184


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-02-09 Thread Sebastian Lövdahl
On Fri, 9 Feb 2024 18:22:47 GMT, Kevin Walls  wrote:

>> Alright, sounds good to me. :) Thanks again for taking a look!
>> 
>>> One other thing - JDK-8226919 looks like the original bug for this, logged 
>>> a few years back, so if this fixes both, the record should show that it 
>>> fixes that one, and JDK-8307977 should be closed as a duplicate. I/somebody 
>>> can take care of that JBS admin. But if this PR could be associated with 
>>> only JDK-8226919 that would be simple.
>> 
>> I'll still fix this. So, I should change the PR title to match JDK-8226919, 
>> and issue an `/issue remove` command for JDK-8307977, is that correct?
>> 
>> Once that is done, I would kindly ask for someone sponsoring this change as 
>> well.
>
>> I'll still fix this. So, I should change the PR title to match JDK-8226919, 
>> and issue an `/issue remove` command for JDK-8307977, is that correct?
> 
> Yes exactly, thanks.

Thank you @kevinjwalls and @jerboaa for reviewing and guiding me through this 
process, this was a great as a first-time JDK contributor!

One more question, can I do anything to help getting this backported to e.g. 21 
and 17?

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1936426583