Re: RFR: 8327114: Attach in Linux may have wrong behaviour when pid == ns_pid (Kubernetes debug container)

2024-05-03 Thread jdoylei
On Thu, 2 May 2024 10:13:51 GMT, Sebastian Lövdahl  wrote:

> 8327114: Attach in Linux may have wrong behaviour when pid == ns_pid 
> (Kubernetes debug container)

> I think it boils down to the same reason as why the fix for JDK-8226919 was 
> needed in the first place - a non-root user cannot read the symlinks in 
> `/proc//ns` for a process running with more privileges even though it's 
> run by the same non-root user.

@slovdahl - In that test case (target JVM process has more privileges), where 
is the attach file created?  Does jcmd end up writing it to `/tmp`?  Or does 
`/proc//cwd` work?  Just curious whether the elevated-privileges scenario 
affects the attach file and socket file locations equally.

-

PR Comment: https://git.openjdk.org/jdk/pull/19055#issuecomment-2093481534


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-03-01 Thread jdoylei
On Fri, 1 Mar 2024 09:31:47 GMT, Kevin Walls  wrote:

>> @slovdahl - Apologies for adding a comment to a closed Pull Request, but I 
>> happened on https://bugs.openjdk.org/browse/JDK-8307977 via the earlier 
>> https://bugs.openjdk.org/browse/JDK-8179498 after researching 
>> "AttachNotSupportedException: Unable to open socket file" and 
>> troubleshooting our own OpenJDK 17 jcmd setup on top of containers and 
>> Kubernetes.  Reading the code changes and discussion here, I'm concerned 
>> that this change, which I understand is not yet in OpenJDK 17, might cause a 
>> regression with our setup.
>> 
>> We're running jcmd (OpenJDK build 17.0.10+7-LTS) and the target JVM in two 
>> separate containers in a Kubernetes pod.  The target JVM container is 
>> already running, and then we use `kubectl debug --target=...` to start a 
>> Kubernetes debug container with jcmd that targets the first container.  
>> Given the `--target` option, they share the same Linux process namespace 
>> (both think the target JVM is PID 1).  But since they are separate 
>> containers, they see different root filesystems (jcmd container sees the 
>> target JVM tmpdir under /proc/1/root/tmp but has its own distinct /tmp 
>> directory).  
>> 
>> I believe the attach file and socket file paths then work like this in 
>> OpenJDK 17:
>> * jcmd creates the .attach_pid1 attach file without issues using /proc/1/cwd
>> * Target JVM finds the .attach_pid1 attach file in its cwd.
>> * Target JVM creates the .java_pid1 socket file in its tmpdir /tmp
>> * jcmd finds the .java_pid1 socket file in /proc/1/root/tmp
>> 
>> I think this scenario with a Kubernetes debug container may be a little 
>> different from other Docker container scenarios because these are two 
>> different containers with _different root filesystems_ but _the same Linux 
>> process namespace_.  So jcmd using `/proc//root` is necessary to find 
>> the socket file, even though jcmd and the target JVM both agree the PID is 
>> the same (1).  A similar scenario with just Docker Engine is described at 
>> [docker container run - Example, join another container's PID 
>> namespace](https://docs.docker.com/reference/cli/docker/container/run/#example-join-another-containers-pid-namespace).
>> 
>> If I understand the code change for this PR, I think it will change the 
>> behavior in this scenario, because `findSocketFile` will have `pid == 
>> ns_pid`, and now will use /tmp instead of `/proc//root/tmp`, based on 
>> `findTargetProcessTmpDirectory`.
>> 
>> We are lucky currently that the only place the current OpenJDK 17 code 
>> checks `pid == ns_pid` is the `createAttachFile` catch block that ru...
>
> Logged https://bugs.openjdk.org/browse/JDK-8327114 for investigation.  Thanks 
> @jdoylei  !

@kevinjwalls - Perfect, thank you for opening the JBS bug!

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1973380664


Re: RFR: 8226919: attach in linux hangs due to permission denied accessing /proc/pid/root

2024-02-28 Thread jdoylei
On Fri, 9 Feb 2024 18:40:04 GMT, Sebastian Lövdahl  wrote:

>>> I'll still fix this. So, I should change the PR title to match JDK-8226919, 
>>> and issue an `/issue remove` command for JDK-8307977, is that correct?
>> 
>> Yes exactly, thanks.
>
> Thank you @kevinjwalls and @jerboaa for reviewing and guiding me through this 
> process, this was a great as a first-time JDK contributor!
> 
> One more question, can I do anything to help getting this backported to e.g. 
> 21 and 17?

@slovdahl - Apologies for adding a comment to a closed Pull Request, but I 
happened on https://bugs.openjdk.org/browse/JDK-8307977 via the earlier 
https://bugs.openjdk.org/browse/JDK-8179498 after researching 
"AttachNotSupportedException: Unable to open socket file" and troubleshooting 
our own OpenJDK 17 jcmd setup on top of containers and Kubernetes.  Reading the 
code changes and discussion here, I'm concerned that this change, which I 
understand is not yet in OpenJDK 17, might cause a regression with our setup.

We're running jcmd (OpenJDK build 17.0.10+7-LTS) and the target JVM in two 
separate containers in a Kubernetes pod.  The target JVM container is already 
running, and then we use `kubectl debug --target=...` to start a Kubernetes 
debug container with jcmd that targets the first container.  Given the 
`--target` option, they share the same Linux process namespace (both think the 
target JVM is PID 1).  But since they are separate containers, they see 
different root filesystems (jcmd container sees the target JVM tmpdir under 
/proc/1/root/tmp but has its own distinct /tmp directory).  

I believe the attach file and socket file paths then work like this in OpenJDK 
17:
* jcmd creates the .attach_pid1 attach file without issues using /proc/1/cwd
* Target JVM finds the .attach_pid1 attach file in its cwd.
* Target JVM creates the .java_pid1 socket file in its tmpdir /tmp
* jcmd finds the .java_pid1 socket file in /proc/1/root/tmp

I think this scenario with a Kubernetes debug container may be a little 
different from other Docker container scenarios because these are two different 
containers with _different root filesystems_ but _the same Linux process 
namespace_.  So jcmd using `/proc//root` is necessary to find the socket 
file, even though jcmd and the target JVM both agree the PID is the same (1).  
A similar scenario with just Docker Engine is described at [docker container 
run - Example, join another container's PID 
namespace](https://docs.docker.com/reference/cli/docker/container/run/#example-join-another-containers-pid-namespace).

If I understand the code change for this PR, I think it will change the 
behavior in this scenario, because `findSocketFile` will have `pid == ns_pid`, 
and now will use /tmp instead of `/proc//root/tmp`, based on 
`findTargetProcessTmpDirectory`.

We are lucky currently that the only place the current OpenJDK 17 code checks 
`pid == ns_pid` is the `createAttachFile` catch block that runs if 
`/proc//cwd/.attach_pid` can't be created, since as long as 
`/proc//cwd` works, we are fine.  But the `pid != ns_pid` check there 
makes an assumption that if the process namespace is the same, the root 
filesystem is the same and /tmp can be used, so this catch block wouldn't work 
if we were to hit it in our scenario.  I think propagating this catch block 
logic into `findSocketFile` will break our scenario - it will force using 
`/tmp/.java_pid` and that won't work.

Could the `findSocketFile` logic be made more robust to the different 
namespace/filesystem scenarios?  E.g. attempt `/proc//root` first?  Or 
perhaps there is a way (not `pid != ns_pid`) to more accurately determine 
whether / and `/proc//root` are the same filesystem and /tmp is OK?

Thanks for your time!

-

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1969769654