janhoy opened a new pull request, #4471:
URL: https://github.com/apache/solr/pull/4471

   Implements 
[SIP-24](https://cwiki.apache.org/confluence/display/SOLR/SIP-24%3A+Java+Security+Manager+replacement)
 — a drop-in replacement for the JVM Security Manager (removed in JDK 24) using 
a Java agent built on ByteBuddy `@Advice` interceptors.
   
   **Default mode is `warn`-only.** No behaviour changes for existing 
deployments unless you explicitly opt in to `enforce` mode.
   
   ## What this adds
   
   A new Gradle subproject `solr/agent-sm/` that builds a self-contained fat 
JAR. The startup scripts auto-detect and load it via `-javaagent:` if present 
in `solr/server/lib/ext/`. Four categories of protection:
   
   | Category | Intercepted JDK entry points |
   |----------|------------------------------|
   | File access | `FileSystemProvider` subclasses, `java.nio.file.Files`, 
`FileChannel` subtypes — read/write/delete/copy/move/create |
   | Network access | `SocketChannel.connect()`, `Socket.connect()` |
   | JVM exit | `System.exit()`, `Runtime.halt()` |
   | Process exec | `ProcessBuilder.start()`, `Runtime.exec()` |
   
   Violations are logged as `SECURITY VIOLATION [TYPE] target=… caller=… 
mode=…` and counted in `/admin/metrics` under the node registry 
(`security_agent_violations_{file,network,exit,exec}_total` in Prometheus 
format).
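
As a rough illustration of the log-and-count behaviour described above, the sketch below keeps one `LongAdder` per violation category, prints a line in the quoted `SECURITY VIOLATION [TYPE] …` shape, and throws `SecurityException` only in enforce mode. The class name `ViolationSketch`, its enums, and the lowercase rendering of the mode are assumptions made for this example; they are not taken from the PR's code.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: names and exact log formatting are illustrative only. */
final class ViolationSketch {
  enum Type { FILE, NETWORK, EXIT, EXEC }

  enum Mode { WARN, ENFORCE }

  // One counter per category; the map is populated once and only read afterwards.
  private static final Map<Type, LongAdder> COUNTERS = new EnumMap<>(Type.class);

  static {
    for (Type t : Type.values()) {
      COUNTERS.put(t, new LongAdder());
    }
  }

  /** Builds a log line in the shape quoted in the PR description. */
  static String format(Type type, String target, String caller, Mode mode) {
    return "SECURITY VIOLATION [" + type + "] target=" + target
        + " caller=" + caller
        + " mode=" + mode.name().toLowerCase(Locale.ROOT);
  }

  /** Counts and logs the violation; blocks the call only in enforce mode. */
  static void report(Type type, String target, String caller, Mode mode) {
    COUNTERS.get(type).increment();
    String line = format(type, target, caller, mode);
    System.err.println(line);
    if (mode == Mode.ENFORCE) {
      throw new SecurityException(line);
    }
  }

  static long count(Type type) {
    return COUNTERS.get(type).sum();
  }
}
```

`LongAdder` is a reasonable fit for this kind of counter because increments from many threads stay cheap and the sum is only read occasionally, for example when metrics are scraped.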
   
   Policy is read from `server/etc/agent-security.policy` (JDK-style `.policy` 
syntax). Variable substitution expands properties (e.g. `${solr.solr.home}`, 
`${solr.port.listen}`), with an env-var fallback so that `SOLR_HOME` etc. also 
work. `EnvUtils` could not be used for this because it lives in solrj, which 
the agent does not depend on. Operators extend the policy via 
`server/etc/agent-security-extra.policy` or 
`SOLR_SECURITY_AGENT_EXTRA_POLICY`. Five modules with legitimate outbound 
network needs (`jwt-auth`, `opentelemetry`, `s3-repository`, `gcs-repository`, 
`cross-dc-manager`) are pre-permitted in the default policy with a wildcard 
socket grant.
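
The substitution step could look roughly like the sketch below: each `${name}` placeholder is resolved from a property lookup first and from an environment lookup second. The class `PolicyExpansionSketch` is hypothetical, and so is the property-to-env-var mapping (uppercase, dots to underscores); the real mapping may differ, for instance for names like `SOLR_HOME`. Leaving unresolved placeholders untouched is likewise a choice made for this example.

```java
import java.util.Locale;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ${...} expansion with an env-var fallback. */
final class PolicyExpansionSketch {
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  /** Assumed mapping: solr.port.listen -> SOLR_PORT_LISTEN. */
  static String envName(String property) {
    return property.toUpperCase(Locale.ROOT).replace('.', '_');
  }

  /** Expands placeholders using the given lookups (injectable for testing). */
  static String expand(
      String text, Function<String, String> props, Function<String, String> env) {
    Matcher m = VAR.matcher(text);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String name = m.group(1);
      String value = props.apply(name);
      if (value == null) {
        value = env.apply(envName(name)); // fall back to the environment
      }
      if (value == null) {
        value = m.group(); // unresolved: keep the placeholder as written
      }
      m.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    m.appendTail(out);
    return out.toString();
  }

  /** Convenience overload backed by real system properties and env vars. */
  static String expand(String text) {
    return expand(text, System::getProperty, System::getenv);
  }
}
```

Passing the lookups as functions keeps the sketch free of any dependency beyond the JDK, which matches the constraint that the agent cannot pull in solrj.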
   
   **NOTE:** The interceptors and policy parser/expander are forked and adapted 
from the [OpenSearch agent-sm 
project](https://github.com/opensearch-project/OpenSearch) (Apache 2.0). Added 
attribution in `NOTICE.txt` as well as mentioning them in `package-info.java` 
which will show up in javadocs.
   
   ## Code review guide
   
   ### Start here — the core loop
   
   1. **`SolrAgentEntryPoint.premain()`** — loads the policy and installs all 
interceptors. Each interceptor class is injected into the bootstrap classloader 
so it can redefine `java.base` methods.
   
   2. **`FileInterceptor`** — The `@Advice.OnMethodEnter` method is inlined 
into JDK bytecode, not called normally. The trusted-filesystem exemption 
short-circuits for non-`file:` providers (Lucene's mock in-memory FS); Unix 
domain socket writes return early.
   
   3. **`StackCallerClassChainExtractor`** — call-chain extraction via 
`StackWalker`. Used by `SystemExitInterceptor` and `RuntimeHaltInterceptor` to 
check whether the full call chain contains an approved exit caller.
   
   4. **`PolicyLoader.load()`** — two-file merge (default + optional operator 
extension), variable substitution via `PolicyPropertyExpander`, 
`codeBase`-scoped grants. Uses `PolicyFileParser`, borrowed from OpenSearch.
   
   5. **`PolicyPropertyExpander.getPropertyOrEnv()`** — simple substitute for 
`EnvUtils`, which the agent cannot use because it lives in solrj, a module the 
agent does not depend on.
   
   6. **`ViolationMetricsReporter.registerWithSolrMetrics()`** 
(`ViolationMetricsReporter.java`) — deferred registration pattern. Counters 
start accumulating from `premain` via `LongAdder`s; `CoreContainer` calls this 
method reflectively (no compile-time dep from `solr:core` on `solr:agent-sm`). 
Registration uses `SolrMetricManager.observableLongCounter()` via reflection, 
because the agent module cannot import OTel types at compile time.
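
To make the call-chain check from item 3 more concrete, here is a minimal sketch that uses the JDK's `StackWalker` to collect the class names on the current stack and tests whether any of them is in an approved set. `CallChainSketch` and its method names are invented for this example, and how the real extractor filters or orders frames is not shown in the PR description.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of a StackWalker-based caller check. */
final class CallChainSketch {

  /** Returns the distinct class names on the current call stack, innermost first. */
  static List<String> callerClassChain() {
    return StackWalker.getInstance()
        .walk(frames ->
            frames
                .map(StackWalker.StackFrame::getClassName)
                .distinct()
                .collect(Collectors.toList()));
  }

  /** True if any class anywhere in the chain is in the approved set. */
  static boolean chainContainsApproved(Set<String> approvedClassNames) {
    return callerClassChain().stream().anyMatch(approvedClassNames::contains);
  }
}
```

Checking the whole chain rather than only the immediate caller means an approved entry point can still exit the JVM when it goes through intermediate helpers.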
   
   ### Wiring into Solr
   
   - **`bin/solr`** and **`bin/solr.cmd`** — Parses `SOLR_SECURITY_AGENT`. The 
scripts detect `solr-agent-sm-*.jar` in `lib/ext/` and prepend `-javaagent:` to 
`SOLR_OPTS`. The `SOLR_SECURITY_AGENT_SKIP=true` escape hatch is here.
   - **`solr/server/build.gradle`** — `libExt` dependency on `:solr:agent-sm` 
(transitive=false) places the agent JAR in `server/lib/ext/` in the packaged 
distribution.
   - **`CoreContainer.java`** — search for `ViolationMetricsReporter`. One 
reflective call to register OTel counters after `SolrMetricManager` is up.
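
The reflective hook-up can be pictured with the sketch below, which looks up a class by name, invokes a single-argument static method on it if present, and silently does nothing when the class is absent (as would be the case when the agent JAR is not on the classpath). `ReflectiveHookSketch` and `invokeIfPresent` are hypothetical names, and the assumption that the target method is static is made for this example only.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch of an optional, reflection-only dependency. */
final class ReflectiveHookSketch {

  /**
   * Invokes a public static one-argument method by name.
   *
   * @return true if the method was found and invoked, false if the class or
   *     method does not exist
   */
  static boolean invokeIfPresent(
      String className, String methodName, Class<?> paramType, Object arg) {
    try {
      Class<?> clazz = Class.forName(className);
      Method method = clazz.getMethod(methodName, paramType);
      method.invoke(null, arg); // null receiver: assumes a static method
      return true;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false; // optional component not present: nothing to register
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Optional hook failed: " + className, e);
    }
  }
}
```

Treating "class not found" as a normal outcome is what lets the calling module compile and run without any compile-time dependency on the optional one.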
   
   ### Tests
   
   **Unit tests** (`solr/agent-sm/src/test/`) run with the JVM Security 
Manager active (`-Ptests.useSecurityManager=true`), which causes a few agent 
tests to be skipped.
   
   Notable unit test classes:
   - `SolrAgentIntegrationTest` — end-to-end in enforce mode with counter 
increment verification
   - `SymlinkEscapeTest` — symlink escape prevention (3 of 4 skip under 
SecurityManager)
   - `PolicyLoaderOperatorExtensionTest` — operator extension file merge, 
DEFAULT/OPERATOR source tagging
   - `SystemExitInterceptorTest` — exit call-chain approval logic
   - `SocketChannelInterceptorTest` — endpoint matching (host, port, codeBase 
wildcards)
   - `ProcessExecInterceptorTest` — exec approval logic
   
   **BATS integration tests** (`solr/packaging/test/test_security_agent.bats`) 
cover start script logic and realistic invocation, including a test that 
metrics appear. There are 7 BATS tests. Some test our start script behavior, 
and some test the agent together with a simple Java class file designed to 
violate some policy:
   - **Test 1** — Agent is active by default in `WARN` mode and registers 
`security_agent_violations_file` / `security_agent_violations_network` metrics 
via `SolrMetricManager`.
   - **Test 2** — `SOLR_SECURITY_AGENT_SKIP=true` suppresses `-javaagent:` 
injection entirely; no "Security agent active" log line.
   - **Test 3** — `SOLR_SECURITY_AGENT_MODE=enforce` and 
`SOLR_SECURITY_AGENT_EXTRA_POLICY` are forwarded to the agent as system 
properties.
   - **Test 4** — Enforce mode blocks unauthorized file read (`/etc/hosts`) 
with `SecurityException`
   - **Test 5** — Enforce mode blocks `System.exit()` with `SecurityException`
   - **Test 6** — Enforce mode blocks outbound `SocketChannel.connect()` to 
`192.0.2.1:443` with `SecurityException`
   - **Test 7** — Enforce mode blocks `ProcessBuilder` exec with 
`SecurityException`
   
   The JSM is explicitly disabled in the BATS setup 
(`SOLR_SECURITY_MANAGER_ENABLED=false`) to test the agent's ByteBuddy 
interceptors in isolation. All BATS tests run Solr in standalone mode 
(`--user-managed`).
   
   ## Testing IRL
   
   **Prerequisites**: JDK 21+, Solr built from source on this branch.
   
   ```bash
   # Unit tests (60 tests, 3 skipped under SecurityManager)
   ./gradlew :solr:agent-sm:test
   
   # Full distribution build + BATS integration tests (7 tests)
   ./gradlew :solr:packaging:integrationTests --tests test_security_agent.bats
   
   # Start Solr in warn mode (default)
   bin/solr start
   
   # Check violation counters via metrics API (Prometheus format)
   curl http://localhost:8983/solr/admin/metrics
   # Look for: security_agent_violations_file_total, 
security_agent_violations_network_total, etc.
   
   # Opt in to enforce mode:
   SOLR_SECURITY_AGENT_MODE=enforce bin/solr start
   ```
   
   To test the extra-policy override (e.g. Tika Server in enforce mode):
   ```bash
   # Add to server/etc/agent-security-extra.policy:
   grant {
     permission java.net.SocketPermission "tika-host:9998", "connect,resolve";
   };
   ```
   
   Things worth eyeballing manually:
   - `server/etc/agent-security.policy` — the default policy grants, did we 
miss any? Check that the intra-cluster wildcards (`*:${solr.port.listen}`, 
`*:${solr.zk.port}`) look right
   - Run `./gradlew :solr:agent-sm:test` with 
`-Ptests.useSecurityManager=false` to see the full 60 tests (vs 57 when SM is 
on).
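
When eyeballing wildcard grants such as `*:${solr.port.listen}`, it can help to recall what the JDK's own `java.net.SocketPermission` considers covered by a `*:port` grant. The sketch below uses only that JDK class; whether the agent's matcher delegates to `SocketPermission.implies` or implements its own comparison is not stated in this PR, so treat it as a reference point for the grant syntax rather than a description of the agent. `SocketGrantSketch` is a hypothetical name.

```java
import java.net.SocketPermission;

/** Hypothetical sketch: JDK semantics of a wildcard-host socket grant. */
final class SocketGrantSketch {

  /** True if a "connect,resolve" grant on grantEndpoint covers a connect to requestedEndpoint. */
  static boolean covers(String grantEndpoint, String requestedEndpoint) {
    SocketPermission grant = new SocketPermission(grantEndpoint, "connect,resolve");
    SocketPermission request = new SocketPermission(requestedEndpoint, "connect");
    return grant.implies(request);
  }
}
```

For example, `covers("*:8983", "192.0.2.1:8983")` asks whether a wildcard-host grant on port 8983 permits a connection to that address and port, while `covers("*:8983", "192.0.2.1:443")` asks the same for a different port.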
   
   ## What this does NOT do
   
   - No enforce mode in the broader Solr test suite yet (warn mode only; 
enforce-mode flip is tracked as a follow-up per SIP-24).
   - Does not replace `SolrPaths.assertPathAllowed()` call sites — that method 
is now `@Deprecated` but existing calls are left in place for this PR.
   
   https://issues.apache.org/jira/browse/SOLR-17767


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

