nsivabalan opened a new pull request, #19060:
URL: https://github.com/apache/hudi/pull/19060
**DO NOT MERGE — temporary triage branch.**
Recent Azure CI runs on #18147 and #18650 both fail in the same Spark 4.0
`java-tests-part2` job with:
```
Caused by: java.net.BindException: Address already in use
Caused by: org.apache.hudi.exception.HoodieLockException: Failed to connect to ZooKeeper within 10000 ms
	at org.apache.hudi.client.transaction.lock.BaseZookeeperBasedLockProvider.<init>(BaseZookeeperBasedLockProvider.java:86)
```
Neither PR touches lock providers, ZooKeeper, or the test harness, which
strongly suggests a runner-resource / port-bind flake in the Curator
`TestingServer` used inside
`TestHoodieClientMultiWriter.testHoodieClientBasicMultiWriterWithEarlyConflictDetectionDirect`.
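The suspected failure mode is a check-then-bind race: a helper picks a "free" port by briefly binding port 0 and closing the socket, and the server binds that port a moment later, leaving a window in which another process on the shared runner can take it. Our understanding is that Curator's `InstanceSpec` chooses the random port for `TestingServer` this way, but that is an assumption to confirm against the Curator version in use. The plain-JDK sketch below (the class and method names are hypothetical, not from this PR) forces that race deterministically to show that the losing side sees a `java.net.BindException`, the same root exception as in the CI log.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class PortRaceDemo {

    /** Probe step: ask the OS for a free ephemeral port, then release it immediately. */
    static int probeFreePort() {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Simulates the race: after the probe releases the port, a second listener
     * (standing in for another process on the runner) binds it before the "real"
     * server does. Returns true if the real server's bind fails with BindException.
     */
    static boolean laterBindFails() {
        int port = probeFreePort();
        try (ServerSocket intruder = new ServerSocket();
             ServerSocket server = new ServerSocket()) {
            intruder.bind(new InetSocketAddress(port)); // wins the race
            try {
                server.bind(new InetSocketAddress(port)); // port is already in use
                return false;
            } catch (BindException e) {
                System.out.println("port " + port + ": " + e.getMessage());
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("later bind failed: " + laterBindFails());
    }
}
```

On a real runner the "intruder" is not deliberate. It can be any parallel JVM or service that happens to receive the same ephemeral port, which is why the failure is intermittent.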
This branch is opened against `apache/hudi:master` purely so the Apache
Azure CI pipeline runs on it. Contents:
- `.github/workflows/bot.yml`: disabled (`workflow_dispatch` trigger only) so
the GitHub Actions matrix does not consume runners during triage.
- `azure-pipelines-20230430.yml`: stripped from 10 jobs to a single job that
builds `hudi-spark-datasource/hudi-spark -am -DskipTests` and then runs only
`mvn test -Dtest=TestHoodieClientMultiWriter`. Pre- and post-test diagnostic
steps print `ip_local_port_range`, `ss -tlnp`, and `ulimit -a` so that
port-bind contention is visible in the Azure log.
- `TestHoodieClientMultiWriter.java`: wraps the single `new TestingServer()`
call in `startTestingServerWithDiagnostics()`, which logs the bound port, bind
latency, JVM PID, and hostname, and retries up to 5× on a nested
`BindException`.
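A sketch of what a wrapper like `startTestingServerWithDiagnostics()` might look like, written against plain JDK types so it runs without Curator. The class `BindRetry`, its methods, and the log format are hypothetical and are not the code in this PR. It walks the exception's cause chain to detect a nested `BindException`, logs the bound port, start latency, JVM PID (`ProcessHandle.current().pid()`, Java 9+), and hostname, and retries only on bind failures. Reading "up to 5×" as five attempts in total is also an assumption.

```java
import java.net.BindException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.function.ToIntFunction;

class BindRetry {

    /** True if {@code t} or any throwable in its cause chain is a BindException. */
    static boolean hasNestedBindException(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof BindException) {
                return true;
            }
        }
        return false;
    }

    /** Best-effort hostname for the log line; never throws. */
    static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    /**
     * Calls {@code starter} until it succeeds, retrying only when the failure has a
     * nested BindException and fewer than {@code maxAttempts} attempts have been made.
     * Logs the bound port, start latency, JVM PID and hostname on success.
     */
    static <T> T startWithDiagnostics(Callable<T> starter, ToIntFunction<T> portOf,
                                      int maxAttempts) throws Exception {
        long pid = ProcessHandle.current().pid();
        String host = hostName();
        for (int attempt = 1; ; attempt++) {
            long t0 = System.nanoTime();
            try {
                T server = starter.call();
                long ms = (System.nanoTime() - t0) / 1_000_000;
                System.out.printf("attempt %d: bound port %d in %d ms (pid=%d, host=%s)%n",
                        attempt, portOf.applyAsInt(server), ms, pid, host);
                return server;
            } catch (Exception e) {
                if (!hasNestedBindException(e) || attempt >= maxAttempts) {
                    throw e; // not a bind failure, or out of attempts: surface it unchanged
                }
                System.out.printf("attempt %d: nested BindException (pid=%d, host=%s), retrying%n",
                        attempt, pid, host);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fake starter: fails twice with a wrapped BindException, then reports port 2181.
        Callable<Integer> fake = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("start failed",
                        new BindException("Address already in use"));
            }
            return 2181;
        };
        int port = startWithDiagnostics(fake, Integer::intValue, 5);
        System.out.println("started on port " + port + " after " + calls[0] + " attempts");
    }
}
```

With Curator on the test classpath, a call site could pass `TestingServer::new` as the starter and `TestingServer::getPort` as the port accessor, for example `startWithDiagnostics(TestingServer::new, TestingServer::getPort, 5)`. Retrying only helps because the no-argument `TestingServer` constructor is expected to pick a fresh random port on each attempt, which is again worth confirming for the Curator version in use. Note that the measured latency covers the whole constructor call, including ZooKeeper startup, not just the socket bind.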
Will close once we have the diagnostic data.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
--
This is an automated message from the Apache Git Service.