Ma77Ball opened a new pull request, #5647:
URL: https://github.com/apache/texera/pull/5647

   ### What changes were proposed in this PR?
   - `LakeFSStorageClient.healthCheck()` now retries the LakeFS health check 
(10 attempts, 3s apart) before failing, so a transient `Connection reset` 
during concurrent startup no longer crashes the file-service.
   - Each failed attempt logs a warning; if LakeFS is still unreachable after 
all attempts, startup fails with a clear aggregated error so a genuine outage 
is still surfaced.
   ### Any related issues, documentation, discussions?
   Closes: #5646
   ### How was this PR tested?
   - Reproduce the race: stop LakeFS, start file-service, then start LakeFS 
within ~30s; confirm file-service logs retry warnings and then starts 
successfully instead of exiting.
   - Regression with LakeFS already up: start file-service, confirm it binds 
`:9092` and `curl http://127.0.0.1:9092/api/healthcheck` returns HTTP 200.
   - Compile: run `sbt "FileService/compile"` and expect `[success]`.
   ### Was this PR authored or co-authored using generative AI tooling?
   Co-authored with Claude Opus 4.8 in compliance with ASF


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to