SinMayFly opened a new issue, #91:
URL: https://github.com/apache/doris-mcp-server/issues/91
## Bug Description
When running `doris-mcp-server` in Docker with HTTP transport and token
authentication enabled, the MCP server works initially but becomes unusable
after running for some time.
After several query timeouts, later tool calls start failing with:
```text
Failed to acquire connection from token pool:
Failed to get connection for authenticated token. This is a security measure
to prevent using default high-privilege credentials. Error:
```
The server process itself is still alive and `/health` may still respond,
but MCP tool calls cannot reliably acquire Doris connections anymore.
## Environment
- Deployment: Docker image
- Transport: Streamable HTTP
- Auth: token auth enabled
- Database: Apache Doris
- Token-bound database config: enabled
- Server version: v0.6.x based on the current Docker image/release
## Observed Logs
The sequence observed in production logs is:
```text
Query execution failed: Query timeout after 30 seconds
Session mcp_session: Failed to acquire connection from token pool:
Session mcp_session: Token pool error:
Query execution failed for session mcp_session: Failed to get connection for
authenticated token. This is a security measure to prevent using default
high-privilege credentials. Error:
```
This pattern repeats after the first few timeouts. There are also frequent
MCP ping requests:
```text
Processing request of type PingRequest
Database configuration validated successfully for token ... (source:
token-bound)
```
## Expected Behavior
A query timeout should not corrupt, occupy, or exhaust the token-specific
connection pool. Later requests should still be able to acquire healthy
connections.
## Suspected Cause
`DorisQueryExecutor._execute_query_internal()` wraps the whole DB execution
with `asyncio.wait_for()`.
When a timeout occurs, the coroutine is cancelled, but the underlying
MySQL/Doris query may still be running or the aiomysql connection may be left
in an unsafe state.
`DorisConnectionManager.execute_query()` then releases the same connection
back to the token pool in `finally`, which may return a cancelled, busy, or
broken connection to the pool. Over time this appears to exhaust or poison the
token pool.
Relevant code areas:
- `doris_mcp_server/utils/query_executor.py`
  - `asyncio.wait_for(self.connection_manager.execute_query(...), timeout=...)`
- `doris_mcp_server/utils/db.py`
  - `execute_query(...)`
  - `release_connection_for_token(...)`
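
The suspected mechanism can be illustrated without Doris. The following is a minimal sketch, not the server's code: `FakeConnection`, `FakePool`, and `reproduce` are hypothetical stand-ins, and the `busy` flag only models a query that is still running server-side. It shows that `asyncio.wait_for()` cancels the inner coroutine, that the `finally` block still runs and returns the connection to the pool, and that the resulting `asyncio.TimeoutError` has an empty string form.

```python
import asyncio
from typing import List, Tuple


class FakeConnection:
    """Hypothetical stand-in; `busy` models an unfinished server-side query."""

    def __init__(self) -> None:
        self.busy = False

    async def query(self, seconds: float) -> str:
        self.busy = True
        await asyncio.sleep(seconds)  # CancelledError is raised here on timeout
        self.busy = False  # never reached when the task is cancelled
        return "ok"


class FakePool:
    """Hypothetical pool holding a single connection."""

    def __init__(self) -> None:
        self.idle: List[FakeConnection] = [FakeConnection()]

    def acquire(self) -> FakeConnection:
        return self.idle.pop()

    def release(self, conn: FakeConnection) -> None:
        self.idle.append(conn)


async def execute_query(pool: FakePool, seconds: float) -> str:
    conn = pool.acquire()
    try:
        return await conn.query(seconds)
    finally:
        # Runs on cancellation too, so a still-busy connection is pooled again.
        pool.release(conn)


async def reproduce() -> Tuple[bool, str]:
    pool = FakePool()
    message = "no timeout"
    try:
        await asyncio.wait_for(execute_query(pool, 1.0), timeout=0.01)
    except asyncio.TimeoutError as exc:
        message = str(exc)  # empty, matching the blank "Error:" in the logs
    return pool.idle[0].busy, message
```

Running `asyncio.run(reproduce())` leaves the only pooled connection marked busy, which is the state the next caller would receive.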
## Suggested Fix
1. When query execution is cancelled or times out, close/discard the
underlying DB connection instead of releasing it back to the pool.
2. Improve acquire timeout logging by including the exception type, since
`str()` of an `asyncio.TimeoutError` is empty, which is why the logged
`Error:` field is blank.
3. Avoid validating token-bound database connectivity on every MCP ping
request. Add a short TTL cache for token DB config validation.
4. Consider reducing or sanitizing request header logs, especially
`Authorization`.
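
Items 1 and 2 could look roughly like the sketch below. It is written against hypothetical `FakePool` and `FakeConnection` classes rather than the server's `DorisConnectionManager`, and `discard`, `describe`, and `run_with_timeout` are names invented for illustration. With aiomysql the discard step would presumably call the connection's `close()`, but how the token pool should account for the lost slot is an assumption not covered here.

```python
import asyncio
import logging
from typing import List, Optional

logger = logging.getLogger("token_pool_sketch")


class FakeConnection:
    """Hypothetical stand-in for a pooled DB connection."""

    def __init__(self) -> None:
        self.closed = False

    async def query(self, seconds: float) -> str:
        await asyncio.sleep(seconds)
        return "ok"

    def close(self) -> None:
        self.closed = True


class FakePool:
    """Hypothetical pool with an explicit discard path."""

    def __init__(self, size: int) -> None:
        self.idle: List[FakeConnection] = [FakeConnection() for _ in range(size)]

    def acquire(self) -> FakeConnection:
        return self.idle.pop()

    def release(self, conn: FakeConnection) -> None:
        self.idle.append(conn)

    def discard(self, conn: FakeConnection) -> None:
        # Close the suspect connection and replace it so capacity is preserved.
        conn.close()
        self.idle.append(FakeConnection())


async def execute_query(pool: FakePool, seconds: float) -> str:
    conn = pool.acquire()
    try:
        result = await conn.query(seconds)
    except BaseException:
        # Includes asyncio.CancelledError, which wait_for() injects on timeout.
        pool.discard(conn)
        raise
    pool.release(conn)  # only connections that finished cleanly are reused
    return result


def describe(exc: BaseException) -> str:
    """Always include the exception type, since str(exc) may be empty."""
    text = str(exc)
    return f"{type(exc).__name__}: {text}" if text else type(exc).__name__


async def run_with_timeout(pool: FakePool, seconds: float, timeout: float) -> Optional[str]:
    try:
        return await asyncio.wait_for(execute_query(pool, seconds), timeout)
    except asyncio.TimeoutError as exc:
        logger.error("Query execution failed (%s)", describe(exc))
        return None
```

The design choice is to release only on the success path and to treat every other exit, including cancellation, as a reason to drop the connection. This costs a reconnect after each timeout but keeps connections in an unknown state out of the pool.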
## Additional Compatibility Issue
In the same environment, resource metadata queries may fail with:
```text
Unknown column 'table_comment' in 'table list'
```
The query against `information_schema.tables.table_comment` may not be
compatible with all Doris versions/configurations. A fallback or compatible
metadata query would help.
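
One possible shape for such a fallback is sketched below. The SQL strings, the `run_query` callable, and `list_tables` are assumptions made for illustration and are not taken from the server's code; the sketch only retries without `table_comment` when the error message reports an unknown column.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]

# Hypothetical queries; the server's real metadata SQL may differ.
WITH_COMMENT = (
    "SELECT table_name, table_comment FROM information_schema.tables "
    "WHERE table_schema = %s"
)
WITHOUT_COMMENT = (
    "SELECT table_name FROM information_schema.tables WHERE table_schema = %s"
)


def list_tables(
    run_query: Callable[[str, Tuple[str, ...]], List[Row]], schema: str
) -> List[Row]:
    """Try the richer query first and fall back if the column is unknown."""
    try:
        return run_query(WITH_COMMENT, (schema,))
    except Exception as exc:
        if "Unknown column" not in str(exc):
            raise  # unrelated failures are not masked by the fallback
        rows = run_query(WITHOUT_COMMENT, (schema,))
        # Keep the result shape stable for callers that expect the key.
        return [{**row, "table_comment": ""} for row in rows]
```

Matching on the error text is fragile; checking the server's error code would be more robust if the driver exposes it.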
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]