morningman opened a new pull request, #64769:
URL: https://github.com/apache/doris/pull/64769

   ## Proposed changes
   
   Fixes two intermittent `branch-4.0` P0 regression-test failures seen on the 
ASAN multi-FE pipeline (e.g. CI build 199345): 
`load_p0/http_stream/test_http_stream_properties` and the `nereids_p0/cache` 
sql-cache cases.
   
   ### 1. `[fix](stream-load)` http_stream schema-inference truncation for compressed files
   
   A compressed (gz/bz2) load through the `http_stream` TVF intermittently 
failed with:
   
   ```
   [INTERNAL_ERROR]Compressed file has been truncated, which is not allowed
   ```
   
   with the BE stack `fetch_table_schema` → `CsvReader::get_parsed_schema` → 
`CsvReader::_parse_col_nums` → `NewPlainTextLineReader::read_line`.
   
   **Root cause:** the table schema is sniffed from the first part of the 
request body buffered in `schema_buffer`. `HttpStreamAction::on_chunk_data` is 
a per-chunk libevent callback, and its end-of-callback block fired 
`process_put` (which triggers FE schema inference over `schema_buffer`) 
whenever `is_read_schema` was still set — i.e. at the end of *every* callback — 
treating "the current evbuffer is drained" as "the whole body has been read". 
When the body spans multiple chunk callbacks (common under load), schema 
inference runs on a partial buffer. For uncompressed data a partial buffer is 
harmless, but a partial compressed buffer is an incomplete compressed stream, 
so decompression hits EOF before the stream end and reports truncation.
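
The asymmetry between plain and compressed prefixes can be reproduced outside Doris. The following is a minimal Python sketch, not the BE code: `first_line_columns` and `gunzip_strict` are hypothetical helpers that stand in for column counting and for a decompressor that rejects a stream ending before its end marker.

```python
import gzip
import zlib


def first_line_columns(data: bytes, sep: bytes = b",") -> int:
    """Count columns in the first line; works on any prefix holding one full line."""
    first_line = data.split(b"\n", 1)[0]
    return len(first_line.split(sep))


def gunzip_strict(data: bytes) -> bytes:
    """Decompress a gzip stream, rejecting input that stops before the end marker."""
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    out = decomp.decompress(data)
    if not decomp.eof:
        # Analogous in spirit to "Compressed file has been truncated".
        raise EOFError("compressed stream ended before its end-of-stream marker")
    return out


# Illustrative three-column CSV body (made-up data).
body = b"".join(b"%d,name%d,%d\n" % (i, i, i * 2) for i in range(1000))
compressed = gzip.compress(body)

# What schema inference would see if it ran after only part of the body arrived.
partial_plain = body[: len(body) // 2]
partial_gz = compressed[: len(compressed) // 2]
```

Calling `first_line_columns(partial_plain)` still succeeds because the first line is complete, while `gunzip_strict(partial_gz)` raises: the prefix is not a complete gzip stream.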
   
   **Fix:** only trigger schema inference once the whole body has been received 
— gate the `on_chunk_data` trigger on `receive_bytes >= body_bytes` 
(Content-Length known), and trigger it at request completion in `_handle` for 
the chunked / unknown-length case. The `>= 1MB` path is unchanged.
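
The gating described above can be modeled in a few lines. This is a simplified Python simulation, not the actual C++ change: the field names echo the description, `StreamCtx` and `inferred_on` are hypothetical, an unknown Content-Length is assumed to be represented as `None`, and the `>= 1MB` path is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamCtx:
    body_bytes: Optional[int]  # Content-Length, or None for chunked (assumption)
    receive_bytes: int = 0
    is_read_schema: bool = True  # True until schema inference has run
    schema_buffer: bytearray = field(default_factory=bytearray)
    inferred_on: List[bytes] = field(default_factory=list)  # records each inference input


def process_put(ctx: StreamCtx) -> None:
    """Stand-in for the call that triggers FE schema inference over schema_buffer."""
    ctx.inferred_on.append(bytes(ctx.schema_buffer))
    ctx.is_read_schema = False


def on_chunk_data(ctx: StreamCtx, chunk: bytes) -> None:
    """Per-chunk callback: buffer the data, infer only once the whole body is in."""
    ctx.receive_bytes += len(chunk)
    if not ctx.is_read_schema:
        return
    ctx.schema_buffer += chunk
    # Gate: a drained chunk is not the same as a fully received body.
    if ctx.body_bytes is not None and ctx.receive_bytes >= ctx.body_bytes:
        process_put(ctx)


def handle(ctx: StreamCtx) -> None:
    """Request completion: covers the chunked / unknown-length case."""
    if ctx.is_read_schema:
        process_put(ctx)
```

In this model, a body delivered in several chunks triggers `process_put` exactly once, either on the chunk that completes the declared length or in `handle` when no length was declared.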
   
   ### 2. `[test](sql-cache)` skip sql cache cases when the connected FE is not master
   
   `mv_with_sql_cache`, `mtmv_with_sql_cache`, and `parse_sql_from_sql_cache` 
assert that the sql cache is invalidated right after `RENAME ROLLUP` / `MODIFY 
COLUMN` / `ADD PARTITION`. That invalidation only happens locally on the FE 
that executes the DDL; on a follower FE the sql cache is not invalidated on 
metadata replay (the fix #63612 was reverted by #63872), so the `assertNoCache` 
checks are flaky when the suite happens to connect to a follower FE. The cases 
are now skipped unless the connected FE is the master FE.
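
One way such a guard could be expressed is sketched below in Python. It is illustrative only: the regression suites themselves are not shown here and the PR's actual check may differ. The sketch assumes the rows come from `SHOW FRONTENDS` and carry `IsMaster` and `CurrentConnected` columns; `connected_fe_is_master` is a hypothetical helper.

```python
from typing import Iterable, Mapping


def _truthy(value: object) -> bool:
    """Accept the common textual spellings of a boolean column."""
    return str(value).strip().lower() in ("true", "yes", "1")


def connected_fe_is_master(frontends: Iterable[Mapping[str, object]]) -> bool:
    """Return True only if the FE serving this connection is also the master."""
    for row in frontends:
        if _truthy(row.get("CurrentConnected", "")):
            return _truthy(row.get("IsMaster", ""))
    # No row marked as the current connection: be conservative and skip.
    return False
```

A suite would run the cache-invalidation assertions only when this returns `True`, and return early otherwise.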
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   
   https://claude.ai/code/session_01LJWGGEQq3sx3m1tssKVBeR
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

