zclllyybb commented on issue #64494: URL: https://github.com/apache/doris/issues/64494#issuecomment-4704076475
Breakwater-GitHub-Analysis-Slot: slot_498e1c5a18e8 Initial analysis: the reported failure is credible on current `upstream/master` and the stack matches a row-binlog schema construction mismatch in the BE unit-test path. I checked refreshed `upstream/master` at `1316d4bd633`. What the code shows: - `RowBinlogSegmentWriter::_fill_binlog_columns()` builds a block for the row-binlog prefix columns and unconditionally casts the timestamp column to `ColumnNullable` before inserting the NULL placeholder. - `TabletSchema::create_block_by_cids()` creates that column from `DataTypeFactory::create_data_type(TabletColumn)`, so the runtime column is nullable only if the tablet schema column has `is_nullable = true`. - The production FE row-binlog schema construction makes `__DORIS_BINLOG_OP__` and `__DORIS_BINLOG_TIMESTAMP__` nullable. - The failing UT does not go through that FE schema path. `GroupRowsetWriterTest` uses `testutil::enable_row_binlog()`, and the current test helper's `CreateTabletRequestColumnDef` has no nullable field; `create_tablet_column()` does not set `TColumn.is_allow_null`. Later `TabletMeta::init_column_from_tcolumn()` copies `tcolumn.is_allow_null` into `ColumnPB.is_nullable`, so the test-created timestamp column becomes a plain non-null BIGINT vector. That explains the fatal cast from `ColumnVector<TYPE_BIGINT>` to `ColumnNullable`. Conclusion: - This looks like a real master UT bug, but the strongest evidence points to the BE test schema helper not matching the intended row-binlog schema semantics, rather than proving that production FE-created row-binlog schemas have a non-null timestamp column. - The crash was probably exposed after the previous row-binlog special-index fix because the writer can now reach `_fill_binlog_columns()`. Recommended next steps: - Prefer fixing the test/schema construction path so the row-binlog timestamp column is nullable, for example by adding nullable support to `CreateTabletRequestColumnDef` / `create_tablet_column()` or explicitly setting `is_allow_null` for `__DORIS_BINLOG_OP__` and `__DORIS_BINLOG_TIMESTAMP__` in `testutil::enable_row_binlog()`. - Add regression assertions in `GroupRowsetWriterTest` or the row-binlog metadata test that the row-binlog schema has valid LSN/timestamp special indexes and that `__DORIS_BINLOG_TIMESTAMP__` is nullable, not only that the column name exists. - A defensive writer-side fallback using `check_and_get_column<ColumnNullable>()` would likely unblock this exact UT because the reader later replaces the TSO placeholder with commit TSO and already handles nullable/non-nullable output. However, I would not make that the only fix unless maintainers intentionally want to tolerate older or malformed row-binlog schemas; otherwise it can hide a schema invariant violation that the writer comments currently assume. Missing information to fully close this: - The exact PR/commit for the prior "binlog column index" fix mentioned in the issue. - Whether this is reproducible only through `GroupRowsetWriterTest.sub_writer_rollback`, or also with a real FE-created row-binlog table. If it reproduces outside BE UT, please provide the exact commit SHA, table DDL/binlog properties, tablet meta/schema dump for the row-binlog tablet, and the full BE fatal log/core stack. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
