zclllyybb commented on issue #64494:
URL: https://github.com/apache/doris/issues/64494#issuecomment-4704076475

   Breakwater-GitHub-Analysis-Slot: slot_498e1c5a18e8
   
   Initial analysis: the reported failure is credible on current 
`upstream/master` and the stack matches a row-binlog schema construction 
mismatch in the BE unit-test path. I checked refreshed `upstream/master` at 
`1316d4bd633`.
   
   What the code shows:
   - `RowBinlogSegmentWriter::_fill_binlog_columns()` builds a block for the 
row-binlog prefix columns and unconditionally casts the timestamp column to 
`ColumnNullable` before inserting the NULL placeholder.
   - `TabletSchema::create_block_by_cids()` creates that column from 
`DataTypeFactory::create_data_type(TabletColumn)`, so the runtime column is 
nullable only if the tablet schema column has `is_nullable = true`.
   - The production FE row-binlog schema construction makes 
`__DORIS_BINLOG_OP__` and `__DORIS_BINLOG_TIMESTAMP__` nullable.
   - The failing UT does not go through that FE schema path. 
`GroupRowsetWriterTest` uses `testutil::enable_row_binlog()`, and the current 
test helper's `CreateTabletRequestColumnDef` has no nullable field; 
`create_tablet_column()` does not set `TColumn.is_allow_null`. Later 
`TabletMeta::init_column_from_tcolumn()` copies `tcolumn.is_allow_null` into 
`ColumnPB.is_nullable`, so the test-created timestamp column becomes a plain 
non-null BIGINT vector. That explains the fatal cast from 
`ColumnVector<TYPE_BIGINT>` to `ColumnNullable`.
   
   Conclusion:
   - This looks like a real master UT bug, but the strongest evidence points to 
the BE test schema helper not matching the intended row-binlog schema 
semantics, rather than proving that production FE-created row-binlog schemas 
have a non-null timestamp column.
   - The crash was probably exposed after the previous row-binlog special-index 
fix because the writer can now reach `_fill_binlog_columns()`.
   
   Recommended next steps:
   - Prefer fixing the test/schema construction path so the row-binlog 
timestamp column is nullable, for example by adding nullable support to 
`CreateTabletRequestColumnDef` / `create_tablet_column()` or explicitly setting 
`is_allow_null` for `__DORIS_BINLOG_OP__` and `__DORIS_BINLOG_TIMESTAMP__` in 
`testutil::enable_row_binlog()`.
   - Add regression assertions in `GroupRowsetWriterTest` or the row-binlog 
metadata test that the row-binlog schema has valid LSN/timestamp special 
indexes and that `__DORIS_BINLOG_TIMESTAMP__` is nullable, not only that the 
column name exists.
   - A defensive writer-side fallback using 
`check_and_get_column<ColumnNullable>()` would likely unblock this exact UT 
because the reader later replaces the TSO placeholder with commit TSO and 
already handles nullable/non-nullable output. However, I would not make that 
the only fix unless maintainers intentionally want to tolerate older or 
malformed row-binlog schemas; otherwise it can hide a schema invariant 
violation that the writer comments currently assume.
   
   Missing information to fully close this:
   - The exact PR/commit for the prior "binlog column index" fix mentioned in 
the issue.
   - Whether this is reproducible only through 
`GroupRowsetWriterTest.sub_writer_rollback`, or also with a real FE-created 
row-binlog table. If it reproduces outside BE UT, please provide the exact 
commit SHA, table DDL/binlog properties, tablet meta/schema dump for the 
row-binlog tablet, and the full BE fatal log/core stack.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to