zclllyybb commented on issue #64483:
URL: https://github.com/apache/doris/issues/64483#issuecomment-4699555930
Breakwater-GitHub-Analysis-Slot: slot_c4e837598f90
Initial analysis: this looks like a real row-binlog metadata construction
bug on current upstream/master. I checked the refreshed `upstream/master` ref
at `4dda859061c`, and the proposed root cause is consistent with the code path.
Evidence:
- `testutil::enable_row_binlog()` builds the row-binlog schema by appending
`__DORIS_BINLOG_LSN__`, `__DORIS_BINLOG_OP__`, and `__DORIS_BINLOG_TIMESTAMP__`
after the normal columns. In the reported test case, the LSN column should
therefore be column id 2, and the timestamp column should be column id 4.
- `TabletMeta::init_schema_from_thrift()` converts `TTabletSchema` directly
into `TabletSchemaPB` by adding `ColumnPB`s. It propagates special indexes that
exist on `TTabletSchema`, such as `delete_sign_idx`, but there is no thrift
field for the row-binlog LSN/timestamp indexes, and this function does not
derive those indexes from the column names.
- `TabletMeta::init_from_pb()` then calls `TabletSchema::init_from_pb()` for
`row_binlog_schema`. That path assigns `_binlog_lsn_col_idx` and
`_binlog_timestamp_col_idx` from the protobuf fields, whose defaults are `-1`.
- `RowBinlogSegmentWriter::init()` hard-checks
`_tablet_schema->binlog_lsn_col_idx() >= 0`, so it aborts even though the
`__DORIS_BINLOG_LSN__` column exists in the schema's column list.
Recommended fix:
- In `TabletMeta::init_schema_from_thrift()`, while iterating
`tablet_schema.columns`, set `tablet_schema_pb->set_binlog_lsn_col_idx(...)`
and `tablet_schema_pb->set_binlog_timestamp_col_idx(...)` when
`tcolumn.column_name` matches `BINLOG_LSN_COL` and `BINLOG_TIMESTAMP_COL`.
Alternatively, route this conversion through the normal `TabletSchema`
append/to-pb path so all special-column indexes are derived consistently.
- Add a regression assertion that the created row-binlog tablet schema has
`binlog_lsn_col_idx() == 2` and `binlog_timestamp_col_idx() == 4`, not only
that `field_index("__DORIS_BINLOG_LSN__") >= 0`. `field_index` can pass while
the persisted special-column index remains `-1`.
- Keep `GroupRowsetWriterTest.sub_writer_rollback` as the end-to-end guard;
after the metadata fix it should no longer abort in
`RowBinlogSegmentWriter::init()`.
Missing information:
- For this BE unit-test failure, the issue already contains enough
information to identify the bug.
- If the same failure is observed outside BE UT, please also provide the
exact Doris commit SHA, the row-binlog table DDL/binlog properties, and the BE
fatal log/core stack from that environment. If any tablet meta was created by
the broken code before the fix, maintainers should also consider whether
`TabletSchema::init_from_pb()` needs a compatibility fallback that derives
missing row-binlog indexes from column names when the protobuf fields are still
`-1`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]