eldenmoon commented on code in PR #63182:
URL: https://github.com/apache/doris/pull/63182#discussion_r3228967980


##########
be/src/format/json/new_json_reader.cpp:
##########
@@ -1138,8 +1158,23 @@ Status NewJsonReader::_simdjson_write_data_to_column(simdjson::ondemand::value&
         }
     }
 
-    auto primitive_type = type_desc->get_primitive_type();
     if (_is_load || !is_complex_type(primitive_type)) {
+        if (is_flexible_variant_patch_column && primitive_type == TYPE_VARIANT) {
+            ParseConfig parse_config;

Review Comment:
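   The logic kept in the JSON reader here only handles the input boundary for flexible VARIANT patches. It accepts only a JSON object, rejects null/scalar/array, and uses record_empty_object_path to preserve `{}` paths. The path markers, merge, and publish replay that actually depend on the old value remain in the storage layer. Moving this entirely into storage would require the reader/serde to also pass the raw JSON object through, which is more invasive, so the current v1 chooses this smaller boundary.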
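   The sketch below makes the reader-side boundary concrete. It uses a hypothetical `JsonValue` type in place of simdjson's on-demand value. The names `check_variant_patch` and `collect_empty_object_paths` are illustrative only; the latter stands in for the role `record_empty_object_path` is described as playing. This is not the reader's actual code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed JSON value; only the kind and the object
// members matter for the boundary check.
struct JsonMember;

struct JsonValue {
    enum class Kind { Null, Scalar, Array, Object };
    Kind kind = Kind::Null;
    std::vector<JsonMember> members;  // populated only when kind == Object
};

struct JsonMember {
    std::string key;
    JsonValue value;
};

// Walks an object and records the dotted path of every empty nested object, so a
// patch such as {"b": {"c": {}}} still carries the path "b.c" after flattening.
// (Hypothetical; mirrors the role of record_empty_object_path.)
void collect_empty_object_paths(const JsonValue& obj, const std::string& prefix,
                                std::vector<std::string>& out) {
    for (const JsonMember& m : obj.members) {
        const std::string path = prefix.empty() ? m.key : prefix + "." + m.key;
        if (m.value.kind != JsonValue::Kind::Object) continue;  // scalars/arrays are leaves
        if (m.value.members.empty()) {
            out.push_back(path);
        } else {
            collect_empty_object_paths(m.value, path, out);
        }
    }
}

// Boundary check for a flexible VARIANT patch: only a JSON object is accepted.
// Returns std::nullopt for null, scalar, or array input; otherwise returns the
// empty-object paths that must be preserved.
std::optional<std::vector<std::string>> check_variant_patch(const JsonValue& v) {
    if (v.kind != JsonValue::Kind::Object) {
        return std::nullopt;
    }
    std::vector<std::string> empty_paths;
    collect_empty_object_paths(v, "", empty_paths);
    return empty_paths;
}
```

Consider a patch `{"a": 1, "b": {"c": {}}}`. It is accepted and reports `b.c` as an empty-object path. A bare `1`, `null`, or `[...]` is rejected before anything reaches storage.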



##########
be/src/storage/segment/vertical_segment_writer.cpp:
##########
@@ -736,13 +736,48 @@ Status VerticalSegmentWriter::_append_block_with_flexible_partial_content(RowsIn
         RETURN_IF_ERROR(_create_column_writer(cid, _tablet_schema->column(cid), _tablet_schema));
     }
 
+    std::vector<BitmapValue>* skip_bitmaps = &(

Review Comment:
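   The coupling is already consolidated in variant_util. The writer only collects the VARIANT cids and calls parse_and_materialize_variant_columns / mark_variant_patch_paths, while path parsing and the marker rules live inside those helpers. Extracting the whole writer flow further would make the partial-update write steps less obvious, so the orchestration is kept local here.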
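   Below is a rough sketch of that split. It assumes a row's patch can be modeled as a map from column id to declared dotted paths, and that markers can be modeled as `"cid:path"` strings. `variant_helpers::mark_patch_paths` and `mark_variant_rows` are hypothetical stand-ins for the `variant_util` helpers and the writer, not their real signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the schema and the parsed input.
struct ColumnDesc {
    std::string name;
    bool is_variant = false;
};
// For one row: VARIANT column id -> dotted paths that the row's patch declared.
using RowPatch = std::map<uint32_t, std::vector<std::string>>;
// For one row: path markers, modeled as "cid:path" strings (an assumption).
using RowMarkers = std::set<std::string>;

namespace variant_helpers {  // plays the role of variant_util
// The marker encoding lives only here; the writer never builds markers itself.
void mark_patch_paths(uint32_t cid, const std::vector<std::string>& paths,
                      RowMarkers& markers) {
    for (const std::string& path : paths) {
        markers.insert(std::to_string(cid) + ":" + path);
    }
}
}  // namespace variant_helpers

// Writer-side orchestration: collect the VARIANT cids once, then delegate.
std::vector<RowMarkers> mark_variant_rows(const std::vector<ColumnDesc>& schema,
                                          const std::vector<RowPatch>& rows) {
    std::vector<uint32_t> variant_cids;
    for (uint32_t cid = 0; cid < schema.size(); ++cid) {
        if (schema[cid].is_variant) {
            variant_cids.push_back(cid);
        }
    }
    std::vector<RowMarkers> markers(rows.size());
    for (std::size_t row = 0; row < rows.size(); ++row) {
        for (uint32_t cid : variant_cids) {
            auto it = rows[row].find(cid);
            if (it != rows[row].end()) {
                variant_helpers::mark_patch_paths(cid, it->second, markers[row]);
            }
        }
    }
    return markers;
}
```

The writer loop only knows which columns are VARIANT. Changing how markers are encoded touches the helper alone.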



##########
be/src/storage/segment/vertical_segment_writer.cpp:
##########
@@ -938,7 +969,22 @@ Status VerticalSegmentWriter::_generate_flexible_read_plan(
                     &skip_bitmap);
         };
         auto update_read_plan = [&](const RowLocation& loc) {
-            read_plan.prepare_to_read(loc, segment_pos, skip_bitmap);
+            BitmapValue read_skip_bitmap(skip_bitmap);

Review Comment:
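   The key semantics here: if a row declares a VARIANT patch, read_skip_bitmap temporarily marks that VARIANT top-level column as needing its old value read. When the full row is generated, the path markers in the skip bitmap can then be used to replay only the original patch paths. This prevents undeclared paths from being overwritten with stale values during a concurrent publish.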
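   Here is a minimal sketch of the read-plan idea. It assumes, hypothetically, that a set entry in the skip bitmap means "read this column's old value", and that declared paths are tracked per VARIANT cid. `RowSkipInfo` and `make_read_skip_bitmap` are illustrative names; the diff itself copies a `BitmapValue` rather than a `std::set`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical per-row state: cids whose old value must be read (the row did not
// supply them), plus the VARIANT paths the row declared, keyed by VARIANT cid.
struct RowSkipInfo {
    std::set<uint32_t> skip_bitmap;
    std::map<uint32_t, std::set<std::string>> variant_patch_paths;
};

// Builds the bitmap used only to plan the old-row read. It starts as a copy, so
// the row's own skip bitmap (and its path markers) stays intact for the later
// full-row generation step.
std::set<uint32_t> make_read_skip_bitmap(const RowSkipInfo& row) {
    std::set<uint32_t> read_skip_bitmap = row.skip_bitmap;
    for (const auto& [cid, paths] : row.variant_patch_paths) {
        if (!paths.empty()) {
            // The row patches only some paths, so the old VARIANT value is still needed.
            read_skip_bitmap.insert(cid);
        }
    }
    return read_skip_bitmap;
}
```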



##########
be/src/storage/tablet/base_tablet.cpp:
##########
@@ -1131,8 +1136,19 @@ static void fill_cell_for_flexible_partial_update(
             new_col->insert_from(old_value_col, read_index_old[cast_set<uint32_t>(idx)]);
         }
     } else {
+        bool old_row_delete_sign =
+                (delete_sign_column_data != nullptr &&
+                 delete_sign_column_data[read_index_old[cast_set<uint32_t>(idx)]] != 0);
+        if (tablet_column.is_variant_type()) {

Review Comment:
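   When this VARIANT branch is entered, cur_col holds the full value flushed by the current transaction, while skip_bitmap holds the original patch path markers. After a publish conflict reads the latest old row, only the paths corresponding to those markers are replayed from cur_col onto the old row. If the old row carries the delete sign, the row is treated as a new insert and undeclared paths are not resurrected.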
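   The replay rule can be sketched as follows. The sketch assumes a VARIANT value is flattened into a map from dotted leaf path to an encoded leaf value, and that the markers are exactly the leaf paths the load declared. `replay_patch_paths` is a hypothetical helper, not the storage layer's implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Assumed flattened VARIANT: dotted leaf path -> encoded leaf value.
using VariantValue = std::map<std::string, std::string>;

// cur_full_value: full value this transaction flushed (built against an older row).
// patch_markers:  leaf paths the load actually declared.
// latest_old_row: row read at publish time, or nullptr if none exists.
VariantValue replay_patch_paths(const VariantValue& cur_full_value,
                                const std::set<std::string>& patch_markers,
                                const VariantValue* latest_old_row,
                                bool old_row_delete_sign) {
    // Start from the latest old row. A deleted (or missing) old row is treated as a
    // fresh insert, so its undeclared paths are not resurrected.
    VariantValue result;
    if (latest_old_row != nullptr && !old_row_delete_sign) {
        result = *latest_old_row;
    }
    // Replay only the declared paths; other paths in cur_full_value may be stale.
    for (const std::string& path : patch_markers) {
        auto it = cur_full_value.find(path);
        if (it != cur_full_value.end()) {
            result[path] = it->second;
        }
    }
    return result;
}
```

Suppose a transaction patched only `a`, and a concurrent publish changed `b` and added `c`. Replaying `{"a"}` onto the latest row keeps the concurrent `b` and `c` rather than reverting them.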



##########
be/src/storage/partial_update_info.cpp:
##########
@@ -625,8 +640,18 @@ static void fill_non_primary_key_cell_for_column_store(
             new_col->insert_from(old_value_col, pos_in_old_block);
         }
     } else {
+        if (tablet_column.is_variant_type() && !use_default &&

Review Comment:
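   In this column-store branch, VARIANT behaves differently from ordinary columns. For an ordinary column, a non-skipped cell can use the current value directly. For VARIANT, the current value was already a full value generated at flush time from the old row as it existed then, and a concurrent publish must re-merge it onto the latest old row. That is why merge_variant_patch must be called here whenever an old row exists and does not carry the delete sign.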
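   This sketch shows the decision for a cell that the row did specify. The enum and function names are hypothetical. The comment only states when `merge_variant_patch` is required, so the fallback for a VARIANT cell with no live old row is assumed here to be the ordinary current-value path.

```cpp
#include <cassert>

// What to write for a cell the row explicitly specified (not skipped).
enum class SpecifiedCellAction {
    kUseCurrentValue,    // write the value flushed by this load unchanged
    kMergeVariantPatch,  // re-merge the load's patch onto the latest old row
};

SpecifiedCellAction action_for_specified_cell(bool is_variant, bool has_old_row,
                                              bool old_row_delete_sign) {
    if (!is_variant) {
        // Ordinary column: the flushed value does not depend on the old row.
        return SpecifiedCellAction::kUseCurrentValue;
    }
    if (has_old_row && !old_row_delete_sign) {
        // VARIANT: the flushed full value was built against the old row seen at flush
        // time, which a concurrent publish may have changed since.
        return SpecifiedCellAction::kMergeVariantPatch;
    }
    // Assumed fallback: there is no live old row to merge onto.
    return SpecifiedCellAction::kUseCurrentValue;
}
```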



##########
be/src/storage/partial_update_info.cpp:
##########
@@ -1164,6 +1280,8 @@ Status BlockAggregator::aggregate_for_flexible_partial_update(
         RETURN_IF_ERROR(aggregate_for_sequence_column(block, static_cast<int>(num_rows),
                                                       key_columns, seq_column, specified_rowsets,
                                                       segment_caches));
+    } else {
+        RETURN_IF_ERROR(aggregate_without_sequence_column(block, num_rows, key_columns));

Review Comment:
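   Under flexible partial update, VARIANT cannot use replace_load's whole-column overwrite semantics; otherwise `v.a` and `v.b` patches in the same batch would drop each other's paths. The special handling is kept here so that BlockAggregator merges by path. If it were removed, aggregation of duplicate keys would degrade to whole-column overwrite.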
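   This sketch shows why whole-column replacement loses paths by aggregating two same-key rows from one batch both ways. It assumes flattened VARIANT values (`std::map` from dotted path to value). The function names are hypothetical, and the code does not reproduce `BlockAggregator`'s actual implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Assumed flattened VARIANT: dotted path -> encoded value.
using VariantValue = std::map<std::string, std::string>;

struct PatchRow {
    int key = 0;
    VariantValue patch;  // only the paths this row declared, e.g. "a" for v.a
};

// Whole-column overwrite (replace_load-like): the last row for a key wins outright.
std::map<int, VariantValue> aggregate_replace(const std::vector<PatchRow>& rows) {
    std::map<int, VariantValue> out;
    for (const PatchRow& row : rows) {
        out[row.key] = row.patch;
    }
    return out;
}

// Path-level merge: each later row overrides only the paths it declared.
std::map<int, VariantValue> aggregate_by_path(const std::vector<PatchRow>& rows) {
    std::map<int, VariantValue> out;
    for (const PatchRow& row : rows) {
        VariantValue& acc = out[row.key];
        for (const auto& [path, value] : row.patch) {
            acc[path] = value;
        }
    }
    return out;
}
```

Take a batch with rows `{key 1, v.a = 1}` and `{key 1, v.b = 2}`. `aggregate_replace` keeps only `b`, while `aggregate_by_path` keeps both `a` and `b`.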



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


