github-actions[bot] commented on code in PR #65127:
URL: https://github.com/apache/doris/pull/65127#discussion_r3507270606
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsDataDescription.java:
##########
@@ -990,9 +990,10 @@ private void checkLoadPriv(String fullDbName) throws AnalysisException {
}
}
- // Change all the columns name to lower case, because Doris column is case-insensitive.
+ // Change text-format column names to lower case, because Doris column is case-insensitive.
private void columnsNameToLowerCase(List<String> columns) {
- if (columns == null || columns.isEmpty() || "json".equals(analysisMap.get(FileFormatProperties.PROP_FORMAT))) {
+ String format = analysisMap.get(FileFormatProperties.PROP_FORMAT);
+ if (columns == null || columns.isEmpty() || "json".equals(format) || "arrow".equals(format)) {
Review Comment:
This still leaves the broker LOAD path lowercasing Arrow source names even
when the SQL uses lowercase `FORMAT AS arrow`. After this guard preserves
`fileFieldNames`, `BulkLoadJob.checkAndSetDataSourceInfoByNereids()` calls
`toDataDescription()`, and that constructs the legacy `DataDescription` with
the same list. The legacy constructor immediately calls its own
`columnsNameToLowerCase()`, which still skips only JSON, so `EV` is stored in
`BrokerFileGroup` as `ev`. `LoadLoadingTask` later converts that legacy group
back into `NereidsBrokerFileGroup`, so the scan slot is planned as `ev`; BE
`ArrowStreamReader` compares the block column name with the Arrow schema field
name exactly and returns `Column name mismatch` for Arrow field `EV`. Please
carry the Arrow preservation through the legacy broker-load conversion, or
avoid that re-lowercasing path.
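
One way to keep the two paths from drifting apart is to route both lowercasing steps through a single format-aware helper. The sketch below is only an illustration: `ColumnNameCase`, `preservesCase`, and `normalize` are hypothetical names that do not exist in Doris, and the set of case-preserving formats (`json`, `arrow`) is an assumption taken from the guard in this diff. It also compares the format string case-insensitively, which the current guard does not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Doris: a single rule that both the Nereids
// and the legacy broker-load conversion could call.
final class ColumnNameCase {
    // Assumption: formats whose readers match source field names exactly.
    private static final Set<String> CASE_PRESERVING_FORMATS = Set.of("json", "arrow");

    static boolean preservesCase(String format) {
        return format != null
                && CASE_PRESERVING_FORMATS.contains(format.toLowerCase(Locale.ROOT));
    }

    // Returns the list unchanged for case-preserving formats,
    // otherwise a lowercased copy.
    static List<String> normalize(List<String> columns, String format) {
        if (columns == null || columns.isEmpty() || preservesCase(format)) {
            return columns;
        }
        List<String> lowered = new ArrayList<>(columns.size());
        for (String column : columns) {
            lowered.add(column.toLowerCase(Locale.ROOT));
        }
        return lowered;
    }
}
```

Because the helper is idempotent for case-preserving formats, applying it once in the Nereids description and again in the legacy constructor would leave `EV` intact, while text formats would still be lowercased.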
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsLoadScanProvider.java:
##########
@@ -370,8 +371,14 @@ private void fillContextExprMap(List<NereidsImportColumnDesc> columnDescList, Ne
Column slotColumn;
TFileFormatType fileFormatType = fileGroup.getFileFormatProperties().getFileFormatType();
// Use real column type for arrow/native format, other formats read as varchar first
- if (fileFormatType == TFileFormatType.FORMAT_ARROW
- || fileFormatType == TFileFormatType.FORMAT_NATIVE) {
+ if (fileFormatType == TFileFormatType.FORMAT_ARROW) {
Review Comment:
This lets an explicit Arrow source name like `ev` resolve against table
column `EV` for the scan slot type, but fixed partial-update metadata was
already built with exact case matching in `NereidsStreamLoadPlanner`
(`importColumnDesc.getColumnName().equals(col.getName())`). For a MoW
unique-key table using fixed partial update with `columns: id,ev`, planning now
accepts `ev` and creates a DOUBLE slot, but `partialUpdateInputColumns`
contains only `id`. `OlapTableSink` sends that set to BE, and
`PartialUpdateInfo::init()` checks exact tablet column names, so `EV` is
treated as missing instead of updated. Please apply the same case-insensitive
table-column resolution when collecting fixed partial-update input columns.
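
A minimal sketch of the suggested resolution, under stated assumptions: `PartialUpdateColumns` and `collect` are hypothetical names, and columns are modeled as plain strings rather than the `NereidsImportColumnDesc` and `Column` objects the planner actually uses. The idea is to match case-insensitively but record the table's own spelling, since the BE check described above compares exact tablet column names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not the planner's real code.
final class PartialUpdateColumns {
    // Collects the table columns named by the load's column list,
    // ignoring case when matching and keeping the table's canonical name.
    static Set<String> collect(List<String> importColumnNames, List<String> tableColumnNames) {
        Set<String> inputColumns = new LinkedHashSet<>();
        for (String tableColumn : tableColumnNames) {
            for (String importColumn : importColumnNames) {
                if (tableColumn.equalsIgnoreCase(importColumn)) {
                    // Store "EV", not "ev", so an exact-name check downstream still finds it.
                    inputColumns.add(tableColumn);
                    break;
                }
            }
        }
        return inputColumns;
    }
}
```

With `columns: id,ev` and a table defining `id`, `EV`, and one further column, this sketch would yield `id` and `EV`, whereas an exact `equals` comparison yields only `id`.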