Re: [PR] [fix](load) fix Arrow stream load with uppercase columns [doris]

via GitHub Wed, 01 Jul 2026 08:42:16 -0700


github-actions[bot] commented on code in PR #65127:
URL: https://github.com/apache/doris/pull/65127#discussion_r3507270606



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsDataDescription.java:
##########
@@ -990,9 +990,10 @@ private void checkLoadPriv(String fullDbName) throws AnalysisException {
         }
     }
 
-    // Change all the columns name to lower case, because Doris column is case-insensitive.
+    // Change text-format column names to lower case, because Doris column is case-insensitive.
     private void columnsNameToLowerCase(List<String> columns) {
-        if (columns == null || columns.isEmpty() || "json".equals(analysisMap.get(FileFormatProperties.PROP_FORMAT))) {
+        String format = analysisMap.get(FileFormatProperties.PROP_FORMAT);
+        if (columns == null || columns.isEmpty() || "json".equals(format) || "arrow".equals(format)) {

Review Comment:
   This still leaves the broker LOAD path lowercasing Arrow source names even 
when the SQL uses lowercase `FORMAT AS arrow`. After this guard preserves 
`fileFieldNames`, `BulkLoadJob.checkAndSetDataSourceInfoByNereids()` calls 
`toDataDescription()`, and that constructs the legacy `DataDescription` with 
the same list. The legacy constructor immediately calls its own 
`columnsNameToLowerCase()`, which still skips only JSON, so `EV` is stored in 
`BrokerFileGroup` as `ev`. `LoadLoadingTask` later converts that legacy group 
back into `NereidsBrokerFileGroup`, so the scan slot is planned as `ev`; BE 
`ArrowStreamReader` compares the block column name with the Arrow schema field 
name exactly and returns `Column name mismatch` for Arrow field `EV`. Please 
carry the Arrow preservation through the legacy broker-load conversion, or 
avoid that re-lowercasing path.
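
One way to picture the requested fix is a single format-aware normalization helper that both the Nereids description and the legacy conversion call, so the second pass cannot undo what the first preserved. The sketch below is illustrative only: `ColumnCaseSketch`, `preservesSourceCase`, and `normalizeColumns` are hypothetical names, plain strings stand in for the real Doris classes, and the case-insensitive comparison of the format string is an assumption of this sketch, not something the PR is known to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ColumnCaseSketch {

    // Hypothetical helper: formats whose source field names must keep their case.
    // Comparing the format case-insensitively is an assumption of this sketch.
    static boolean preservesSourceCase(String format) {
        if (format == null) {
            return false;
        }
        String f = format.toLowerCase(Locale.ROOT);
        return f.equals("json") || f.equals("arrow");
    }

    // Lower-cases column names only for formats that do not preserve source case.
    // Returns the input list unchanged when nothing has to be normalized.
    static List<String> normalizeColumns(List<String> columns, String format) {
        if (columns == null || columns.isEmpty() || preservesSourceCase(format)) {
            return columns;
        }
        List<String> out = new ArrayList<>(columns.size());
        for (String c : columns) {
            out.add(c.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "EV");
        // Simulates the two conversion steps: applying the helper twice must
        // still leave the Arrow field name untouched.
        List<String> twice = normalizeColumns(normalizeColumns(cols, "arrow"), "arrow");
        System.out.println(twice);
        System.out.println(normalizeColumns(cols, "csv"));
    }
}
```

Because the helper is idempotent for every format, it does not matter how many times the broker-load path converts between the legacy and Nereids descriptions, as long as each conversion goes through the same check.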



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/load/NereidsLoadScanProvider.java:
##########
@@ -370,8 +371,14 @@ private void fillContextExprMap(List<NereidsImportColumnDesc> columnDescList, Ne
                 Column slotColumn;
                 TFileFormatType fileFormatType = fileGroup.getFileFormatProperties().getFileFormatType();
                 // Use real column type for arrow/native format, other formats read as varchar first
-                if (fileFormatType == TFileFormatType.FORMAT_ARROW
-                        || fileFormatType == TFileFormatType.FORMAT_NATIVE) {
+                if (fileFormatType == TFileFormatType.FORMAT_ARROW) {

Review Comment:
   This lets an explicit Arrow source name like `ev` resolve against table 
column `EV` for the scan slot type, but fixed partial-update metadata was 
already built with exact case matching in `NereidsStreamLoadPlanner` 
(`importColumnDesc.getColumnName().equals(col.getName())`). For a MoW 
unique-key table using fixed partial update with `columns: id,ev`, planning now 
accepts `ev` and creates a DOUBLE slot, but `partialUpdateInputColumns` 
contains only `id`. `OlapTableSink` sends that set to BE, and 
`PartialUpdateInfo::init()` checks exact tablet column names, so `EV` is 
treated as missing instead of updated. Please apply the same case-insensitive 
table-column resolution when collecting fixed partial-update input columns.
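
The mismatch described above can be sketched with plain strings. This is a minimal illustration under stated assumptions, not the code in `NereidsStreamLoadPlanner`: the class `PartialUpdateColumnsSketch` and its method `collectInputColumns` are hypothetical, and lists of names stand in for the import column descriptors and table columns. The idea is to resolve each import name against the table schema case-insensitively and to emit the table's own spelling, so that an exact-match check on the receiving side still succeeds.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PartialUpdateColumnsSketch {

    // Hypothetical helper: resolves import column names against table column
    // names without regard to case and returns the table's canonical spelling.
    static Set<String> collectInputColumns(List<String> importColumnNames,
                                           List<String> tableColumnNames) {
        // Index table columns by their lower-cased name.
        Map<String, String> byLowerName = new HashMap<>();
        for (String tableColumn : tableColumnNames) {
            byLowerName.put(tableColumn.toLowerCase(Locale.ROOT), tableColumn);
        }
        // Keep insertion order so the result follows the import column order.
        Set<String> result = new LinkedHashSet<>();
        for (String importName : importColumnNames) {
            String canonical = byLowerName.get(importName.toLowerCase(Locale.ROOT));
            if (canonical != null) {
                result.add(canonical);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Table column is declared as EV, the load request names it ev.
        Set<String> cols = collectInputColumns(
                Arrays.asList("id", "ev"), Arrays.asList("id", "EV", "note"));
        System.out.println(cols);
    }
}
```

With exact `equals` matching, the same inputs would yield only `id`, which is the situation the comment reports for `partialUpdateInputColumns`.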



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


