dingyufei615 commented on code in PR #351:
URL: https://github.com/apache/doris-spark-connector/pull/351#discussion_r2758003359


##########
spark-doris-connector/spark-doris-connector-base/src/main/java/org/apache/doris/spark/client/read/RowBatch.java:
##########
@@ -145,19 +145,24 @@ public RowBatch(ArrowReader reader, Schema schema, Boolean datetimeJava8ApiEnabl
 
     private void readBatch(VectorSchemaRoot root) throws DorisException {
         fieldVectors = root.getFieldVectors();
-        if (fieldVectors.size() > schema.size()) {
-            logger.error("Data schema size '{}' should not be bigger than arrow field size '{}'.",
-                    schema.size(), fieldVectors.size());
+        if (fieldVectors.size() < schema.size()) {
+            logger.error("Arrow field size '{}' is less than data schema size '{}'.",
+                    fieldVectors.size(), schema.size());

Review Comment:
   Thank you for the review.
   
   ## The fix handles both scenarios safely:
   
   **Scenario 1 (Issue #349)**: Arrow has extra internal columns
   - `fieldVectors.size() > schema.size()`
   - **Behavior**: Log warning, continue processing
   - Fixes the reported issue
   
   **Scenario 2 (Your concern)**: Schema has columns missing in Arrow data
   - `fieldVectors.size() < schema.size()`
   - **Behavior**: Throw an exception immediately (lines 148-152)
   - Maintains fail-fast behavior
   
   ## Code logic:
   
   ```java
   // Still throws exception when Arrow data is missing expected columns
   if (fieldVectors.size() < schema.size()) {
       throw new DorisException("Load Doris data failed, schema size of fetch data is wrong.");
   }
   
   // Only allows extra columns (Doris 2.0+ internal columns)
   if (fieldVectors.size() > schema.size()) {
       logger.warn("This may be due to internal columns in Doris 2.0+...");
   }
   ```
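   
   For reference, the two branches can be exercised in isolation with a minimal, self-contained sketch. The class `SchemaSizeCheck` and the method `hasExtraColumns` are hypothetical names used only for illustration, and `IllegalStateException` stands in for the connector's `DorisException` so the snippet compiles without the connector on the classpath. It is not the actual `RowBatch` code.
   
   ```java
   import java.util.logging.Logger;
   
   // Hypothetical stand-alone sketch of the size check; not the connector's RowBatch code.
   final class SchemaSizeCheck {
       private static final Logger LOGGER =
               Logger.getLogger(SchemaSizeCheck.class.getName());
   
       private SchemaSizeCheck() {
       }
   
       /**
        * Compares the number of Arrow field vectors with the expected schema size.
        *
        * @return true if Arrow carries extra columns (tolerated, with a warning),
        *         false if the counts match exactly
        * @throws IllegalStateException if Arrow has fewer columns than the schema
        *         (stand-in for DorisException in this sketch)
        */
       static boolean hasExtraColumns(int arrowFieldCount, int schemaSize) {
           // Scenario 2: expected columns are missing, so fail fast.
           if (arrowFieldCount < schemaSize) {
               throw new IllegalStateException(
                       "Arrow field size '" + arrowFieldCount
                               + "' is less than data schema size '" + schemaSize + "'.");
           }
           // Scenario 1: extra columns are present, so warn and keep going.
           if (arrowFieldCount > schemaSize) {
               LOGGER.warning("Arrow field size '" + arrowFieldCount
                       + "' is greater than data schema size '" + schemaSize + "'.");
               return true;
           }
           return false;
       }
   }
   ```
   
   Calling `SchemaSizeCheck.hasExtraColumns(5, 3)` logs a warning and returns `true`, `hasExtraColumns(3, 3)` returns `false`, and `hasExtraColumns(2, 3)` throws. A unit test for the real fix could follow the same three cases.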
   
   Could you share which Doris version had the issue you mentioned? I'd like to verify if it still exists and add test coverage if needed.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
