Copilot commented on code in PR #12079:
URL: https://github.com/apache/gluten/pull/12079#discussion_r3285250565


##########
cpp-ch/local-engine/Parser/SparkRowToCHColumn.cpp:
##########
@@ -64,7 +128,10 @@ ALWAYS_INLINE static void writeRowToColumns(const 
std::vector<MutableColumnPtr>
                 columns[i]->insert(spark_row_reader.getField(i)); // read 
decimal128
         }
         else
-            columns[i]->insert(spark_row_reader.getField(i));
+        {
+            DB::Field field = spark_row_reader.getField(i);
+            columns[i]->insert(normalizeFieldForType(std::move(field), 
spark_row_reader.getFieldTypes()[i]));
+        }

Review Comment:
   `normalizeFieldForType` walks all nested elements for arrays/maps/tuples 
before calling `Column::insert`, which will traverse the same structure again 
during insertion. For large nested values this adds an extra full pass per row. 
Consider limiting normalization to the cases that can actually contain illegal 
NULLs (e.g., only when the destination type contains non-nullable nested 
types), or moving the NULL-to-default substitution into the reader/column 
construction path to avoid double traversal.



##########
cpp-ch/local-engine/Storages/SubstraitSource/ParquetFormatFile.cpp:
##########
@@ -221,7 +221,7 @@ ParquetFormatFile::createInputFormat(const Block & header, 
const std::shared_ptr
 
         // TODO: rebase-25.12, support complex types when there is a nullable 
type
         // for example: parquet type is Array, requested type is 
Nullable(Array(Nullable(String)))
-        if (format_settings.parquet.use_native_reader_v3 && !readRowIndex && 
onlyFlatType)
+        if (format_settings.parquet.use_native_reader_v3 && !readRowIndex)

Review Comment:
   The TODO above this condition says "support complex types when there is a 
nullable type", but this PR now enables Parquet reader v3 regardless of 
`onlyFlatType`. To avoid confusion, please update/remove the TODO (or adjust it 
to reflect the remaining unsupported cases, if any).



##########
cpp-ch/local-engine/Parser/RelParsers/ReadRelParser.cpp:
##########
@@ -201,7 +201,7 @@ QueryPlanStepPtr 
ReadRelParser::parseReadRelWithLocalFile(const substrait::ReadR
     auto source = std::make_shared<SubstraitFileSource>(getContext(), header, 
local_files);
     auto source_pipe = Pipe(source);
     auto source_step = std::make_unique<SubstraitFileSourceStep>(getContext(), 
std::move(source_pipe), "substrait local files");
-    if (format_settings.parquet.use_native_reader_v3 && !readRowIndex && 
onlyFlatType)
+    if (format_settings.parquet.use_native_reader_v3 && !readRowIndex)
         source_step->setStepDescription("ParquetReaderV3");

Review Comment:
   After removing `onlyFlatType` from this v3-reader gate, the earlier 
`onlyFlatType = onlyHasFlatType(header);` in this function becomes unused dead 
code (and can trigger unused-variable warnings). Either remove `onlyFlatType` 
entirely or keep it as part of the condition if it still serves a purpose.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to