github-actions[bot] commented on code in PR #63192:
URL: https://github.com/apache/doris/pull/63192#discussion_r3229492658


##########
be/src/format/table/hive/hive_parquet_nested_column_utils.cpp:
##########
@@ -18,20 +18,176 @@
 #include "format/table/hive/hive_parquet_nested_column_utils.h"
 
 #include <algorithm>
+#include <cctype>
 #include <memory>
 #include <set>
 #include <string>
+#include <string_view>
 #include <unordered_map>
 #include <vector>
 
 #include "format/parquet/schema_desc.h"
 #include "format/table/table_schema_change_helper.h"
 
 namespace doris {
+namespace {
+
+void add_column_id_range(const FieldSchema& field_schema, std::set<uint64_t>& column_ids) {
+    const uint64_t start_id = field_schema.get_column_id();
+    const uint64_t max_column_id = field_schema.get_max_column_id();
+    for (uint64_t id = start_id; id <= max_column_id; ++id) {
+        column_ids.insert(id);
+    }
+}
+
+const FieldSchema* find_child_by_name(const FieldSchema& field_schema, std::string_view name) {
+    std::string lower_name(name);
+    std::transform(lower_name.begin(), lower_name.end(), lower_name.begin(),
+                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
+    for (const auto& child : field_schema.children) {
+        if (child.name == name || child.lower_case_name == lower_name) {

Review Comment:
   This makes shredded VARIANT path pruning case-insensitive for user object 
keys. Doris VARIANT lookups distinguish `v['a']` from `v['A']` (there are 
existing regression cases for that), and the Parquet VARIANT spec also says 
object field names are case-sensitive. With this match, a query for `v['A']` 
will select `typed_value.a` when only lower-case `a` is shredded, instead of 
treating `A` as a different key and falling back to `value`. The reader can 
then return the wrong field while also pruning away the only column that could 
contain `A`. 
Please match VARIANT path components exactly; if needed, keep special handling 
only for the structural Parquet fields `metadata`, `value`, and `typed_value`.
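
A minimal sketch of the exact-match lookup requested above, not the actual Doris implementation. `FieldSchemaSketch`, `PathComponentKind`, `to_lower`, and `find_child` are hypothetical names introduced for illustration; only the members `name`, `lower_case_name`, and `children` mirror what the diff uses. The sketch assumes the caller knows whether a path component is a structural Parquet field or a user object key.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for doris::FieldSchema; only the members
// referenced in the diff are modeled here.
struct FieldSchemaSketch {
    std::string name;
    std::string lower_case_name;
    std::vector<FieldSchemaSketch> children;
};

// Assumption: the caller can tell structural fields (`metadata`, `value`,
// `typed_value`) apart from user-supplied VARIANT object keys.
enum class PathComponentKind { kStructural, kUserKey };

std::string to_lower(std::string_view s) {
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// User keys match exactly; only structural field names may also match
// case-insensitively.
const FieldSchemaSketch* find_child(const FieldSchemaSketch& parent, std::string_view name,
                                    PathComponentKind kind) {
    const bool allow_case_fold = (kind == PathComponentKind::kStructural);
    const std::string lower_name = allow_case_fold ? to_lower(name) : std::string();
    for (const auto& child : parent.children) {
        if (child.name == name) {
            return &child;
        }
        if (allow_case_fold && child.lower_case_name == lower_name) {
            return &child;
        }
    }
    return nullptr;
}
```

With this shape, a lookup for the user key `A` under a `typed_value` group that only contains `a` returns `nullptr`, so the caller can treat `A` as not shredded.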



##########
be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp:
##########
@@ -18,21 +18,177 @@
 #include "format/table/iceberg/iceberg_parquet_nested_column_utils.h"
 
 #include <algorithm>
+#include <cctype>
 #include <iostream>
 #include <memory>
 #include <set>
 #include <string>
+#include <string_view>
 #include <unordered_map>
 #include <vector>
 
 #include "format/parquet/schema_desc.h"
 #include "format/table/table_schema_change_helper.h"
 
 namespace doris {
+namespace {
+
+void add_column_id_range(const FieldSchema& field_schema, std::set<uint64_t>& column_ids) {
+    const uint64_t start_id = field_schema.get_column_id();
+    const uint64_t max_column_id = field_schema.get_max_column_id();
+    for (uint64_t id = start_id; id <= max_column_id; ++id) {
+        column_ids.insert(id);
+    }
+}
+
+const FieldSchema* find_child_by_name(const FieldSchema& field_schema, std::string_view name) {
+    std::string lower_name(name);
+    std::transform(lower_name.begin(), lower_name.end(), lower_name.begin(),
+                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
+    for (const auto& child : field_schema.children) {
+        if (child.name == name || child.lower_case_name == lower_name) {

Review Comment:
   Same case-sensitivity issue in the Iceberg path: VARIANT object keys are 
case-sensitive, but this helper lowercases the requested path and can map 
`v['A']` to a shredded `typed_value.a` column. That changes query results and 
can prune away the fallback `value` column that would contain the distinct `A` 
key. Please use exact matching for user VARIANT key path components, with any 
case-insensitive matching limited to the fixed Parquet structural field names.
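
To illustrate the pruning consequence described above, here is a simplified sketch of which VARIANT subcolumns a reader would need for one requested key. `columns_for_key` is a hypothetical helper, not a Doris function, and the column paths are plain strings chosen for readability. It assumes `metadata` is always read and that an unshredded key can only be found in the fallback `value` column.

```cpp
#include <set>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: picks the subcolumns to read for a single requested
// VARIANT key, given the set of keys shredded under `typed_value`.
std::vector<std::string> columns_for_key(const std::set<std::string>& shredded_keys,
                                         std::string_view key) {
    // Assumption: the metadata column is always needed for decoding.
    std::vector<std::string> columns{"metadata"};
    const std::string exact_key(key);
    if (shredded_keys.count(exact_key) > 0) {
        // Exact, case-sensitive hit: the shredded subtree is sufficient.
        columns.push_back("typed_value." + exact_key);
    } else {
        // No exact hit: the key, if present, lives in the fallback column,
        // so `value` must not be pruned.
        columns.push_back("value");
    }
    return columns;
}
```

With only `a` shredded, a request for `A` keeps `value` and does not select `typed_value.a`, which is the behavior the comment asks for.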



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

