Re: [PR] Hive: Return new scan after applying column project parameter [iceberg]

via GitHub Tue, 11 Jun 2024 09:23:52 -0700


pvary commented on code in PR #10449:
URL: https://github.com/apache/iceberg/pull/10449#discussion_r1635167684



##########
mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -125,11 +125,9 @@ public List<InputSplit> getSplits(JobContext context) {
     }
     String schemaStr = conf.get(InputFormatConfig.READ_SCHEMA);
     if (schemaStr != null) {
-      scan.project(SchemaParser.fromJson(schemaStr));
-    }
-    String[] selectedColumns = conf.getStrings(InputFormatConfig.SELECTED_COLUMNS);
-    if (selectedColumns != null) {
-      scan.select(selectedColumns);
+      scan = scan.project(SchemaParser.fromJson(schemaStr));
+    } else if (conf.getStrings(InputFormatConfig.SELECTED_COLUMNS) != null) {

Review Comment:
   This decision seems questionable to me.
   We allow the user to set both the selected columns and the schema, but
   behind the scenes we silently use only one of them?
   
   If this were new code, I would ask for validation that throws an exception
   when both are set.
   OTOH, it is unclear what we should do now: some users might expect to get
   away with setting both, since they were ignored before.
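
The validation mentioned above could look roughly like the sketch below. This is only an illustration and not code from the PR: `ProjectionConfigCheck`, `Mode`, and `resolve` are hypothetical names, and the sketch assumes the two arguments would come from `conf.get(InputFormatConfig.READ_SCHEMA)` and `conf.getStrings(InputFormatConfig.SELECTED_COLUMNS)`, both of which are null when unset.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Iceberg: decides which projection setting applies. */
final class ProjectionConfigCheck {

  /** Which projection setting, if any, should be applied to the scan. */
  enum Mode { NONE, SCHEMA, SELECTED_COLUMNS }

  private ProjectionConfigCheck() {}

  /**
   * Rejects configurations that set both values instead of silently preferring one.
   * Both parameters are assumed to be null when the corresponding key is unset.
   */
  static Mode resolve(String schemaJson, String[] selectedColumns) {
    boolean hasSchema = schemaJson != null;
    boolean hasColumns = selectedColumns != null;
    if (hasSchema && hasColumns) {
      throw new IllegalArgumentException(
          "Set either a read schema or selected columns, not both; got columns "
              + Arrays.toString(selectedColumns));
    }
    if (hasSchema) {
      return Mode.SCHEMA;
    }
    return hasColumns ? Mode.SELECTED_COLUMNS : Mode.NONE;
  }

  public static void main(String[] args) {
    // Only one of the two settings is present in each call, so neither throws.
    System.out.println(resolve("{\"type\":\"struct\",\"fields\":[]}", null));
    System.out.println(resolve(null, new String[] {"id", "data"}));
  }
}
```

Because existing jobs may already set both keys, a less strict variant could log a warning and keep the schema-first precedence from the diff rather than throwing.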



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


