pvary commented on code in PR #10449:
URL: https://github.com/apache/iceberg/pull/10449#discussion_r1635167684
##########
mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -125,11 +125,9 @@ public List<InputSplit> getSplits(JobContext context) {
}
String schemaStr = conf.get(InputFormatConfig.READ_SCHEMA);
if (schemaStr != null) {
- scan.project(SchemaParser.fromJson(schemaStr));
- }
- String[] selectedColumns = conf.getStrings(InputFormatConfig.SELECTED_COLUMNS);
- if (selectedColumns != null) {
- scan.select(selectedColumns);
+ scan = scan.project(SchemaParser.fromJson(schemaStr));
+ } else if (conf.getStrings(InputFormatConfig.SELECTED_COLUMNS) != null) {
Review Comment:
I find this decision questionable.
We allow the user to set both the selected columns and the read schema, but
behind the scenes we silently use only one of them?
If this were new code, I would ask for validation, and we should throw an
exception if both of them are set.
OTOH, it is not obvious what we should do now, as some users might expect to
get away with setting both, since both settings were ignored before.
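
To make the two options concrete, here is a minimal sketch, not the actual `IcebergInputFormat` code. It uses a plain `Map` as a stand-in for the Hadoop `Configuration`, and the key strings, the `Projection` enum, the `resolve` method, and the `strict` flag are all hypothetical names introduced only for illustration. In strict mode it rejects a configuration that sets both properties; in lenient mode it keeps the behavior from this diff (the schema wins) but warns about the ignored setting.

```java
import java.util.HashMap;
import java.util.Map;

public class ProjectionConfigCheck {
  // Placeholder keys standing in for InputFormatConfig.READ_SCHEMA and
  // InputFormatConfig.SELECTED_COLUMNS; the real key strings are not assumed here.
  static final String READ_SCHEMA = "example.read.schema";
  static final String SELECTED_COLUMNS = "example.selected.columns";

  enum Projection { SCHEMA, COLUMNS, NONE }

  // Decides which projection setting to apply.
  // strict = true: fail fast when both settings are present.
  // strict = false: prefer the schema (as the else-if in the diff does) and warn.
  static Projection resolve(Map<String, String> conf, boolean strict) {
    boolean hasSchema = conf.get(READ_SCHEMA) != null;
    boolean hasColumns = conf.get(SELECTED_COLUMNS) != null;

    if (hasSchema && hasColumns) {
      if (strict) {
        throw new IllegalArgumentException(
            "Set only one of " + READ_SCHEMA + " or " + SELECTED_COLUMNS);
      }
      System.err.println(
          "Both " + READ_SCHEMA + " and " + SELECTED_COLUMNS
              + " are set; ignoring " + SELECTED_COLUMNS);
      return Projection.SCHEMA;
    }
    if (hasSchema) {
      return Projection.SCHEMA;
    }
    return hasColumns ? Projection.COLUMNS : Projection.NONE;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(READ_SCHEMA, "{}");
    conf.put(SELECTED_COLUMNS, "id,name");

    // Lenient mode keeps existing jobs running and only logs a warning.
    System.out.println(resolve(conf, false));

    // Strict mode surfaces the conflicting configuration to the user.
    try {
      resolve(conf, true);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The lenient path is the backward-compatible choice for users who already set both properties; the strict path is what I would ask for if there were no existing users to consider.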
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]