mccheah commented on a change in pull request #6: Support customizing the
location where data is written in Spark
URL: https://github.com/apache/incubator-iceberg/pull/6#discussion_r236869168
##########
File path:
spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java
##########
@@ -89,7 +92,11 @@ public DataSourceReader createReader(DataSourceOptions
options) {
.toUpperCase(Locale.ENGLISH));
}
- return Optional.of(new Writer(table, lazyConf(), format));
+ String dataLocation = options.get(TableProperties.WRITE_NEW_DATA_LOCATION)
+ .orElse(table.properties().getOrDefault(
+ TableProperties.WRITE_NEW_DATA_LOCATION,
+ new Path(new Path(table.location()), "data").toString()));
+ return Optional.of(new Writer(table, lazyConf(), format, dataLocation));
Review comment:
I think doing options processing from a `Map<String, String>` inside a
constructor is a bit of an antipattern. Consider, for example, writing a unit
test for this class in the future: if we pass the `Writer` constructor only a
`HashMap`, the test would have to construct that `HashMap` in a specific way,
i.e. know exactly which key-value pairs the constructor expects.
Perhaps we can have a builder object that acts as a factory that accepts the
`Map` and returns the `Writer`. The `Writer` constructor accepts the builder
object and copies the set fields on the builder into its own fields.
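A minimal sketch of what that could look like. All names here (`WriterBuilder`, the simplified `Writer`, the property key) are illustrative, not the actual Iceberg API; the point is only that map parsing lives in the builder, so a test can set fields directly:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class WriterBuilderSketch {

  // Illustrative stand-in for TableProperties.WRITE_NEW_DATA_LOCATION.
  static final String WRITE_NEW_DATA_LOCATION = "write.new-data.location";

  // Simplified stand-in for the real Writer: it receives already-resolved
  // values from the builder instead of parsing a raw options map itself.
  static class Writer {
    final String dataLocation;

    Writer(WriterBuilder builder) {
      this.dataLocation = builder.dataLocation;
    }
  }

  static class WriterBuilder {
    private String dataLocation;

    // All Map-based option processing is confined to this one method.
    WriterBuilder options(Map<String, String> options, String tableLocation,
                          Map<String, String> tableProperties) {
      this.dataLocation = Optional.ofNullable(options.get(WRITE_NEW_DATA_LOCATION))
          .orElse(tableProperties.getOrDefault(
              WRITE_NEW_DATA_LOCATION, tableLocation + "/data"));
      return this;
    }

    // A unit test can call this directly and skip map construction entirely.
    WriterBuilder dataLocation(String location) {
      this.dataLocation = location;
      return this;
    }

    Writer build() {
      return new Writer(this);
    }
  }

  public static void main(String[] args) {
    // Production path: resolve the location from options/table properties.
    Writer fromOptions = new WriterBuilder()
        .options(new HashMap<>(), "s3://bucket/table", new HashMap<>())
        .build();
    System.out.println(fromOptions.dataLocation);

    // Test path: no knowledge of option keys required.
    Writer fromTest = new WriterBuilder()
        .dataLocation("/tmp/custom")
        .build();
    System.out.println(fromTest.dataLocation);
  }
}
```

The `Writer(WriterBuilder)` constructor then only copies already-validated fields, so tests exercise the builder's setters rather than reverse-engineering the expected map contents.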
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services