mccheah commented on a change in pull request #6: Support customizing the
location where data is written in Spark
URL: https://github.com/apache/incubator-iceberg/pull/6#discussion_r236869168
##########
File path:
spark/src/main/java/com/netflix/iceberg/spark/source/IcebergSource.java
##########
@@ -89,7 +92,11 @@ public DataSourceReader createReader(DataSourceOptions
options) {
.toUpperCase(Locale.ENGLISH));
}
- return Optional.of(new Writer(table, lazyConf(), format));
+ String dataLocation = options.get(TableProperties.WRITE_NEW_DATA_LOCATION)
+ .orElse(table.properties().getOrDefault(
+ TableProperties.WRITE_NEW_DATA_LOCATION,
+ new Path(new Path(table.location()), "data").toString()));
+ return Optional.of(new Writer(table, lazyConf(), format, dataLocation));
Review comment:
I think doing options processing from a `Map<String, String>` inside a
constructor is a bit of an antipattern. Consider, for example, writing a unit
test for this class in the future: if we pass the `Writer` constructor only a
`HashMap`, the test would have to construct that `HashMap` in a specific way,
i.e. know exactly which key-value pairs the constructor expects.
Perhaps we can have a builder object that acts as a factory that accepts the
`Map` and returns the `Writer`. The `Writer` constructor accepts the builder
object and copies the set fields on the builder into its own fields.
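A minimal sketch of what that could look like. All names here (`WriterBuilder`, the simplified `Writer`, the property key) are illustrative, not the actual Iceberg API; the point is only that map parsing lives in the builder, so a test can set fields directly:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class WriterBuilderSketch {

  // Illustrative stand-in for TableProperties.WRITE_NEW_DATA_LOCATION.
  static final String WRITE_NEW_DATA_LOCATION = "write.new-data.location";

  // Simplified stand-in for the real Writer: it receives already-resolved
  // values from the builder instead of parsing a raw options map itself.
  static class Writer {
    final String dataLocation;

    Writer(WriterBuilder builder) {
      this.dataLocation = builder.dataLocation;
    }
  }

  static class WriterBuilder {
    private String dataLocation;

    // All Map-based option processing is confined to this one method.
    WriterBuilder options(Map<String, String> options, String tableLocation,
                          Map<String, String> tableProperties) {
      this.dataLocation = Optional.ofNullable(options.get(WRITE_NEW_DATA_LOCATION))
          .orElse(tableProperties.getOrDefault(
              WRITE_NEW_DATA_LOCATION, tableLocation + "/data"));
      return this;
    }

    // A unit test can call this directly and skip map construction entirely.
    WriterBuilder dataLocation(String location) {
      this.dataLocation = location;
      return this;
    }

    Writer build() {
      return new Writer(this);
    }
  }

  public static void main(String[] args) {
    // Production path: resolve the location from options/table properties.
    Writer fromOptions = new WriterBuilder()
        .options(new HashMap<>(), "s3://bucket/table", new HashMap<>())
        .build();
    System.out.println(fromOptions.dataLocation);

    // Test path: no knowledge of option keys required.
    Writer fromTest = new WriterBuilder()
        .dataLocation("/tmp/custom")
        .build();
    System.out.println(fromTest.dataLocation);
  }
}
```

The `Writer(WriterBuilder)` constructor then only copies already-validated fields, so tests exercise the builder's setters rather than reverse-engineering the expected map contents.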
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services