wypoon commented on a change in pull request #1508:
URL: https://github.com/apache/iceberg/pull/1508#discussion_r501463178
##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -165,13 +165,23 @@
TableProperties.PARQUET_BATCH_SIZE,
TableProperties.PARQUET_BATCH_SIZE_DEFAULT));
}
+ private Schema getTableSchema() {
+ if (snapshotId != null) {
+ return table.schemaForSnapshot(snapshotId);
+ } else if (asOfTimestamp != null) {
+ return table.schemaForSnapshotAsOfTime(asOfTimestamp);
+ } else {
+ return table.schema();
+ }
+ }
+
private Schema lazySchema() {
if (schema == null) {
if (requestedSchema != null) {
// the projection should include all columns that will be returned,
including those only used in filters
- this.schema = SparkSchemaUtil.prune(table.schema(), requestedSchema,
filterExpression(), caseSensitive);
+ this.schema = SparkSchemaUtil.prune(getTableSchema(), requestedSchema,
filterExpression(), caseSensitive);
} else {
- this.schema = table.schema();
+ this.schema = getTableSchema();
Review comment:
This is the only point I don't understand. As far as I can tell,
`BaseTableScan` is constructed with the user's requested schema. If the user is
wanting to read from an old snapshot, then they should be supplying a requested
schema compatible with the snapshot's schema. I couldn't see anything I needed
to change in `BaseTableScan`. If I'm missing something here, can you please
point out to me the parts I need to change?
I added a new test case where a requested schema is used, and that revealed
a bug in the spark2 `IcebergSource#createReader`. In that method, there is a
call to `SparkSchemaUtil.convert(Schema, StructType)` whose only purpose seems
to be to check that the requested `StructType` does not contain fields in the
`Schema` (for then the call would cause an exception), and here I needed to use
the snapshot `Schema` rather than the table `Schema`. Other than that, the
projection returned by the scan appears to be correct.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]