rdblue commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r978176248
##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -153,35 +139,22 @@ static Snapshot fromJson(FileIO io, JsonNode node) {
Integer schemaId = JsonUtil.getIntOrNull(SCHEMA_ID, node);
- if (node.has(MANIFEST_LIST)) {
Review Comment:
Okay, I took a deeper look into our options here. I don't think that we
should break tables that have embedded list of manifests. I'm okay with no
longer writing the embedded list, but I don't want to break compatibility with
older tables.
That means we need this check.
Reading should work fine by lazily converting to `GenericManifestFile` when
a `FileIO` is provided through the `Snapshot` API. However, making changes to
the table metadata would break when writing out old snapshots. I was thinking
earlier that we could rely on `SnapshotProducer` to rewrite, but if the update
is a metadata change then we don't get a new snapshot that uses a list. We
could have an operation that rewrites the snapshots to use lists, but that
seems like a lot of work.
Instead, I think that the cleanest option is to pass `FileIO` to
`allManifests` for the write path where `v1ManifestLocations` is being called.
We can either do that by passing in a real `FileIO` or by creating a fake
`FileIO`.
I'd use a fake `FileIO` because we know that it is only used to create
`InputFile` instances for the list of manifests and then only `path` is called
on those input files to return the location. I think that's a clean way to
remove `FileIO` without adding a new method to the `Snapshot` interface. It's
also pretty small:
```java
private static class DummyFileIO implements FileIO {
@Override
public InputFile newInputFile(String path) {
return new InputFile() {
@Override
public long getLength() {
throw new UnsupportedOperationException();
}
@Override
public SeekableInputStream newStream() {
throw new UnsupportedOperationException();
}
@Override
public String location() {
return path;
}
@Override
public boolean exists() {
return true;
}
};
}
@Override
public OutputFile newOutputFile(String path) {
throw new UnsupportedOperationException();
}
@Override
public void deleteFile(String path) {
throw new UnsupportedOperationException();
}
}
```
I think I prefer our options in this order:
1. Use a fake FileIO to get the files
2. Keep passing a real FileIO in the write path only
3. Add `v1ManifestLocations` to the `Snapshot` interface
@nastra, what do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]