Sl1mb0 opened a new issue, #778:
URL: https://github.com/apache/iceberg-rust/issues/778
At the moment, the building and serialization of Iceberg metadata are coupled
together.
For example, let's say I want to build a `ManifestFile` that I then add to a
`ManifestList`:
(some code has not been included for the sake of brevity)
```rust
// Create an `OutputFile` for the manifest. The path chosen here is also
// recorded inside the resulting `ManifestFile`.
let manifest_file_path = NamedTempFile::new().unwrap();
let manifest_file_output = FileIOBuilder::new_fs_io()
    .build()
    .unwrap()
    .new_output(manifest_file_path.path().to_str().unwrap())
    .unwrap();

// Writing the `Manifest` is the only way to obtain a `ManifestFile`.
let manifest_writer = ManifestWriter::new(manifest_file_output, 0, Vec::new());
let manifest_file = manifest_writer
    .write(manifest)
    .await
    .unwrap();

// Create a second `OutputFile` for the manifest list.
let manifest_list_path = NamedTempFile::new().unwrap();
let manifest_list_output = FileIOBuilder::new_fs_io()
    .build()
    .unwrap()
    .new_output(manifest_list_path.path().to_str().unwrap())
    .unwrap();

// The manifest list refers to the manifest by the location recorded above.
let mut writer = ManifestListWriter::v2(manifest_list_output, 0, 0, 0);
writer.add_manifests(vec![manifest_file]);
writer.close().await.unwrap();
```
- Building and serialization are coupled: in order to 'build' a
  `ManifestFile` you have to 'write' a `Manifest`.
- There is a second coupling of building/serde: _where the metadata gets
  written_ is included in _what metadata is written_.
  - When you specify a location to write a `ManifestFile` to, that location
    is where the `ManifestFile` gets written _and is [included in the
    metadata](https://github.com/apache/iceberg-rust/blob/42aff04658a00b390122260dbbeaf512d11af61f/crates/iceberg/src/spec/manifest.rs#L305)
    of that `ManifestFile`_.
  - This means that when the built `ManifestFile` is added to a
    `ManifestList`, the location of the `ManifestFile` is what's used to
    'point' the `ManifestList` to that `ManifestFile`.
- This coupling forces the user to use the `FileIO`/`OutputFile`/`InputFile`
  types to write to their preferred storage layer, instead of allowing the
  user to build/use their own abstractions for "where the bytes get written
  to".
  - We would really like to separate the building and serialization layers,
    as that would allow us to use our own storage layer abstractions.
- To provide an example: if the user wants to use their own storage layer
  for storing metadata bytes:
  - They must build/write all the necessary metadata types using `FileIO`.
  - They would then need to 'copy' all these bytes to their preferred
    storage layer.
  - :warning: **problem** :warning:
    - Because the metadata itself contains "where" the metadata is, once
      that metadata is "moved" somewhere else it's no longer valid. This is
      because the 'metadata hierarchy' (i.e. which metadata points to which
      snapshot, which points to which manifest list, etc.) is only valid for
      the location where it was built/serialized. To illustrate this:

In the example above, the `ManifestList` and `ManifestFile` were
built/serialized on `Node B` and then copied over to `Node A`. Because the
building/serialization was performed on `Node B`, the `ManifestList` on
`Node A` still points to the `ManifestFile` on `Node B`.
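
One way to picture the requested separation is the sketch below. It is only an illustration under stated assumptions, not the iceberg-rust API: `ManifestEntry`, `ByteStore`, `InMemoryStore`, `serialize_manifest`, `build_entry`, `serialize_manifest_list` and `publish` are all hypothetical names, and the newline/tab encodings are toy stand-ins for the real Avro files. The point it tries to show is that serialization produces bytes without doing any I/O, the entry is built from an explicitly supplied final location, and the caller decides where the bytes go.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one manifest-list entry (not the iceberg-rust type).
#[derive(Debug, Clone, PartialEq)]
struct ManifestEntry {
    /// Location the manifest list uses to find the manifest.
    manifest_path: String,
    /// Size of the serialized manifest in bytes.
    manifest_length: u64,
}

/// Hypothetical user-owned storage abstraction: "where the bytes get written to".
trait ByteStore {
    fn put(&mut self, location: &str, bytes: Vec<u8>);
    fn get(&self, location: &str) -> Option<&[u8]>;
}

/// Toy in-memory store standing in for the user's own storage layer.
#[derive(Default)]
struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ByteStore for InMemoryStore {
    fn put(&mut self, location: &str, bytes: Vec<u8>) {
        self.objects.insert(location.to_string(), bytes);
    }

    fn get(&self, location: &str) -> Option<&[u8]> {
        self.objects.get(location).map(|bytes| bytes.as_slice())
    }
}

/// Serialization only: turns content into bytes and performs no I/O.
/// (Toy newline-separated encoding; real manifests are Avro.)
fn serialize_manifest(data_files: &[&str]) -> Vec<u8> {
    data_files.join("\n").into_bytes()
}

/// Building only: the caller supplies the final location explicitly, so it is
/// never inferred from wherever a writer happened to run.
fn build_entry(final_location: &str, manifest_bytes: &[u8]) -> ManifestEntry {
    ManifestEntry {
        manifest_path: final_location.to_string(),
        manifest_length: manifest_bytes.len() as u64,
    }
}

/// Serialization only: one "path<TAB>length" line per entry (toy encoding).
fn serialize_manifest_list(entries: &[ManifestEntry]) -> Vec<u8> {
    entries
        .iter()
        .map(|e| format!("{}\t{}", e.manifest_path, e.manifest_length))
        .collect::<Vec<_>>()
        .join("\n")
        .into_bytes()
}

/// Ties the steps together against any `ByteStore` implementation.
fn publish<S: ByteStore>(
    store: &mut S,
    manifest_location: &str,
    list_location: &str,
    data_files: &[&str],
) -> ManifestEntry {
    let manifest_bytes = serialize_manifest(data_files);
    let entry = build_entry(manifest_location, &manifest_bytes);
    store.put(manifest_location, manifest_bytes);
    store.put(list_location, serialize_manifest_list(&[entry.clone()]));
    entry
}
```

Calling `publish` with a `Node A` location makes the stored manifest list point at a path that resolves in the same store. If the entry were instead built with a `Node B` location while the bytes were stored under a `Node A` key, `store.get(&entry.manifest_path)` would return `None`, which is the stale-pointer situation described above.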
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]