Scott,

We added the ability to append an entire manifest well after all of the operations were built. The problem is that the snapshot ID of the commit has to be written into the manifest. So if you want to append a manifest that was written ahead of time, you need to write nulls for the snapshot ID and fill it in when that manifest is read (snapshot ID inheritance). That change wasn't fully supported until v2, although there is a feature flag to enable it in v1.
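
To make the null-then-fill-in behavior concrete, here is a minimal Python sketch. It is a toy model under my own assumptions, not the Iceberg API: `ManifestEntry`, `ManifestRef`, `read_entries`, the file paths, and the snapshot ID 1234 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ManifestEntry:
    """One data-file entry in a manifest (hypothetical toy type)."""
    file_path: str
    snapshot_id: Optional[int]  # None means "inherit from the commit"


@dataclass(frozen=True)
class ManifestRef:
    """What the commit records about an appended manifest (hypothetical)."""
    entries: List[ManifestEntry]
    added_snapshot_id: int  # only known once the commit has started


def read_entries(ref: ManifestRef) -> List[ManifestEntry]:
    """Fill in null snapshot IDs at read time; explicit IDs are left alone."""
    return [
        e if e.snapshot_id is not None
        else replace(e, snapshot_id=ref.added_snapshot_id)
        for e in ref.entries
    ]


# A manifest written ahead of time, before any commit exists: IDs are null.
prewritten = [
    ManifestEntry("data/a.parquet", None),
    ManifestEntry("data/b.parquet", None),
]

# At commit time the manifest is attached to the snapshot that adds it.
committed = ManifestRef(entries=prewritten, added_snapshot_id=1234)
resolved = read_entries(committed)

# A manifest written with an explicit ID needs no inheritance on read.
explicit = ManifestRef(
    entries=[ManifestEntry("data/c.parquet", 99)],
    added_snapshot_id=1234,
)
unchanged = read_entries(explicit)
```

In this sketch every reader has to apply the fill-in step, which is why the behavior depends on reader support; a manifest written with an explicit snapshot ID passes through unchanged.
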
If you want to do this, I would recommend that you start a commit, get its snapshot ID, and write each manifest using that snapshot ID. It's safer to avoid inheritance. You'd also need to update the OverwriteFiles operation to accept manifests for the added files.

The only reason I can think of for not doing this is that you can't get more specific metadata in the commit summary. But you could always have a way to pass that back from your distributed manifest write as well.

Also, be careful about manifest rewrites and validation. If you rewrite manifests in parallel, you'd need to either ensure that the rewritten manifest is still in the table at commit time or redo the work.

Ryan

On Wed, Jun 1, 2022 at 4:12 PM Scott Cowell <[email protected]> wrote:

> Hi all,
>
> A question regarding the OverwriteFiles API in comparison to AppendFiles. I
> noticed that AppendFiles allows for adding manifest files, but
> OverwriteFiles does not. Is this intentional?
>
> The use case that we have is distributed manifest file writes as part of a
> table update. We'd like to be able to add these manifest files, along with
> removing some set of data files that have been overwritten by data files
> contained in those manifests, as part of a single snapshot. My
> understanding is that the current API supports either of the following:
>
> - Don't do distributed manifest file writes - add data files directly
>   via the OverwriteFiles operation
> - Do distributed manifest file writes, use the AppendFiles op to add the
>   manifest files, then use DeleteFiles to remove any data files. But we
>   found that each operation generates another snapshot even if used in the
>   same transaction, which is undesirable.
>
> Thoughts on the best way to enable this scenario? Would extending
> OverwriteFiles to support adding manifest files be a workable solution?
>
> Thanks
> Scott
>

--
Ryan Blue
Tabular
