rdblue commented on issue #845: Add persistent IDs to partition fields
URL: https://github.com/apache/incubator-iceberg/pull/845#issuecomment-608612302

I'm only about halfway through reviewing this, but I wanted to capture some thoughts about forward compatibility, which @chenjunjiedada raised.

If a table already has multiple partition specs, then the IDs may be reused across specs and can even conflict. This isn't something we can change, because manifest files embed the partition field IDs in their schemas. That means that when there are no IDs, assignment must start at 1000 and must be independent for each partition spec.

If an older version writes to the table, it may remove any assigned partition IDs. That means that for any format v1 table, we must remain compatible with the current assignment strategy. That way, IDs removed by an old writer will be the same when they are reassigned.

This also means that evolution is limited in v1 tables. To ensure that IDs can be reassigned correctly after they are removed, partition fields cannot be dropped or reordered in any way; otherwise, reassignment would be incorrect. That means no removing partition fields, no reordering partition fields, and no adding partition fields unless they are added at the end of the spec.

We'll be able to support more evolution changes once we can guarantee that all partition fields have IDs that won't be removed. We'll make the IDs a requirement in v2 tables.
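To make the v1 constraints concrete, here is a minimal Java sketch. It assumes that a v1 reader reassigns missing IDs purely by position, starting at 1000 and independently for each spec, and it models a partition field by its name only (real fields also carry a source column and transform). The class and method names (`V1PartitionIds`, `assignIds`, `isV1CompatibleEvolution`) are hypothetical and are not Iceberg's API.

```java
import java.util.ArrayList;
import java.util.List;

public class V1PartitionIds {
    // Assumption: IDs missing from a v1 spec are reassigned by position,
    // starting at 1000, independently for every partition spec.
    static final int FIRST_PARTITION_FIELD_ID = 1000;

    // Returns the IDs that positional reassignment would give these fields.
    static List<Integer> assignIds(List<String> fieldNames) {
        List<Integer> ids = new ArrayList<>();
        for (int pos = 0; pos < fieldNames.size(); pos++) {
            ids.add(FIRST_PARTITION_FIELD_ID + pos);
        }
        return ids;
    }

    // True only if newSpec keeps every field of oldSpec at the same position
    // and adds fields at the end, so reassignment yields the same IDs.
    static boolean isV1CompatibleEvolution(List<String> oldSpec, List<String> newSpec) {
        if (newSpec.size() < oldSpec.size()) {
            return false; // at least one field was dropped
        }
        // Any drop, rename, or reorder changes this prefix.
        return newSpec.subList(0, oldSpec.size()).equals(oldSpec);
    }

    public static void main(String[] args) {
        List<String> spec0 = List.of("event_day", "region");
        List<String> appended = List.of("event_day", "region", "user_bucket");
        List<String> reordered = List.of("region", "event_day");
        List<String> dropped = List.of("event_day");
        List<String> unrelated = List.of("category");

        System.out.println(assignIds(spec0));     // [1000, 1001]
        System.out.println(assignIds(appended));  // [1000, 1001, 1002]
        System.out.println(assignIds(unrelated)); // [1000], same ID as event_day
        System.out.println(isV1CompatibleEvolution(spec0, appended));  // true
        System.out.println(isV1CompatibleEvolution(spec0, reordered)); // false
        System.out.println(isV1CompatibleEvolution(spec0, dropped));   // false
    }
}
```

Because the IDs depend only on position, appending `user_bucket` leaves the existing fields at 1000 and 1001, while dropping or reordering would silently shift them. The `unrelated` spec also shows how independent per-spec assignment reuses 1000 for a different field.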
