rdblue commented on pull request #1046:
URL: https://github.com/apache/iceberg/pull/1046#issuecomment-635442419


   @teabot, I think the main problem with unions is the lack of support in 
processing engines. It is unlikely that Spark, Presto, Impala, etc. will add 
support for unions. Since Iceberg is a format that we want to be suitable for a 
common at-rest store, it doesn't make sense to have a type that requires 
workarounds in those engines but has only a small benefit (ensuring only one 
option is non-null).
   
   I think it is unlikely that processing engines will support unions because 
it isn't clear how users would interact with them in SQL. For example, how do I 
filter to just records with a particular option of the union? That might seem 
easy, but it exposes underlying problems with unions and schema evolution, like 
[identifying union 
fields](https://github.com/apache/iceberg/pull/1046#pullrequestreview-416446251).
 If we generate names based on position, what happens when that position 
changes? If we do it based on ID, then we're exposing internal IDs to users.
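
   To make the naming question concrete, here is a minimal sketch. The 
`option_N` naming scheme, the helper functions, and the field IDs are all 
hypothetical and only for illustration; they are not an Iceberg API or a 
proposal from this PR.

```python
# Each union option is modeled as (field_id, type_name). IDs are made up.
schema_v1 = [(4, "int"), (5, "string")]
# A later schema version inserts a new option in the middle of the union.
schema_v2 = [(4, "int"), (9, "long"), (5, "string")]


def names_by_position(options):
    """Hypothetical scheme: expose each option as option_<position>."""
    return {f"option_{pos}": type_name
            for pos, (_, type_name) in enumerate(options)}


def names_by_id(options):
    """Hypothetical scheme: expose each option as option_<field id>."""
    return {f"option_{field_id}": type_name
            for field_id, type_name in options}


# Position-based names silently change meaning when the position changes.
print(names_by_position(schema_v1)["option_1"])
print(names_by_position(schema_v2)["option_1"])

# ID-based names stay stable, but the user now has to know internal IDs.
print(names_by_id(schema_v1)["option_5"])
print(names_by_id(schema_v2)["option_5"])
```

   In this sketch a filter written against `option_1` selects strings under the 
first schema and longs under the second, while the ID-based name keeps pointing 
at the same option only because the internal ID leaks into the query.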
   
   Also, what if a file is written with a version of the schema that has a new 
union option that isn't in the table schema? Do we choose another incorrect 
branch (null or default) or do we throw an exception?
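
   A rough sketch of the two behaviors in question, assuming a hypothetical 
reader that sees the option ID written in the file and the set of option IDs 
known to the table schema. The function name, the `on_unknown` parameter, and 
the IDs are illustrative assumptions, not existing Iceberg code.

```python
def read_union_value(written_option_id, value, table_option_ids,
                     on_unknown="error"):
    """Resolve a union value from a file against the table's known options."""
    if written_option_id in table_option_ids:
        return value
    if on_unknown == "null":
        # Choose another branch: the data is dropped without any signal.
        return None
    # Otherwise fail the read when the option is not in the table schema.
    raise ValueError(
        f"union option {written_option_id} is not in the table schema")


table_option_ids = {4, 5}

# Known option: the value is returned as written.
print(read_union_value(5, "abc", table_option_ids))

# Unknown option 9, null policy: the value is lost.
print(read_union_value(9, 42, table_option_ids, on_unknown="null"))

# Unknown option 9, error policy: the read fails.
try:
    read_union_value(9, 42, table_option_ids)
except ValueError as exc:
    print("read failed:", exc)
```

   Neither policy is obviously right: the first returns a value that was never 
written, and the second makes files unreadable through the table schema.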
   
   I think it is the right choice to continue using the more standard and 
well-defined types rather than adding union, since adding it would make it much 
harder to integrate Iceberg into processing engines.




