RussellSpitzer commented on PR #15706: URL: https://github.com/apache/iceberg/pull/15706#issuecomment-4101879529
> I think that it would be better to use a configurable column name than a hardcoded "ICEZVALUE" in `SparkZOrderFileRewriteRunner`.
>
> We could add an option to https://iceberg.apache.org/docs/latest/spark-procedures/#options-for-sort-strategy-with-zorder-sort_order, for example, for configuring this column name. What do you think, @RussellSpitzer?
>
> We should still throw the error as proposed here if the column coincides with an existing column in the table.

I really don't think we need customization here. Is this really that likely to conflict? If we are really worried, I would just randomly change the name if the column exists. My hesitation on all of this is that it doesn't seem like a very likely situation, so while it is true it could happen, is it worth adding code for the very rare chance it does?

So in terms of solutions I would consider:

1. Randomly assign the column name (gen_zorder_xx), where if for some reason that exists we randomize the xx to something else
2. Just fail if the chosen column name is already in use
3. Let users customize it

The order here is:

1. No user intervention needed, we hide the problem
2. We expose that it's a problem
3. We expose that it's a problem and have to support a custom new parameter just for this special case

For all of these we still have the question of, "is it really worth adding more code to the project to avoid this?"
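
For reference, option 1 could look roughly like the sketch below. It is plain Java with no Iceberg or Spark dependencies, and everything in it is hypothetical: the class `ZOrderColumnNames`, the method `pickUnused`, the retry limit, and the hex suffix are illustrative choices, not code from this PR or from `SparkZOrderFileRewriteRunner`. It also assumes column names should be compared case-insensitively, which may not match the table's actual resolution rules.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

// Hypothetical helper illustrating option 1: pick a generated column name
// and re-randomize the suffix if it collides with an existing column.
final class ZOrderColumnNames {
  static final String PREFIX = "gen_zorder_";
  static final int MAX_ATTEMPTS = 16; // arbitrary bound so the loop always terminates

  private ZOrderColumnNames() {}

  static String pickUnused(Set<String> existingColumns, Random rng) {
    // Assumption: compare names case-insensitively to be conservative.
    Set<String> taken = new HashSet<>();
    for (String column : existingColumns) {
      taken.add(column.toLowerCase(Locale.ROOT));
    }

    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
      // Integer.toHexString yields lowercase, unsigned hex digits.
      String candidate = PREFIX + Integer.toHexString(rng.nextInt());
      if (!taken.contains(candidate)) {
        return candidate;
      }
    }

    // Falling back to option 2: surface the problem instead of looping forever.
    throw new IllegalStateException(
        "Could not find an unused z-order column name after " + MAX_ATTEMPTS + " attempts");
  }
}
```

A caller would pass the table's current column names and a `Random`, then use the returned name for the temporary z-value column. The final `throw` shows that options 1 and 2 are not mutually exclusive: randomizing hides the collision in practice, and failing remains the backstop.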
