rdblue commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-775490751
@MaxGekk, can you share the use case that you have for this? You mentioned
truncation-specific optimizations. I think working with concrete use cases is
usually a good idea. If these are theoretical only -- like a user that can drop
all data but not a subset -- then we should put this off. If there's a specific
case, then let's discuss it.
I agree that there _may_ be good reason to pass that the engine's intent was
to truncate. That's why we have `SupportsTruncate` for the write builder. And I
agree with you that we don't necessarily need to use an atomic operation that
could truncate and add data at the same time. Your point about not having
insert permissions is a good one to justify not using `SupportsTruncate`,
although the case of a user that can drop all data but not subsets doesn't
sound real. The point about truncation possibly being a metadata operation is
why we added `SupportsDelete` at the table level.
Those points may indicate that an interface to truncate a table as a
stand-alone operation is valid, although I still think that it is a bad idea to
add more interfaces to v2 without a reasonable expectation that they will
actually be used.
Another problem here is that this is operation is proposed at the catalog
level, which does not fit with how v2 works. I think that the reason for this
is emulating what the Hive does, but that's not usually a good choice.
In v2, catalogs load tables and tables are modified. That's why
`SupportsDelete` extends `Table` and not `TableCatalog`. This keeps concerns
separate, so we have a way to handle tables that don't exist and a separate way
to handle tables that don't support a certain operation. Mixing those two
together at the catalog level over-complicates the API, requiring a source to
throw one exception if the table doesn't exist and another if it doesn't
support truncation. (We also went through this discussion with the recently
added interfaces to add/drop partitions.)
Assuming that it is worth adding this interface, I would expect it to be a
mix-in for `Table`. And like `SupportsOverwrite` that implements
`SupportsTruncate`, I think this should also update `SupportsDelete` so that
tables don't need to implement both interfaces.
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org