rdblue commented on pull request #31475:
URL: https://github.com/apache/spark/pull/31475#issuecomment-775490751


   @MaxGekk, can you share the use case that you have for this? You mentioned 
truncation-specific optimizations. I think working with concrete use cases is 
usually a good idea. If these are theoretical only -- like a user that can drop 
all data but not a subset -- then we should put this off. If there's a specific 
case, then let's discuss it.
   
   I agree that there _may_ be good reason to pass that the engine's intent was 
to truncate. That's why we have `SupportsTruncate` for the write builder. And I 
agree with you that we don't necessarily need to use an atomic operation that 
could truncate and add data at the same time. Your point about not having 
insert permissions is a good one to justify not using `SupportsTruncate`, 
although the case of a user that can drop all data but not subsets doesn't 
sound real. The point about truncation possibly being a metadata operation is 
why we added `SupportsDelete` at the table level.
   
   Those points may indicate that an interface to truncate a table as a 
stand-alone operation is valid, although I still think that it is a bad idea to 
add more interfaces to v2 without a reasonable expectation that they will 
actually be used.
   
   Another problem here is that this is operation is proposed at the catalog 
level, which does not fit with how v2 works. I think that the reason for this 
is emulating what the Hive does, but that's not usually a good choice.
   
   In v2, catalogs load tables and tables are modified. That's why 
`SupportsDelete` extends `Table` and not `TableCatalog`. This keeps concerns 
separate, so we have a way to handle tables that don't exist and a separate way 
to handle tables that don't support a certain operation. Mixing those two 
together at the catalog level over-complicates the API, requiring a source to 
throw one exception if the table doesn't exist and another if it doesn't 
support truncation. (We also went through this discussion with the recently 
added interfaces to add/drop partitions.)
   
   Assuming that it is worth adding this interface, I would expect it to be a 
mix-in for `Table`. And like `SupportsOverwrite` that implements 
`SupportsTruncate`, I think this should also update `SupportsDelete` so that 
tables don't need to implement both interfaces.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to