Totally agree with you, Dale: there are situations where, for efficiency, performance, and better control, visibility, and manageability, we need to expose partition management.
So, as described, I suggested two things. For the short term, the ability to do it in the current V2 API form via options, with an appropriate implementation in the data source reader/writer. For the long term, I suggested that partition management could be made part of metadata/catalog management, possibly under SPARK-24252 (DataSourceV2: Add catalog support)?

On 9/17/18, 8:26 PM, "tigerquoll" <tigerqu...@outlook.com> wrote:

> Hi Jayesh,
>
> I get where you are coming from - partitions are just an implementation optimisation that we really shouldn’t be bothering the end user with. Unfortunately that view is like saying RPC is like a procedure call, and that details of the network transport should be hidden from the end user. CORBA tried this approach for RPC and failed, for the same reason that no major DBMS vendor that supports partitions tries to hide them from the end user: they have a substantial real-world effect that is impossible to hide from the user (in particular when writing or modifying the data source). Any attempt to “take care” of partitions automatically invariably guesses wrong and ends up frustrating the end user (as “substantial real-world effect” turns into “show-stopping performance penalty” if the user attempts to fight against a partitioning scheme she has no idea exists).
>
> So if we are not hiding them from the user, we need to allow users to manipulate them, either by representing them generically in the API, by allowing pass-through commands to manipulate them, or by some other means.
>
> Regards,
> Dale.
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/