singhpk234 commented on code in PR #14867:
URL: https://github.com/apache/iceberg/pull/14867#discussion_r2678941794


##########
core/src/main/java/org/apache/iceberg/rest/RESTCatalogProperties.java:
##########
@@ -37,12 +37,107 @@ private RESTCatalogProperties() {}
 
   public static final String NAMESPACE_SEPARATOR = "namespace-separator";
 
-  // Enable planning on the REST server side
-  public static final String REST_SCAN_PLANNING_ENABLED = "rest-scan-planning-enabled";
-  public static final boolean REST_SCAN_PLANNING_ENABLED_DEFAULT = false;
+  // Configure scan planning mode
+  // Can be set by server in LoadTableResponse.config() or by client in catalog properties
+  // Negotiation rules: ONLY beats PREFERRED, both PREFERRED = client wins
+  // Default when neither client nor server provides: client-preferred
+  public static final String SCAN_PLANNING_MODE = "scan-planning-mode";
+  public static final String SCAN_PLANNING_MODE_DEFAULT =
+      ScanPlanningMode.CLIENT_PREFERRED.modeName();
 
   public enum SnapshotMode {
     ALL,
     REFS
   }
+
+  /**
+   * Enum to represent scan planning mode configuration.
+   *
+   * <p>Can be configured by:
+   *
+   * <ul>
+   *   <li>Server: Returned in LoadTableResponse.config() to advertise server preference/requirement
+   *   <li>Client: Set in catalog properties to set client preference/requirement
+   * </ul>
+   *
+   * <p>When both client and server configure this property, the values are negotiated:
+   *
+   * <p>Values:
+   *
+   * <ul>
+   *   <li>CLIENT_ONLY - MUST use client-side planning. Fails if paired with CATALOG_ONLY from other

Review Comment:
   > is it better to have clients just make intelligent choices when server 
side planning is available but not required, or is it better for servers to 
indicate preferences. My thought process is if a server really feels like it's 
advantageous to do remote planning, may as well just send it back as required
   
   This is mostly from the POV that the right choice depends on the load each
side has at the moment the call is made. For example, take the following cases:
   1. I am using PyIceberg and I know I am low on resources, so when the table
is big it is better to do remote planning if possible. PyIceberg can say it
prefers that the catalog does the planning, and the server, depending on whether
it is set to catalog_only / catalog_preferred, can take part in that negotiation.
   2. Let's say I am Spark and I have big compute infra, but the choice depends
on the current workload:
      - In an environment with a lot of concurrent queries, I will not have much
memory available to plan this, so I would start by saying I prefer the catalog.
      - With a dedicated cluster, I would rather plan in my own JVM than do a
remote plan, so I would say client_only from the client side.
   
   Server side:
    1. If the server is under load and the client is open to planning on its
end, it is better for the server to say "I am burdened / low on resources, are
you open to planning on the client end?" and send client_preferred as a soft
signal. The server has no clue what the client is; it makes this decision purely
based on its own state. Sending client_only would cause trouble for clients like
PyIceberg in case they are configured as catalog_only.
   
   Please let me know what you think of these cases.
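
To make the cases above concrete, here is a minimal sketch of how such a negotiation could resolve, assuming only the rules stated in the diff comment (ONLY beats PREFERRED, both PREFERRED means the client wins, client-preferred when neither side sets a mode, and CLIENT_ONLY paired with CATALOG_ONLY fails). The class, enum, and method names are hypothetical and are not taken from the PR; falling back to the one configured side when the other is unset is also an assumption.

```java
// Hypothetical sketch, not the PR's implementation.
final class ScanPlanningNegotiationSketch {

  // Mirrors the four modes discussed in the comment; the fields are illustrative.
  enum Mode {
    CLIENT_ONLY(true, true),
    CLIENT_PREFERRED(true, false),
    CATALOG_PREFERRED(false, false),
    CATALOG_ONLY(false, true);

    final boolean clientSide; // true if planning would happen on the client
    final boolean required; // true for the *_ONLY modes

    Mode(boolean clientSide, boolean required) {
      this.clientSide = clientSide;
      this.required = required;
    }
  }

  // Either argument may be null, meaning that side did not configure a mode.
  static Mode negotiate(Mode client, Mode server) {
    if (client == null && server == null) {
      return Mode.CLIENT_PREFERRED; // default stated in the diff comment
    }
    if (client == null) {
      return server; // assumption: the only configured side decides
    }
    if (server == null) {
      return client; // assumption: the only configured side decides
    }
    if (client.required && server.required) {
      if (client.clientSide != server.clientSide) {
        throw new IllegalStateException(
            "Incompatible requirements: client=" + client + ", server=" + server);
      }
      return client; // both sides require the same planning location
    }
    if (server.required) {
      return server; // ONLY beats PREFERRED
    }
    // Either the client has a hard requirement, or both sides only have
    // preferences, in which case the client wins.
    return client;
  }

  private ScanPlanningNegotiationSketch() {}
}
```

Under these assumptions, the loaded-server case resolves without an error: a client set to CATALOG_ONLY paired with a server sending CLIENT_PREFERRED yields CATALOG_ONLY, whereas a server sending CLIENT_ONLY to the same client would throw.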
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

