rdblue commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1520451266
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -537,6 +537,108 @@ paths:
5XX:
$ref: '#/components/responses/ServerErrorResponse'
+ /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan:
+ parameters:
+ - $ref: '#/components/parameters/prefix'
+ - $ref: '#/components/parameters/namespace'
+ - $ref: '#/components/parameters/table'
+ post:
+ tags:
+ - Catalog API
+ summary: Find plan-tasks based on a plan context.
+ description:
+ Scan pre-planning creates a set of opaque planning tasks for a set of scan configuration options.
+ Each task can be passed to the plan endpoint to fetch a (disjoint) subset of the file scan tasks for the scan.
+
+ Scan pre-planning enables breaking scan planning across multiple tasks.
+ This can be used to parallelize scan planning requests, use fewer resources in each planning request,
+ or to delay parts of scan planning that may not be needed.
Review Comment:
I liked some of the information in the last version:
* Plan tasks are opaque
* The plan tasks are expected to produce a disjoint subset of the file scan tasks (no overlap between plan tasks!)
* Plan tasks can reduce resources required for planning requests
* Plan tasks can be used to delay requests for more tasks, in case they are not needed.
We don't necessarily need all of that, but I think there's still value there. I'd also lean toward giving more context for requirements, such as the requirement that plan tasks produce disjoint subsets.
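For example, the description could keep those points as separate short paragraphs. This is only an illustrative sketch, not proposed final text: it assumes a YAML literal block scalar (`|`), which preserves line breaks and blank lines so they reach the rendered Markdown as paragraph breaks, and it is shown without the surrounding `post:` indentation.

```yaml
# Illustrative wording only; assumes a literal block scalar so blank lines
# become Markdown paragraph breaks in the rendered spec.
description: |
  Scan pre-planning creates a set of opaque plan tasks for a set of scan
  configuration options. Each plan task can be passed to the plan endpoint
  to fetch a subset of the file scan tasks for the scan.

  Plan tasks must produce disjoint subsets of the file scan tasks: no file
  scan task is returned for more than one plan task.

  Breaking planning into plan tasks can reduce the resources required for
  each planning request and allows planning requests to run in parallel.

  Plan tasks can also be used to delay requests for more file scan tasks,
  in case those tasks are not needed.
```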
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -537,6 +537,104 @@ paths:
5XX:
$ref: '#/components/responses/ServerErrorResponse'
+ /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan:
+ parameters:
+ - $ref: '#/components/parameters/prefix'
+ - $ref: '#/components/parameters/namespace'
+ - $ref: '#/components/parameters/table'
+ post:
+ tags:
+ - Catalog API
+ summary: Prepare a list of plan tasks that can be used later for table scan planning
+ description:
+ Prepare a list of plan tasks that can be used later for table scan planning.
+ Each plan task in the response of this API can be used as the `plan-task` in the `PlanTable` API request to perform scan planning against a subset of the table files.
+ This can be used to parallelize and distribute table scan planning.
Review Comment:
Can you wrap these lines and make them separate paragraphs?
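A sketch of the wrapped form, reusing this revision's wording unchanged and again assuming a literal block scalar; the line width and where the paragraphs split are a judgment call:

```yaml
# Same text as this revision, wrapped and split into paragraphs.
summary: Prepare a list of plan tasks that can be used later for table scan planning
description: |
  Prepare a list of plan tasks that can be used later for table scan planning.

  Each plan task in the response of this API can be used as the `plan-task`
  in the `PlanTable` API request to perform scan planning against a subset
  of the table files.

  This can be used to parallelize and distribute table scan planning.
```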