rdblue commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1645215603
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -2838,6 +2988,69 @@ components:
additionalProperties:
type: string
+ PreplanTableRequest:
+ type: object
+ required:
+ - select
+ - filter
+ properties:
+ select:
+ description: A list of the selected column names. If referencing a
column with a nested field, ensure full name is joined by `.` such as `a.b`
+ type: array
+ items:
+ type: string
+ filter:
+ $ref: '#/components/schemas/Expression'
+ case-sensitive:
+ description: Indicates whether column selection and filtering should
be case sensitive
+ type: boolean
+ default: true
+ snapshot-id:
+ description: Defaults to the table's current snapshot; This option
is not allowed when `snapshot-range` is present in the request.
+ type: integer
+ format: int64
+ snapshot-range:
+ description: A JSON list containing exactly 2 snapshot IDs
representing the start (exclusive) and end (inclusive) snapshots. This option
is not allowed when `snapshot-id` is present in the request.
+ type: array
+ items:
+ type: integer
+ format: int64
+
+ PlanTableRequest:
+ type: object
+ required:
+ - select
+ properties:
+ select:
+ description: A list of the selected column names
+ type: array
+ items:
+ type: string
+ filter:
+ $ref: '#/components/schemas/Expression'
+ case-sensitive:
+ description: Indicates whether column selection and filtering should
be case sensitive
+ type: boolean
+ default: true
+ stats-fields:
+ description: A list of field names for which stats should be included
+ type: array
+ items:
+ type: string
+ plan-task:
+ $ref: '#/components/schemas/PlanTask'
+ snapshot-id:
Review Comment:
@danielcweeks, this was my suggestion because I think it makes the API
simpler. Rather than having several different options and having a more
complicated spec, we can collapse all of those into the snapshot the client
actually resolves.
That also avoids cases where the client ends up reading a different snapshot
than it thinks is current.
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3789,6 +4003,21 @@ components:
EmptyResponse:
$ref: '#/components/examples/ListNamespacesEmptyExample'
+ PayloadTooLargeWithPlanResponse:
+ description:
+ Payload is too large. Required to use /preplan before /plan
+ content:
+ application/json:
+ schema:
+ $ref: '#/components/schemas/IcebergErrorResponse'
+ example: {
+ "error": {
+ "message": " Cannot plan scan: too many task results; use
/preplan to split planning across multiple requests",
+ "type": "PayloadTooLargeWithPlanException",
+ "code": 413
Review Comment:
I suggested either 413 or 422, but you're right. 422 is the better option.
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -532,6 +532,100 @@ paths:
5XX:
$ref: '#/components/responses/ServerErrorResponse'
+ /v1/{prefix}/namespaces/{namespace}/tables/{table}/preplan:
+ parameters:
+ - $ref: '#/components/parameters/prefix'
+ - $ref: '#/components/parameters/namespace'
+ - $ref: '#/components/parameters/table'
+ post:
+ tags:
+ - Catalog API
+ summary: Find plan-tasks based on a plan context.
+ description:
+ When a user submits a query, this operation will find the relevant
plan-tasks
+ based on the user's selected columns, and filters. The plan-tasks can
be later used during PlanTable
+ to distribute this work for performance gain.
+ operationId: PreplanTable
+ requestBody:
+ content:
+ application/json:
+ schema:
+ $ref: '#/components/schemas/PreplanTableRequest'
+ responses:
+ 200:
+ $ref: '#/components/responses/PreplanTableResponse'
+ 400:
+ $ref: '#/components/responses/BadRequestErrorResponse'
+ 401:
+ $ref: '#/components/responses/UnauthorizedResponse'
+ 403:
+ $ref: '#/components/responses/ForbiddenResponse'
+ 404:
+ description:
+ Not Found
+ - NoSuchTableException, the table does not exist
+ - NoSuchNamespaceException, the namespace does not exist
+ content:
+ application/json:
+ schema:
+ $ref: '#/components/schemas/ErrorModel'
+ examples:
+ TableDoesNotExist:
+ $ref: '#/components/examples/NoSuchTableError'
+ NamespaceDoesNotExist:
+ $ref: '#/components/examples/NoSuchNamespaceError'
+ 419:
+ $ref: '#/components/responses/AuthenticationTimeoutResponse'
+ 503:
+ $ref: '#/components/responses/ServiceUnavailableResponse'
+ 5XX:
+ $ref: '#/components/responses/ServerErrorResponse'
+
+ /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan/:
+ parameters:
+ - $ref: '#/components/parameters/prefix'
+ - $ref: '#/components/parameters/namespace'
+ - $ref: '#/components/parameters/table'
+ post:
+ tags:
+ - Catalog API
+ summary: Initiate plan on table to find relevant file scan tasks for a
specific query.
+ operationId: PlanTable
+ description:
+ When a user submits a query, we can do a plan to find the relevant
file scan tasks
+ based on the user's selected columns, and filters. For enhanced query
performance,
+ users can provide a plan-task which will distribute the planning work.
+ requestBody:
+ content:
+ application/json:
+ schema:
+ $ref: '#/components/schemas/PlanTableRequest'
+ responses:
Review Comment:
This is relevant for a
[comment](https://github.com/apache/iceberg/pull/9695#discussion_r1641334986)
from @danielcweeks below. Looks like 413 isn't the right response because it
refers to the size of the request.
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -2838,6 +2978,63 @@ components:
additionalProperties:
type: string
+ PreplanTableRequest:
+ type: object
+ required:
+ - select
+ - filter
+ - case-sensitive
+ properties:
+ select:
+ description: A list of the selected column names
+ type: array
+ items:
+ type: string
+ filter:
+ $ref: '#/components/schemas/Expression'
+ case-sensitive:
+ description: Indicates whether column selection and filtering should
be case sensitive
+ type: boolean
+ snapshot-id:
+ description: an int64 snapshot ID (if snapshot-range is not
present); optional and defaults to the table's current snapshot
+ type: integer
+ format: int64
+ snapshot-range:
+ description: a JSON list containing exactly 2 int64 snapshot IDs (if
snapshot-id is not present) representing the start (defaults to exclusive, see
`start-exclusive`) and end (only inclusive) snapshots
+ type: array
+ items:
+ type: integer
+ format: int64
+ start-exclusive:
+ description: Indicates whether the user wants the start of
`snapshot-range` to be inclusive or exclusive
+ type: boolean
+ default: true
+
+ PlanTableRequest:
+ type: object
+ required:
+ - select
+ - filter
+ - case-sensitive
+ properties:
+ select:
+ description: A list of the selected column names
Review Comment:
Look good. Thank you!
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -2838,6 +2988,69 @@ components:
additionalProperties:
type: string
+ PreplanTableRequest:
+ type: object
+ required:
+ - select
+ - filter
+ properties:
+ select:
+ description: A list of the selected column names. If referencing a
column with a nested field, ensure full name is joined by `.` such as `a.b`
+ type: array
+ items:
+ type: string
+ filter:
+ $ref: '#/components/schemas/Expression'
+ case-sensitive:
+ description: Indicates whether column selection and filtering should
be case sensitive
+ type: boolean
+ default: true
+ snapshot-id:
+ description: Defaults to the table's current snapshot; This option
is not allowed when `snapshot-range` is present in the request.
+ type: integer
+ format: int64
+ snapshot-range:
+ description: A JSON list containing exactly 2 snapshot IDs
representing the start (exclusive) and end (inclusive) snapshots. This option
is not allowed when `snapshot-id` is present in the request.
+ type: array
+ items:
+ type: integer
+ format: int64
+
+ PlanTableRequest:
+ type: object
+ required:
+ - select
+ properties:
+ select:
+ description: A list of the selected column names
+ type: array
+ items:
+ type: string
+ filter:
+ $ref: '#/components/schemas/Expression'
+ case-sensitive:
+ description: Indicates whether column selection and filtering should
be case sensitive
+ type: boolean
+ default: true
+ stats-fields:
+ description: A list of field names for which stats should be included
+ type: array
+ items:
+ type: string
+ plan-task:
+ $ref: '#/components/schemas/PlanTask'
+ snapshot-id:
+ description: defaults to the table's current snapshot. This option
is not allowed when `snapshot-range` or 'plan-task` is present in the request.
+ type: integer
+ format: int64
+
+ snapshot-range:
+ description: A JSON list containing exactly 2 snapshot IDs
representing the start (exclusive) and end (inclusive) snapshots. This option
is not allowed when `snapshot-id` or `plan-task` is present in the request.
+ type: array
Review Comment:
This is something we could do. I suggested the range array but I called out
that we may just want to have start and end:
https://github.com/apache/iceberg/pull/9695#discussion_r1494914340
If you prefer the other option then let's go with that.
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -2838,6 +2988,69 @@ components:
additionalProperties:
type: string
+ PreplanTableRequest:
+ type: object
+ required:
+ - select
+ - filter
+ properties:
+ select:
+ description: A list of the selected column names. If referencing a
column with a nested field, ensure full name is joined by `.` such as `a.b`
+ type: array
+ items:
+ type: string
+ filter:
+ $ref: '#/components/schemas/Expression'
+ case-sensitive:
+ description: Indicates whether column selection and filtering should
be case sensitive
+ type: boolean
+ default: true
+ snapshot-id:
+ description: Defaults to the table's current snapshot; This option
is not allowed when `snapshot-range` is present in the request.
+ type: integer
+ format: int64
+ snapshot-range:
+ description: A JSON list containing exactly 2 snapshot IDs
representing the start (exclusive) and end (inclusive) snapshots. This option
is not allowed when `snapshot-id` is present in the request.
+ type: array
+ items:
+ type: integer
+ format: int64
+
+ PlanTableRequest:
+ type: object
+ required:
+ - select
+ properties:
+ select:
+ description: A list of the selected column names
+ type: array
+ items:
+ type: string
+ filter:
+ $ref: '#/components/schemas/Expression'
+ case-sensitive:
+ description: Indicates whether column selection and filtering should
be case sensitive
+ type: boolean
+ default: true
+ stats-fields:
+ description: A list of field names for which stats should be included
+ type: array
+ items:
+ type: string
+ plan-task:
+ $ref: '#/components/schemas/PlanTask'
+ snapshot-id:
Review Comment:
Here's the context:
https://github.com/apache/iceberg/pull/9695#discussion_r1494914340
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]