rdblue commented on code in PR #9695:
URL: https://github.com/apache/iceberg/pull/9695#discussion_r1687128848
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3647,6 +3818,176 @@ components:
type: integer
description: "List of equality field IDs"
+ PreplanTableRequest:
+ type: object
+ required:
+ - table-scan-context
+ properties:
+ table-scan-context:
+ $ref: '#/components/schemas/TableScanContext'
+
+ PlanTableRequest:
+ type: object
+ required:
+ - table-scan-context
+ properties:
+ table-scan-context:
+ $ref: '#/components/schemas/TableScanContext'
+ plan-task:
+ $ref: '#/components/schemas/PlanTask'
+ stats-fields:
+ description:
+ A list of fields that the client requests the server to send
statistics
+ in each `FileScanTask` returned in the response
+ type: array
+ items:
+ $ref: '#/components/schemas/FieldName'
+
+ TableScanContext:
+ anyOf:
+ - $ref: '#/components/schemas/SnapshotScanContext'
+ - $ref: '#/components/schemas/IncrementalSnapshotScanContext'
+
+ BaseTableScanContext:
+ discriminator:
+ propertyName: type
+ mapping:
+ snapshot-scan: '#/components/schemas/SnapshotScanContext'
+ incremental-snapshot-scan:
'#/components/schemas/IncrementalSnapshotScanContext'
+ type: object
+ required:
+ - type
+ properties:
+ type:
+ type: string
+
+ SnapshotScanContext:
+ description: context for scanning data in a specific snapshot
+ type: object
+ allOf:
+ - $ref: '#/components/schemas/BaseTableScanContext'
+ required:
+ - type
Review Comment:
Right now, we don't require `snapshot-id`. The reason I suggested that
originally was to avoid requiring an extra call to the service to load the
table to get the snapshot. However, I think that not requiring it can lead to
some odd behavior when the client and server are out of sync with what the
latest snapshot is.
The client might thing (and could possibly log) that it is reading the
latest snapshot it knows about, `s1`, and the service may know about a newer
snapshot, `s2`, and plan with that instead. That could lead to a confusing
situation where a user may not know what version of a table was actually read.
That kind of problem is avoided by requiring a specific snapshot ID in the
request, but because snapshot IDs are unique it would require that the client
has loaded the table metadata. I think that is okay and I would rather not have
cases where the client and service can get out of sync. What do you think,
@rahil-c, @amogh-jahagirdar, @danielcweeks ?
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3647,6 +3818,176 @@ components:
type: integer
description: "List of equality field IDs"
+ PreplanTableRequest:
+ type: object
+ required:
+ - table-scan-context
+ properties:
+ table-scan-context:
+ $ref: '#/components/schemas/TableScanContext'
+
+ PlanTableRequest:
+ type: object
+ required:
+ - table-scan-context
+ properties:
+ table-scan-context:
+ $ref: '#/components/schemas/TableScanContext'
+ plan-task:
+ $ref: '#/components/schemas/PlanTask'
+ stats-fields:
+ description:
+ A list of fields that the client requests the server to send
statistics
+ in each `FileScanTask` returned in the response
+ type: array
+ items:
+ $ref: '#/components/schemas/FieldName'
+
+ TableScanContext:
+ anyOf:
+ - $ref: '#/components/schemas/SnapshotScanContext'
+ - $ref: '#/components/schemas/IncrementalSnapshotScanContext'
+
+ BaseTableScanContext:
+ discriminator:
+ propertyName: type
+ mapping:
+ snapshot-scan: '#/components/schemas/SnapshotScanContext'
+ incremental-snapshot-scan:
'#/components/schemas/IncrementalSnapshotScanContext'
+ type: object
+ required:
+ - type
+ properties:
+ type:
+ type: string
+
+ SnapshotScanContext:
+ description: context for scanning data in a specific snapshot
+ type: object
+ allOf:
+ - $ref: '#/components/schemas/BaseTableScanContext'
+ required:
+ - type
Review Comment:
Right now, we don't require `snapshot-id`. The reason I suggested that
originally was to avoid requiring an extra call to the service to load the
table to get the snapshot. However, I think that not requiring it can lead to
some odd behavior when the client and server are out of sync with what the
latest snapshot is.
The client might thing (and could possibly log) that it is reading the
latest snapshot it knows about, `s1`, and the service may know about a newer
snapshot, `s2`, and plan with that instead. That could lead to a confusing
situation where a user may not know what version of a table was actually read.
That kind of problem is avoided by requiring a specific snapshot ID in the
request, but because snapshot IDs are unique it would require that the client
has loaded the table metadata. I think that is okay and I would rather not have
cases where the client and service can get out of sync. What do you think,
@rahil-c, @amogh-jahagirdar, @danielcweeks?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]