stevenzwu commented on code in PR #12584:
URL: https://github.com/apache/iceberg/pull/12584#discussion_r3301339136
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -1940,6 +1940,60 @@ paths:
5XX:
$ref: '#/components/responses/ServerErrorResponse'
+ /v1/{prefix}/events:
+ parameters:
+ - $ref: '#/components/parameters/prefix'
+ post:
+ tags:
+ - Catalog API
+ summary: Get events for changes to catalog objects
+ description: >
+ Returns a sequence of changes to catalog objects (tables, namespaces,
views)
Review Comment:
The endpoint description mixes three terms for the same concept — *events*,
*changes*, and *metadata modifications*:
- "Returns a sequence of changes to catalog objects"
- "track metadata modifications without polling"
- "Consumers track their progress … resumable synchronization"
- "fetch events" / "events `event-id`"
The schema and field names all use "events" (`QueryEventsRequest`, `events:
[Event]`, `event-id`, `request-event-count`), so settling on "events"
throughout the prose would match the wire format and remove the cognitive
shuffle for readers.
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3708,6 +3792,71 @@ components:
allOf:
- $ref: '#/components/schemas/ScanTasks'
+ QueryEventsRequest:
+ type: object
+ properties:
+ continuation-token:
+ type: string
+ description: >
+ A continuation token to resume fetching events from a previous
request.
Review Comment:
Description here is asymmetric with the response side at L4439-4442. The
response side says "An opaque continuation token … Clients should treat this as
an opaque value and pass it unmodified in subsequent requests." The request
side just says "A continuation token to resume fetching events." If opacity is
the contract, both descriptions should state it — otherwise readers might infer
the request side accepts something different.
Suggested:
```
A continuation token returned by a previous response. Clients should pass the
opaque token value unmodified. If not provided, events are returned from the
beginning of the event log subject to other filters.
```
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3708,6 +3792,71 @@ components:
allOf:
- $ref: '#/components/schemas/ScanTasks'
+ QueryEventsRequest:
+ type: object
+ properties:
+ continuation-token:
+ type: string
+ description: >
+ A continuation token to resume fetching events from a previous
request.
+ If not provided, events are fetched from the beginning of the
event log
+ subject to other filters.
+ page-size:
+ type: integer
Review Comment:
The global `pageSize` parameter at L2089 has `minimum: 1`. This `page-size`
field doesn't, so a strictly-conforming client could send `0` or a negative
number and the schema would accept it. Add `minimum: 1` for consistency with
the rest of the spec's pagination story.
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -4278,6 +4427,340 @@ components:
metadata:
$ref: '#/components/schemas/TableMetadata'
+ QueryEventsResponse:
+ type: object
+ required:
+ - continuation-token
+ - highest-processed-timestamp-ms
+ - events
+ properties:
+ continuation-token:
+ type: string
+ description: >
+ An opaque continuation token to fetch the next page of events.
+ This token encodes the server's cursor position and filter state.
+ Clients should treat this as an opaque value and pass it
unmodified in subsequent requests.
+ highest-processed-timestamp-ms:
+ description: >
+ The highest event timestamp processed when generating this
response.
+ This may not necessarily appear in the returned changes if it was
filtered out.
+ type: integer
+ format: int64
+ events:
+ type: array
+ items:
+ $ref: "#/components/schemas/Event"
+
+ Event:
+ type: object
+ required:
+ - event-id
+ - request-id
+ - request-event-count
+ - timestamp-ms
+ - operation
+ properties:
+ event-id:
+ type: string
+ description: Unique ID of this event. Clients should perform
deduplication based on this ID.
+ request-id:
+ description: >
+ Opaque ID of the request this change belongs to.
+ This ID can be used to identify events that were part of the same
request.
+ Servers generate this ID randomly.
+ type: string
+ request-event-count:
+ type: integer
+ description: >
+ Total number of events in this batch or request.
Review Comment:
Two issues with the existing description:
1. **Redundant.** First and last sentences restate the same fact:
> Total number of events in this batch or request.
> ...
> The `request-event-count` field indicates the total number of events
generated by that request.
2. **"this batch or request" mixes two terms** — the rest of the events spec
uses "request" alone. And the bare word "request" is itself ambiguous in this
endpoint, since there are two requests in play: the events-query request the
client just sent, and the originating catalog request (`updateTable`,
`commitTransaction`, etc.) that produced the event. The field is about the
latter, but the prose doesn't disambiguate.
Suggested:
```
Number of events produced by the originating catalog request (e.g.
updateTable
or commitTransaction) that generated this event. Such requests can apply
multiple updates atomically, each surfaced as a separate event sharing the
same `request-id`; this count reports how many events that originating
request produced in total.
```
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -4278,6 +4427,340 @@ components:
metadata:
$ref: '#/components/schemas/TableMetadata'
+ QueryEventsResponse:
+ type: object
+ required:
+ - continuation-token
+ - highest-processed-timestamp-ms
+ - events
+ properties:
+ continuation-token:
+ type: string
+ description: >
+ An opaque continuation token to fetch the next page of events.
+ This token encodes the server's cursor position and filter state.
+ Clients should treat this as an opaque value and pass it
unmodified in subsequent requests.
+ highest-processed-timestamp-ms:
+ description: >
+ The highest event timestamp processed when generating this
response.
+ This may not necessarily appear in the returned changes if it was
filtered out.
+ type: integer
+ format: int64
+ events:
+ type: array
+ items:
+ $ref: "#/components/schemas/Event"
+
+ Event:
+ type: object
+ required:
+ - event-id
+ - request-id
+ - request-event-count
+ - timestamp-ms
+ - operation
+ properties:
+ event-id:
+ type: string
+ description: Unique ID of this event. Clients should perform
deduplication based on this ID.
+ request-id:
+ description: >
+ Opaque ID of the request this change belongs to.
+ This ID can be used to identify events that were part of the same
request.
+ Servers generate this ID randomly.
+ type: string
+ request-event-count:
+ type: integer
+ description: >
+ Total number of events in this batch or request.
+ Some endpoints, such as "updateTable" and "commitTransaction", can
perform multiple updates in a single atomic request.
+ Each update is modeled as a separate event. All events generated
by the same request share the same `request-id`.
+ The `request-event-count` field indicates the total number of
events generated by that request.
+ timestamp-ms:
+ type: integer
+ format: int64
+ description: >
+ Timestamp when this event occurred (epoch milliseconds).
+ Timestamps are not guaranteed to be unique. Typically all events in
+ a transaction will have the same timestamp.
+ actor:
+ type: object
+ description: >
+ The actor who performed the operation, such as a user or service
account.
+ The content of this field is implementation specific.
+ additionalProperties: true
+ operation:
+ type: object
+ description: >
+ The operation that was performed, such as creating or updating a
table.
+ Clients should discard events with unknown operation types.
+ discriminator:
+ propertyName: operation-type
+ mapping:
+ create-table: "#/components/schemas/CreateTableOperation"
+ register-table: "#/components/schemas/RegisterTableOperation"
+ drop-table: "#/components/schemas/DropTableOperation"
+ update-table: "#/components/schemas/UpdateTableOperation"
+ rename-table: "#/components/schemas/RenameTableOperation"
+ create-view: "#/components/schemas/CreateViewOperation"
+ drop-view: "#/components/schemas/DropViewOperation"
+ update-view: "#/components/schemas/UpdateViewOperation"
+ rename-view: "#/components/schemas/RenameViewOperation"
+ create-namespace: "#/components/schemas/CreateNamespaceOperation"
+ update-namespace-properties:
"#/components/schemas/UpdateNamespacePropertiesOperation"
+ drop-namespace: "#/components/schemas/DropNamespaceOperation"
+ custom: "#/components/schemas/CustomOperation"
+ oneOf:
+ - $ref: "#/components/schemas/CreateTableOperation"
+ - $ref: "#/components/schemas/RegisterTableOperation"
+ - $ref: "#/components/schemas/DropTableOperation"
+ - $ref: "#/components/schemas/UpdateTableOperation"
+ - $ref: "#/components/schemas/RenameTableOperation"
+ - $ref: "#/components/schemas/CreateViewOperation"
+ - $ref: "#/components/schemas/DropViewOperation"
+ - $ref: "#/components/schemas/UpdateViewOperation"
+ - $ref: "#/components/schemas/RenameViewOperation"
+ - $ref: "#/components/schemas/CreateNamespaceOperation"
+ - $ref: "#/components/schemas/UpdateNamespacePropertiesOperation"
+ - $ref: "#/components/schemas/DropNamespaceOperation"
+ - $ref: "#/components/schemas/CustomOperation"
+
+ CreateTableOperation:
+ description: >
+ Operation to create a new table in the catalog.
+ Events for this Operation must be issued when the create is finalized
and committed, not when the create is staged.
Review Comment:
Of the 13 concrete operation schemas, only `CreateTableOperation` carries a
description, and the description carries semantic guidance that almost
certainly applies to its siblings:
> Events for this Operation must be issued when the create is finalized and
committed, not when the create is staged.
Drop, register, update, rename, and the namespace-level operations all have
parallel timing questions — emitted on stage? on commit? on retry? on
idempotent replay? Right now readers have to extrapolate from
`CreateTableOperation`. Two ways to fix:
- Add equivalent timing/emission guidance to each of the 12 other operations.
- Hoist the common rule to a single description on `Event.operation`
("Operations are emitted when the underlying change is committed; staged or
rolled-back changes are not surfaced."), and use per-operation descriptions
only for operation-specific notes.
The second option avoids 13 near-duplicate paragraphs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]