stevenzwu commented on code in PR #12584:
URL: https://github.com/apache/iceberg/pull/12584#discussion_r3301339136


##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -1940,6 +1940,60 @@ paths:
         5XX:
           $ref: '#/components/responses/ServerErrorResponse'
 
+  /v1/{prefix}/events:
+    parameters:
+      - $ref: '#/components/parameters/prefix'
+    post:
+      tags:
+        - Catalog API
+      summary: Get events for changes to catalog objects
+      description: >
+        Returns a sequence of changes to catalog objects (tables, namespaces, 
views)

Review Comment:
   The endpoint description mixes three terms for the same concept — *events*, 
*changes*, and *metadata modifications*:
   - "Returns a sequence of changes to catalog objects"
   - "track metadata modifications without polling"
   - "Consumers track their progress … resumable synchronization"
   - "fetch events" / "events `event-id`"
   
   The schema and field names all use "events" (`QueryEventsRequest`, `events: 
[Event]`, `event-id`, `request-event-count`), so settling on "events" 
throughout the prose would match the wire format and remove the cognitive 
shuffle for readers.



##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3708,6 +3792,71 @@ components:
       allOf:
         - $ref: '#/components/schemas/ScanTasks'
 
+    QueryEventsRequest:
+      type: object
+      properties:
+        continuation-token:
+          type: string
+          description: >
+            A continuation token to resume fetching events from a previous 
request.

Review Comment:
   Description here is asymmetric with the response side at L4439-4442. The 
response side says "An opaque continuation token … Clients should treat this as 
an opaque value and pass it unmodified in subsequent requests." The request 
side just says "A continuation token to resume fetching events." If opacity is 
the contract, both descriptions should state it — otherwise readers might infer 
the request side accepts something different.
   
   Suggested:
   ```
   A continuation token returned by a previous response. Clients should pass the
   opaque token value unmodified. If not provided, events are returned from the
   beginning of the event log subject to other filters.
   ```



##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3708,6 +3792,71 @@ components:
       allOf:
         - $ref: '#/components/schemas/ScanTasks'
 
+    QueryEventsRequest:
+      type: object
+      properties:
+        continuation-token:
+          type: string
+          description: >
+            A continuation token to resume fetching events from a previous 
request.
+            If not provided, events are fetched from the beginning of the 
event log
+            subject to other filters.
+        page-size:
+          type: integer

Review Comment:
   The global `pageSize` parameter at L2089 has `minimum: 1`. This `page-size` 
field doesn't, so a strictly-conforming client could send `0` or a negative 
number and the schema would accept it. Add `minimum: 1` for consistency with 
the rest of the spec's pagination story.



##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -4278,6 +4427,340 @@ components:
         metadata:
           $ref: '#/components/schemas/TableMetadata'
 
+    QueryEventsResponse:
+      type: object
+      required:
+        - continuation-token
+        - highest-processed-timestamp-ms
+        - events
+      properties:
+        continuation-token:
+          type: string
+          description: >
+            An opaque continuation token to fetch the next page of events.
+            This token encodes the server's cursor position and filter state.
+            Clients should treat this as an opaque value and pass it 
unmodified in subsequent requests.
+        highest-processed-timestamp-ms:
+          description: >
+            The highest event timestamp processed when generating this 
response.
+            This may not necessarily appear in the returned changes if it was 
filtered out.
+          type: integer
+          format: int64
+        events:
+          type: array
+          items:
+            $ref: "#/components/schemas/Event"
+
+    Event:
+      type: object
+      required:
+        - event-id
+        - request-id
+        - request-event-count
+        - timestamp-ms
+        - operation
+      properties:
+        event-id:
+          type: string
+          description: Unique ID of this event. Clients should perform 
deduplication based on this ID.
+        request-id:
+          description: >
+            Opaque ID of the request this change belongs to.
+            This ID can be used to identify events that were part of the same 
request.
+            Servers generate this ID randomly.
+          type: string
+        request-event-count:
+          type: integer
+          description: >
+            Total number of events in this batch or request.

Review Comment:
   Two issues with the existing description:
   
   1. **Redundant.** First and last sentences restate the same fact:
      > Total number of events in this batch or request.
      > ...
      > The `request-event-count` field indicates the total number of events 
generated by that request.
   2. **"this batch or request" mixes two terms** — the rest of the events spec 
uses "request" alone. And the bare word "request" is itself ambiguous in this 
endpoint, since there are two requests in play: the events-query request the 
client just sent, and the originating catalog request (`updateTable`, 
`commitTransaction`, etc.) that produced the event. The field is about the 
latter, but the prose doesn't disambiguate.
   
   Suggested:
   ```
   Number of events produced by the originating catalog request (e.g. 
updateTable
   or commitTransaction) that generated this event. Such requests can apply
   multiple updates atomically, each surfaced as a separate event sharing the
   same `request-id`; this count reports how many events that originating
   request produced in total.
   ```
   



##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -4278,6 +4427,340 @@ components:
         metadata:
           $ref: '#/components/schemas/TableMetadata'
 
+    QueryEventsResponse:
+      type: object
+      required:
+        - continuation-token
+        - highest-processed-timestamp-ms
+        - events
+      properties:
+        continuation-token:
+          type: string
+          description: >
+            An opaque continuation token to fetch the next page of events.
+            This token encodes the server's cursor position and filter state.
+            Clients should treat this as an opaque value and pass it 
unmodified in subsequent requests.
+        highest-processed-timestamp-ms:
+          description: >
+            The highest event timestamp processed when generating this 
response.
+            This may not necessarily appear in the returned changes if it was 
filtered out.
+          type: integer
+          format: int64
+        events:
+          type: array
+          items:
+            $ref: "#/components/schemas/Event"
+
+    Event:
+      type: object
+      required:
+        - event-id
+        - request-id
+        - request-event-count
+        - timestamp-ms
+        - operation
+      properties:
+        event-id:
+          type: string
+          description: Unique ID of this event. Clients should perform 
deduplication based on this ID.
+        request-id:
+          description: >
+            Opaque ID of the request this change belongs to.
+            This ID can be used to identify events that were part of the same 
request.
+            Servers generate this ID randomly.
+          type: string
+        request-event-count:
+          type: integer
+          description: >
+            Total number of events in this batch or request.
+            Some endpoints, such as "updateTable" and "commitTransaction", can 
perform multiple updates in a single atomic request.
+            Each update is modeled as a separate event. All events generated 
by the same request share the same `request-id`.
+            The `request-event-count` field indicates the total number of 
events generated by that request.
+        timestamp-ms:
+          type: integer
+          format: int64
+          description: >
+            Timestamp when this event occurred (epoch milliseconds).
+            Timestamps are not guaranteed to be unique. Typically all events in
+            a transaction will have the same timestamp.
+        actor:
+          type: object
+          description: >
+            The actor who performed the operation, such as a user or service 
account.
+            The content of this field is implementation specific.
+          additionalProperties: true
+        operation:
+          type: object
+          description: >
+            The operation that was performed, such as creating or updating a 
table.
+            Clients should discard events with unknown operation types.
+          discriminator:
+            propertyName: operation-type
+            mapping:
+              create-table: "#/components/schemas/CreateTableOperation"
+              register-table: "#/components/schemas/RegisterTableOperation"
+              drop-table: "#/components/schemas/DropTableOperation"
+              update-table: "#/components/schemas/UpdateTableOperation"
+              rename-table: "#/components/schemas/RenameTableOperation"
+              create-view: "#/components/schemas/CreateViewOperation"
+              drop-view: "#/components/schemas/DropViewOperation"
+              update-view: "#/components/schemas/UpdateViewOperation"
+              rename-view: "#/components/schemas/RenameViewOperation"
+              create-namespace: "#/components/schemas/CreateNamespaceOperation"
+              update-namespace-properties: 
"#/components/schemas/UpdateNamespacePropertiesOperation"
+              drop-namespace: "#/components/schemas/DropNamespaceOperation"
+              custom: "#/components/schemas/CustomOperation"
+          oneOf:
+            - $ref: "#/components/schemas/CreateTableOperation"
+            - $ref: "#/components/schemas/RegisterTableOperation"
+            - $ref: "#/components/schemas/DropTableOperation"
+            - $ref: "#/components/schemas/UpdateTableOperation"
+            - $ref: "#/components/schemas/RenameTableOperation"
+            - $ref: "#/components/schemas/CreateViewOperation"
+            - $ref: "#/components/schemas/DropViewOperation"
+            - $ref: "#/components/schemas/UpdateViewOperation"
+            - $ref: "#/components/schemas/RenameViewOperation"
+            - $ref: "#/components/schemas/CreateNamespaceOperation"
+            - $ref: "#/components/schemas/UpdateNamespacePropertiesOperation"
+            - $ref: "#/components/schemas/DropNamespaceOperation"
+            - $ref: "#/components/schemas/CustomOperation"
+
+    CreateTableOperation:
+      description: >
+        Operation to create a new table in the catalog.
+        Events for this Operation must be issued when the create is finalized 
and committed, not when the create is staged.

Review Comment:
   Of the 13 concrete operation schemas, only `CreateTableOperation` carries a 
description, and the description carries semantic guidance that almost 
certainly applies to its siblings:
   > Events for this Operation must be issued when the create is finalized and 
committed, not when the create is staged.
   
   Drop, register, update, rename, and the namespace-level operations all have 
parallel timing questions — emitted on stage? on commit? on retry? on 
idempotent replay? Right now readers have to extrapolate from 
`CreateTableOperation`. Two ways to fix:
   
   - Add equivalent timing/emission guidance to each of the 12 other operations.
   - Hoist the common rule to a single description on `Event.operation` 
("Operations are emitted when the underlying change is committed; staged or 
rolled-back changes are not surfaced."), and use per-operation descriptions 
only for operation-specific notes.
   
   The second option avoids 13 near-duplicate paragraphs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to