BenWhitehead commented on a change in pull request #15005: URL: https://github.com/apache/beam/pull/15005#discussion_r668234834
########## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1.java ##########
@@ -59,6 +89,80 @@
 *
 * <h3>Operations</h3>
 *
+ * <h4>Read</h4>
+ *
+ * <p>The currently supported read operations and their execution behavior are as follows:
+ *
+ * <table>
+ * <tbody>
+ * <tr>
+ * <th>RPC</th>
+ * <th>Execution Behavior</th>
+ * </tr>
+ * <tr>
+ * <td>PartitionQuery</td>
+ * <td>Parallel Streaming</td>
+ * </tr>
+ * <tr>
+ * <td>RunQuery</td>
+ * <td>Sequential Streaming</td>
+ * </tr>
+ * <tr>
+ * <td>BatchGet</td>
+ * <td>Sequential Streaming</td>
+ * </tr>
+ * <tr>
+ * <td>ListCollectionIds</td>
+ * <td>Sequential Paginated</td>
+ * </tr>
+ * <tr>
+ * <td>ListDocuments</td>
+ * <td>Sequential Paginated</td>
+ * </tr>
+ * </tbody>
+ * </table>
+ *
+ * <p>PartitionQuery should be preferred over other options if at all possible, because it has the
+ * ability to parallelize execution of multiple queries for specific sub-ranges of the full results.
+ *
+ * <p>You should only ever use ListDocuments if the use of <a target="_blank" rel="noopener
+ * noreferrer"
+ * href="https://cloud.google.com/firestore/docs/reference/rpc/google.firestore.v1#google.firestore.v1.ListDocumentsRequest">{@code
+ * show_missing}</a> is needed to access a document. RunQuery and PartitionQuery will always be
+ * faster if the use of {@code show_missing} is not needed.
+ *
+ * <p><b>Example Usage</b>
+ *
+ * <pre>{@code
+ * PCollection<PartitionQueryRequest> partitionQueryRequests = ...;
+ * PCollection<RunQueryResponse> partitionQueryResponses = partitionQueryRequests
+ * .apply(FirestoreIO.v1().read().partitionQuery().build());
+ * }</pre>
+ *
+ * <pre>{@code
+ * PCollection<RunQueryRequest> runQueryRequests = ...;

Review comment:

Each of the `PTransform`s off of `FirestoreIO.v1().read()` represents an individual RPC that Firestore supports for data access.
Each of them has at least one feature that sets it apart from the other, similar methods, which justifies its presence:

1. BatchGet is currently the only way to get documents by their ID. Some customers manage IDs externally and coordinate them across several systems.
2. ListCollectionIds is currently the only way to enumerate the collections of a document.
3. ListDocuments is currently the only way to access documents that have sub-collections but no fields of their own (via `show_missing`).
4. RunQuery is the primary and most performant way of fetching documents by some criteria.
5. PartitionQuery works in conjunction with RunQuery. Today only CollectionGroup queries are supported for partitioning, but more query types are intended to be supported in the future.
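The selection rules in the numbered list can be summarized as a small decision function. This is a hypothetical sketch, not part of Beam or the Firestore client: the `ReadRpcChooser` class, the `ReadNeeds` flags, and the precedence order between the checks are illustrative assumptions layered on top of the points above.

```java
/**
 * Hypothetical helper, not part of Beam or the Firestore client, that encodes the
 * RPC-selection guidance from the review comment as a decision function.
 */
final class ReadRpcChooser {

  /** The read RPCs exposed as PTransforms under FirestoreIO.v1().read(). */
  enum ReadRpc {
    BATCH_GET,
    LIST_COLLECTION_IDS,
    LIST_DOCUMENTS,
    RUN_QUERY,
    PARTITION_QUERY
  }

  /** Hypothetical description of what a pipeline needs from a read. */
  static final class ReadNeeds {
    boolean enumerateCollections; // list the collection ids under a document
    boolean needsShowMissing;     // include documents with sub-collections but no fields
    boolean byDocumentId;         // documents are addressed by externally managed ids
    boolean collectionGroupQuery; // criteria-based query over a collection group
  }

  static ReadRpc choose(ReadNeeds needs) {
    if (needs.enumerateCollections) {
      // Only ListCollectionIds enumerates the collections of a document.
      return ReadRpc.LIST_COLLECTION_IDS;
    }
    if (needs.needsShowMissing) {
      // Only ListDocuments exposes show_missing; without it, queries are faster.
      return ReadRpc.LIST_DOCUMENTS;
    }
    if (needs.byDocumentId) {
      // BatchGet is the only way to fetch documents by their id.
      return ReadRpc.BATCH_GET;
    }
    // A criteria-based query: prefer PartitionQuery (parallel) when the query type
    // supports partitioning, which today means collection group queries.
    return needs.collectionGroupQuery ? ReadRpc.PARTITION_QUERY : ReadRpc.RUN_QUERY;
  }

  public static void main(String[] args) {
    ReadNeeds groupQuery = new ReadNeeds();
    groupQuery.collectionGroupQuery = true;
    System.out.println(choose(groupQuery)); // PARTITION_QUERY

    ReadNeeds byId = new ReadNeeds();
    byId.byDocumentId = true;
    System.out.println(choose(byId)); // BATCH_GET
  }
}
```

Checking the "only way" cases first reflects that those RPCs cover capabilities the query RPCs lack; everything else falls through to RunQuery or PartitionQuery, in line with the Javadoc's advice to prefer PartitionQuery where possible.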
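For point 1, BatchGet addresses documents by their full resource name rather than a bare ID, so externally managed IDs have to be expanded first. A minimal sketch, assuming every ID refers to a document in one top-level collection: `BatchGetNames` and its methods are hypothetical helpers and the project, collection, and document values are made up; only the name formats (`projects/{project_id}/databases/{database_id}` and `.../documents/{document_path}`) come from the Firestore RPC reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, not part of Beam, that expands externally managed document ids
 * into the fully qualified names a BatchGet request refers to.
 */
final class BatchGetNames {

  /** Database name in the form projects/{project_id}/databases/{database_id}. */
  static String databaseName(String projectId, String databaseId) {
    return "projects/" + projectId + "/databases/" + databaseId;
  }

  /**
   * Document names in the form
   * projects/{project_id}/databases/{database_id}/documents/{collection_id}/{document_id},
   * assuming every id refers to a document in the same top-level collection.
   */
  static List<String> documentNames(
      String projectId, String databaseId, String collectionId, List<String> documentIds) {
    String prefix = databaseName(projectId, databaseId) + "/documents/" + collectionId + "/";
    List<String> names = new ArrayList<>(documentIds.size());
    for (String id : documentIds) {
      names.add(prefix + id);
    }
    return names;
  }

  public static void main(String[] args) {
    List<String> names =
        documentNames("my-project", "(default)", "users", Arrays.asList("alice", "bob"));
    // Prints:
    // projects/my-project/databases/(default)/documents/users/alice
    // projects/my-project/databases/(default)/documents/users/bob
    for (String name : names) {
      System.out.println(name);
    }
  }
}
```

The resulting strings correspond to the `database` and `documents` fields of a `BatchGetDocumentsRequest`, which is the request type a BatchGet read consumes.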
