[GitHub] [beam] BenWhitehead commented on a change in pull request #15005: [BEAM-8376] Google Cloud Firestore Connector - Add Firestore v1 Read Operations

GitBox Mon, 12 Jul 2021 13:31:07 -0700


BenWhitehead commented on a change in pull request #15005:
URL: https://github.com/apache/beam/pull/15005#discussion_r668234834




##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreV1.java
##########
@@ -59,6 +89,80 @@
  *
  * <h3>Operations</h3>
  *
+ * <h4>Read</h4>
+ *
+ * <p>The currently supported read operations and their execution behavior are as follows:
+ *
+ * <table>
+ *   <tbody>
+ *     <tr>
+ *       <th>RPC</th>
+ *       <th>Execution Behavior</th>
+ *     </tr>
+ *     <tr>
+ *       <td>PartitionQuery</td>
+ *       <td>Parallel Streaming</td>
+ *     </tr>
+ *     <tr>
+ *       <td>RunQuery</td>
+ *       <td>Sequential Streaming</td>
+ *     </tr>
+ *     <tr>
+ *       <td>BatchGet</td>
+ *       <td>Sequential Streaming</td>
+ *     </tr>
+ *     <tr>
+ *       <td>ListCollectionIds</td>
+ *       <td>Sequential Paginated</td>
+ *     </tr>
+ *     <tr>
+ *       <td>ListDocuments</td>
+ *       <td>Sequential Paginated</td>
+ *     </tr>
+ *   </tbody>
+ * </table>
+ *
+ * <p>PartitionQuery should be preferred over other options if at all possible, because it has the
+ * ability to parallelize execution of multiple queries for specific sub-ranges of the full results.
+ *
+ * <p>You should only ever use ListDocuments if the use of <a target="_blank" rel="noopener
+ * noreferrer"
+ * href="https://cloud.google.com/firestore/docs/reference/rpc/google.firestore.v1#google.firestore.v1.ListDocumentsRequest">{@code
+ * show_missing}</a> is needed to access a document. RunQuery and PartitionQuery will always be
+ * faster if the use of {@code show_missing} is not needed.
+ *
+ * <p><b>Example Usage</b>
+ *
+ * <pre>{@code
+ * PCollection<PartitionQueryRequest> partitionQueryRequests = ...;
+ * PCollection<RunQueryResponse> partitionQueryResponses = partitionQueryRequests
+ *     .apply(FirestoreIO.v1().read().partitionQuery().build());
+ * }</pre>
+ *
+ * <pre>{@code
+ * PCollection<RunQueryRequest> runQueryRequests = ...;

Review comment:
       Each of the `PTransform`s off of `FirestoreIO.v1().read()` represents an individual RPC that Firestore supports for accessing data. Each of them has at least one feature that differentiates it from the other, similar methods, which justifies its presence.
   
   1. BatchGet is currently the only way to get documents by their ID. Some customers do external ID management, which is then coordinated across several systems.
   2. ListCollectionIds is currently the only way to enumerate the collections of a document.
   3. ListDocuments is currently the only way to access documents that have subcollections but no properties of their own (via `show_missing`).
   4. RunQuery is the primary and most performant way of fetching documents that match some criteria.
   5. PartitionQuery works in conjunction with RunQuery. Today, only CollectionGroup queries are supported for partitioning, but more query types are intended to be supported in the future.
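The selection rules in the list above, combined with the execution behaviors from the table in the Javadoc, can be summarized as a small decision helper. This is a minimal sketch for illustration only: `ReadRpcChooser`, `Need`, `ReadRpc`, `Execution` and `choose` are hypothetical names and are not part of Beam or the Firestore client.

```java
/**
 * Hypothetical helper (not part of Beam or the Firestore client) encoding the
 * guidance above: which Firestore v1 read RPC fits an access need, and how the
 * connector executes that RPC according to the Javadoc table.
 */
final class ReadRpcChooser {

  /** Execution behaviors listed in the Javadoc table. */
  enum Execution {
    PARALLEL_STREAMING,
    SEQUENTIAL_STREAMING,
    SEQUENTIAL_PAGINATED
  }

  /** The five read RPCs, each tagged with its execution behavior. */
  enum ReadRpc {
    PARTITION_QUERY(Execution.PARALLEL_STREAMING),
    RUN_QUERY(Execution.SEQUENTIAL_STREAMING),
    BATCH_GET(Execution.SEQUENTIAL_STREAMING),
    LIST_COLLECTION_IDS(Execution.SEQUENTIAL_PAGINATED),
    LIST_DOCUMENTS(Execution.SEQUENTIAL_PAGINATED);

    final Execution execution;

    ReadRpc(Execution execution) {
      this.execution = execution;
    }
  }

  /** Hypothetical description of what a pipeline needs to read. */
  enum Need {
    DOCUMENTS_BY_ID,
    COLLECTION_IDS_OF_DOCUMENT,
    MISSING_DOCUMENTS,
    DOCUMENTS_BY_CRITERIA
  }

  static ReadRpc choose(Need need, boolean isCollectionGroupQuery) {
    switch (need) {
      case DOCUMENTS_BY_ID:
        return ReadRpc.BATCH_GET; // only way to fetch documents by ID
      case COLLECTION_IDS_OF_DOCUMENT:
        return ReadRpc.LIST_COLLECTION_IDS; // only way to enumerate a document's collections
      case MISSING_DOCUMENTS:
        return ReadRpc.LIST_DOCUMENTS; // only RPC that supports show_missing
      case DOCUMENTS_BY_CRITERIA:
      default:
        // Only collection group queries can currently be partitioned; otherwise
        // a single RunQuery is the preferred way to fetch by criteria.
        return isCollectionGroupQuery ? ReadRpc.PARTITION_QUERY : ReadRpc.RUN_QUERY;
    }
  }
}
```

The boolean flag reflects the current restriction that only CollectionGroup queries can be partitioned; if more query types gain partitioning support, that branch is the one that would change.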

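The Javadoc prefers PartitionQuery because it splits one query into sub-ranges of the full result set that can be executed in parallel. The sketch below models that idea in memory with plain Java, under stated assumptions: sorted split points stand in for the partition cursors that PartitionQuery returns, each pair of adjacent split points becomes a half-open range, and each range is "queried" on its own thread. `PartitionSketch`, `Range`, `toRanges` and `runPartitioned` are hypothetical; in the connector the partitions are executed as RunQuery requests against Firestore rather than filtered from a local list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

/**
 * Hypothetical in-memory model of partitioned query execution. Sorted document
 * names stand in for the query's result order; nothing here calls Firestore or Beam.
 */
final class PartitionSketch {

  /** Half-open range [startAt, endBefore); null means unbounded on that side. */
  static final class Range {
    final String startAt;
    final String endBefore;

    Range(String startAt, String endBefore) {
      this.startAt = startAt;
      this.endBefore = endBefore;
    }

    boolean contains(String name) {
      return (startAt == null || name.compareTo(startAt) >= 0)
          && (endBefore == null || name.compareTo(endBefore) < 0);
    }
  }

  /** Turns N sorted split points into N + 1 contiguous, non-overlapping ranges. */
  static List<Range> toRanges(List<String> sortedSplitPoints) {
    List<Range> ranges = new ArrayList<>();
    String start = null;
    for (String split : sortedSplitPoints) {
      ranges.add(new Range(start, split));
      start = split;
    }
    ranges.add(new Range(start, null));
    return ranges;
  }

  /** Runs one "query" per range on its own thread and concatenates results in range order. */
  static List<String> runPartitioned(List<String> sortedNames, List<String> sortedSplitPoints) {
    List<Range> ranges = toRanges(sortedSplitPoints);
    ExecutorService pool = Executors.newFixedThreadPool(ranges.size());
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (Range range : ranges) {
        futures.add(
            pool.submit(
                () -> sortedNames.stream().filter(range::contains).collect(Collectors.toList())));
      }
      List<String> results = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        results.addAll(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Because the ranges are contiguous and non-overlapping, concatenating the per-range results in range order yields the same documents as one unpartitioned scan, which is what makes the parallel execution safe.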

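The Javadoc table lists ListDocuments as "Sequential Paginated", and the text reserves it for cases where `show_missing` is needed. Both behaviors can be modelled with a small in-memory stand-in. Assumptions: `ListDocumentsSketch`, `Page`, `listDocuments` and `listAll` are hypothetical names; the page token here is simply the last returned document ID, whereas real Firestore page tokens are opaque strings; and "missing" documents (subcollections but no fields of their own) are passed in explicitly rather than derived from subcollection paths.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Hypothetical in-memory stand-in for ListDocuments within one collection. It is
 * not the Firestore API; it only models paginated results and show_missing.
 */
final class ListDocumentsSketch {

  /** One page of results; nextPageToken is empty when there are no more pages. */
  static final class Page {
    final List<String> documentIds;
    final String nextPageToken;

    Page(List<String> documentIds, String nextPageToken) {
      this.documentIds = documentIds;
      this.nextPageToken = nextPageToken;
    }
  }

  private final NavigableSet<String> existing; // documents with their own fields
  private final NavigableSet<String> missing; // only subcollections, no fields

  ListDocumentsSketch(Collection<String> existing, Collection<String> missing) {
    this.existing = new TreeSet<>(existing);
    this.missing = new TreeSet<>(missing);
  }

  /** Server side: returns up to pageSize IDs after the position encoded in pageToken. */
  Page listDocuments(int pageSize, String pageToken, boolean showMissing) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    NavigableSet<String> visible = new TreeSet<>(existing);
    if (showMissing) {
      visible.addAll(missing); // missing documents only appear with show_missing
    }
    // In this sketch the token is the last ID of the previous page; resume after it.
    NavigableSet<String> remaining =
        pageToken.isEmpty() ? visible : visible.tailSet(pageToken, false);
    List<String> ids = new ArrayList<>();
    for (String id : remaining) {
      if (ids.size() == pageSize) {
        break;
      }
      ids.add(id);
    }
    boolean hasMore = remaining.size() > ids.size();
    return new Page(ids, hasMore ? ids.get(ids.size() - 1) : "");
  }

  /** Client side: requests pages one after another until the token comes back empty. */
  List<String> listAll(int pageSize, boolean showMissing) {
    List<String> all = new ArrayList<>();
    String token = "";
    do {
      Page page = listDocuments(pageSize, token, showMissing);
      all.addAll(page.documentIds);
      token = page.nextPageToken;
    } while (!token.isEmpty());
    return all;
  }
}
```

Each request depends on the token returned by the previous one, which is why paginated RPCs execute sequentially rather than in parallel.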

