pandasanjay commented on PR #35197:
URL: https://github.com/apache/beam/pull/35197#issuecomment-2992614335
> Have you actually seen poor performance from the existing handler in
> practice? Since we're usually not doing bulk reads with enrichment, it's unclear
> to me that the storage read approach will actually be more performant

@damccorm – For our use case of building an SCD Type 2 pipeline focused on
slot consumption and cost optimization, we've observed that using the BigQuery
Storage Read API offers significant advantages:
- **Improved Performance:** The Storage Read API reads data concurrently
over multiple streams, so reads do not have to compete for slots the way
queries run through the traditional query engine do.
- **Cost Optimization:** There are two main ways to save costs:
  - **No Need for Dedicated Slots:** Since the Storage Read API doesn't
    require slot reservations, we avoid the ongoing costs associated with dedicated
    slots.
  - **Lower Data Scan Costs:** As of June 2024, the cost for scanning
    data via the Storage Read API is $1.10 per TiB, compared to $5.00 per TiB when
    using the BigQuery Query Engine.
The only caveat is that we need to get the partitioning and clustering right;
if we do, this will be a cheap and efficient solution.
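
To make the scan-cost difference concrete, here is a rough back-of-the-envelope sketch using only the two per-TiB rates quoted above. The function names and the 10 TiB volume are hypothetical, and the calculation ignores free tiers, slot charges, and any regional price differences, so treat it as an illustration rather than a billing estimate.

```python
# Rates as quoted in this comment (USD per TiB scanned).
# Assumption: these may not match current pricing; check the pricing pages.
STORAGE_READ_USD_PER_TIB = 1.10
QUERY_ENGINE_USD_PER_TIB = 5.00


def scan_cost_usd(tib_scanned: float, usd_per_tib: float) -> float:
    """Return the scan cost in USD for a given volume and per-TiB rate."""
    return tib_scanned * usd_per_tib


def compare_scan_costs(tib_scanned: float) -> dict:
    """Compare scan cost for the same volume under both quoted rates."""
    storage_read = scan_cost_usd(tib_scanned, STORAGE_READ_USD_PER_TIB)
    query_engine = scan_cost_usd(tib_scanned, QUERY_ENGINE_USD_PER_TIB)
    return {
        "storage_read_usd": storage_read,
        "query_engine_usd": query_engine,
        "savings_usd": query_engine - storage_read,
    }


if __name__ == "__main__":
    # Hypothetical volume: 10 TiB scanned per month.
    for name, value in compare_scan_costs(10.0).items():
        print(f"{name}: {value:.2f}")
```

Because the cost scales linearly with bytes scanned, good partitioning and clustering (which shrink the scanned volume) reduce the bill under either rate.
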
**References:**
- [BigQuery Storage Read API Pricing](https://cloud.google.com/bigquery/pricing#storage-api)
- [BigQuery On-Demand Pricing](https://cloud.google.com/bigquery/pricing#on_demand)
Let me know your thoughts :)