pandasanjay commented on PR #35197: URL: https://github.com/apache/beam/pull/35197#issuecomment-2992614335
> Have you actually seen poor performance from the existing handler in practice? Since we're usually not doing bulk reads with enrichment, it's unclear to me that the storage read approach will actually be more performant

@damccorm, for our use case (an SCD Type 2 pipeline where slot consumption and cost are the main concerns), we've observed that the BigQuery Storage Read API offers significant advantages:

- **Improved Performance:** The Storage Read API reads data concurrently over multiple streams, so reads do not have to compete for slots the way queries through the traditional query engine do.
- **Cost Optimization:** There are two main ways to save costs:
  - **No Need for Dedicated Slots:** Since the Storage Read API doesn't require slot reservations, we avoid the ongoing costs associated with dedicated slots.
  - **Lower Data Scan Costs:** As of June 2024, the cost for scanning data via the Storage Read API is $1.10 per TiB, compared to $5.00 per TiB when using the BigQuery Query Engine.

The one caveat is partitioning and clustering: if we get those right, this will be a cheap and efficient solution.

**References:**

- [BigQuery Storage Read API Pricing](https://cloud.google.com/bigquery/pricing#storage-api)
- [BigQuery On-Demand Pricing](https://cloud.google.com/bigquery/pricing#on_demand)

Let me know your thoughts :)
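
To make the scan-cost comparison above concrete, here is a rough sketch of the arithmetic. It uses only the two per-TiB rates quoted in this comment, which are assumptions that may differ by region or change over time, so check the linked pricing pages before relying on them. The helper names are hypothetical, and the sketch ignores free tiers, slot reservation costs, and any reduction in bytes scanned from partitioning or clustering.

```python
# Per-TiB rates as quoted in the comment (assumed values; verify against
# the BigQuery pricing pages for your region).
STORAGE_READ_USD_PER_TIB = 1.10
ON_DEMAND_QUERY_USD_PER_TIB = 5.00


def scan_cost_usd(tib_scanned: float, usd_per_tib: float) -> float:
    """Return the scan cost in USD for a data volume (hypothetical helper)."""
    if tib_scanned < 0:
        raise ValueError("tib_scanned must be non-negative")
    return tib_scanned * usd_per_tib


def compare_scan_costs(tib_scanned: float) -> dict:
    """Compare the two access paths for the same number of TiB scanned."""
    storage_read = scan_cost_usd(tib_scanned, STORAGE_READ_USD_PER_TIB)
    on_demand = scan_cost_usd(tib_scanned, ON_DEMAND_QUERY_USD_PER_TIB)
    return {
        "storage_read_usd": storage_read,
        "on_demand_query_usd": on_demand,
        "savings_usd": on_demand - storage_read,
    }


if __name__ == "__main__":
    # Example: a pipeline that scans 10 TiB.
    for name, value in compare_scan_costs(10).items():
        print(f"{name}: {value:.2f}")
```

At these rates the per-TiB price differs by roughly 4.5x, but the total bill still scales with bytes scanned, which is why partitioning and clustering matter for either path.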