Re: [PR] Add BigQuery Storage Read API Enrichment Handler [beam]

via GitHub Fri, 20 Jun 2025 12:35:03 -0700


pandasanjay commented on PR #35197:
URL: https://github.com/apache/beam/pull/35197#issuecomment-2992614335


   > Have you actually seen poor performance from the existing handler in practice? Since we're usually not doing bulk reads with enrichment, it's unclear to me that the storage read approach will actually be more performant
   
   @damccorm – For our use case of building an SCD Type 2 pipeline focused on slot consumption and cost optimization, we've observed that using the BigQuery Storage Read API offers significant advantages:
   
   - **Improved Performance:** The Storage Read API reads table data directly over multiple parallel streams, so lookups do not have to compete for slots the way jobs on the traditional query engine do.
   - **Cost Optimization:** There are two main ways to save costs:
        - **No Need for Dedicated Slots:** Since Storage Read API reads don't require slot reservations, we avoid the ongoing cost of dedicated slots.
        - **Lower Data Scan Costs:** As of June 2024, reading data via the Storage Read API costs $1.10 per TiB, compared to $6.25 per TiB for on-demand queries on the BigQuery query engine (US multi-region list prices).
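To make the per-TiB comparison above concrete, here is a minimal back-of-the-envelope sketch in Python. The helper names (`read_cost_usd`, `monthly_cost_difference_usd`) are hypothetical, and the prices are passed in as parameters rather than hard-coded, because list prices change over time and vary by region. The model is deliberately simple: it assumes both paths read the same number of bytes and ignores free tiers, minimum billing amounts, and capacity (slot-based) pricing.

```python
"""Rough cost model for bytes-billed reads (a sketch, not a billing tool)."""

TIB = 1024 ** 4  # bytes in one tebibyte


def read_cost_usd(bytes_read: int, usd_per_tib: float) -> float:
    """Cost of reading `bytes_read` bytes at a flat per-TiB price."""
    return bytes_read / TIB * usd_per_tib


def monthly_cost_difference_usd(
    bytes_per_lookup: int,
    lookups_per_month: int,
    query_usd_per_tib: float,
    storage_read_usd_per_tib: float,
) -> float:
    """Monthly on-demand query cost minus Storage Read API cost for the
    same workload. Assumes identical bytes read on both paths and ignores
    free tiers, minimum billing amounts, and slot-based pricing."""
    total_bytes = bytes_per_lookup * lookups_per_month
    query_cost = read_cost_usd(total_bytes, query_usd_per_tib)
    storage_read_cost = read_cost_usd(total_bytes, storage_read_usd_per_tib)
    return query_cost - storage_read_cost
```

Plugging in an estimated bytes-per-lookup figure, a monthly lookup count, and the per-TiB prices listed above gives a first-order estimate of the gap; real bills can diverge once free tiers and minimum charges apply.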
        
   The one caveat is that we need to get the table's partitioning and clustering right; if we do, this will be a cheap and efficient solution.
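As a minimal sketch of what getting the partitioning right could look like on the read side, the hypothetical helper below builds a `row_restriction` string for a batch of enrichment keys and always prepends a filter on the partition column. It assumes a `DATE`-partitioned table, integer or string keys, and that the Storage Read API's SQL-like filter accepts simple `=`, `AND`, `OR`, and `CAST(... AS DATE)` predicates; the column names are illustrative only.

```python
"""Hypothetical helper: build a row restriction for a batch of lookups."""

from datetime import date
from typing import Sequence, Union

Key = Union[int, str]


def sql_literal(value: Key) -> str:
    """Render an int or str key as a literal for the filter string."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"unsupported key type: {type(value).__name__}")
    if isinstance(value, int):
        return str(value)
    # Escape backslashes first, then double quotes, inside a "..." literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_row_restriction(
    key_column: str,
    keys: Sequence[Key],
    partition_column: str,
    partition_date: date,
) -> str:
    """AND a partition filter with an OR of equality predicates on the key."""
    if not keys:
        raise ValueError("at least one key is required")
    key_predicates = " OR ".join(
        f"{key_column} = {sql_literal(k)}" for k in keys
    )
    partition_predicate = (
        f"{partition_column} = CAST('{partition_date.isoformat()}' AS DATE)"
    )
    return f"{partition_predicate} AND ({key_predicates})"


# Example with illustrative column names:
# build_row_restriction("customer_id", [101, 102], "snapshot_date", date(2025, 6, 20))
# returns "snapshot_date = CAST('2025-06-20' AS DATE) AND (customer_id = 101 OR customer_id = 102)"
```

The resulting string would then be supplied as `row_restriction` in the read session's `TableReadOptions` (for example through the `google-cloud-bigquery-storage` client). Whether partitions are actually pruned, and how much clustering helps, is worth confirming against the bytes billed for a real table.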
   
   **References:**
   - [BigQuery Storage Read API Pricing](https://cloud.google.com/bigquery/pricing#storage-api)
   - [BigQuery On-Demand Pricing](https://cloud.google.com/bigquery/pricing#on_demand)
   
   Let me know your thoughts :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
