davidzollo opened a new issue, #10355:
URL: https://github.com/apache/seatunnel/issues/10355
## Background
BigQuery is Google Cloud's serverless, highly scalable, and cost-effective
multi-cloud data warehouse. It is widely used by enterprises globally for data
analytics, business intelligence, and machine learning workloads.
Currently, SeaTunnel lacks native support for BigQuery as a sink, which
limits its ability to integrate efficiently with the Google Cloud ecosystem.
## Motivation
- **High Market Demand**: BigQuery is a core service in Google Cloud
Platform (GCP) with a large enterprise customer base
- **Cloud-Native Architecture**: While JDBC drivers exist for BigQuery, they
provide poor performance and limited functionality compared to the native SDK
- **Advanced Features**: A native connector can support:
- Streaming inserts for real-time data ingestion
- Table partitioning and clustering
- Nested and repeated fields (STRUCT, ARRAY)
- Integration with Cloud Storage for efficient bulk loading
- Schema auto-detection and evolution
## Proposed Solution
Implement a dedicated BigQuery Sink connector using the Google Cloud Java
SDK with the following capabilities:
### Core Features
1. **Multiple Write Modes**
- Batch loading via Cloud Storage (for high throughput)
- Streaming inserts (for low latency)
- Support for both append and overwrite modes
2. **Schema Management**
- Automatic schema creation and evolution
- Support for complex data types (STRUCT, ARRAY, TIMESTAMP, GEOGRAPHY)
- Schema validation and type mapping
3. **Performance Optimization**
- Configurable batch size and flush interval
- Parallel writes with configurable parallelism
- Retry mechanism with exponential backoff
4. **Data Quality**
- Row-level error handling
- Dead letter queue for failed records
- Data validation before insertion
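
The batching knobs in item 3 (batch size, batch bytes, flush interval) imply a buffer that flushes on whichever threshold is hit first. A minimal sketch of that trigger logic follows; the `BatchBuffer` class and its method names are hypothetical illustrations, not existing SeaTunnel API:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer that triggers a flush on row count, byte size,
// or elapsed time since the last flush - whichever limit is reached first.
public class BatchBuffer {
    private final int maxRows;
    private final long maxBytes;
    private final long flushIntervalMs;

    private final List<String> rows = new ArrayList<>();
    private long bufferedBytes = 0;
    private long lastFlushMs;

    public BatchBuffer(int maxRows, long maxBytes, long flushIntervalMs, long nowMs) {
        this.maxRows = maxRows;
        this.maxBytes = maxBytes;
        this.flushIntervalMs = flushIntervalMs;
        this.lastFlushMs = nowMs;
    }

    // Add a serialized row; report whether any flush condition is now met.
    public boolean add(String serializedRow, long nowMs) {
        rows.add(serializedRow);
        bufferedBytes += serializedRow.getBytes(StandardCharsets.UTF_8).length;
        return shouldFlush(nowMs);
    }

    public boolean shouldFlush(long nowMs) {
        return rows.size() >= maxRows
                || bufferedBytes >= maxBytes
                || nowMs - lastFlushMs >= flushIntervalMs;
    }

    // Drain the buffer and reset counters; the caller hands the batch to the writer.
    public List<String> drain(long nowMs) {
        List<String> batch = new ArrayList<>(rows);
        rows.clear();
        bufferedBytes = 0;
        lastFlushMs = nowMs;
        return batch;
    }
}
```

The same three thresholds map directly onto the `max_batch_size`, `max_batch_bytes`, and `flush_interval_ms` options in the configuration example below.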
### Configuration Example
```hocon
sink {
  BigQuery {
    project = "my-gcp-project"
    dataset = "my_dataset"
    table = "my_table"

    # Authentication
    credentials_file = "/path/to/service-account.json"

    # Write configuration
    write_mode = "streaming"  # or "batch"
    create_disposition = "CREATE_IF_NEEDED"
    write_disposition = "WRITE_APPEND"  # or "WRITE_TRUNCATE"

    # Performance tuning
    max_batch_size = 1000
    max_batch_bytes = 10485760  # 10MB
    flush_interval_ms = 5000

    # Schema options
    auto_create_table = true
    schema_update_options = ["ALLOW_FIELD_ADDITION"]
  }
}
```
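
The "Data Quality" feature above (row-level error handling plus a dead letter queue) can be sketched as a router that sends each row either to the sink batch or to a dead-letter collection, so one bad record does not fail the whole batch. The `DeadLetterRouter` name and validator-based design are illustrative assumptions, not part of any existing connector:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical row-level error handling: rows that pass validation go to the
// sink batch; rows that fail go to a dead letter queue for later inspection.
public class DeadLetterRouter<T> {
    private final Predicate<T> validator;
    private final List<T> accepted = new ArrayList<>();
    private final List<T> deadLetters = new ArrayList<>();

    public DeadLetterRouter(Predicate<T> validator) {
        this.validator = validator;
    }

    public void route(T row) {
        if (validator.test(row)) {
            accepted.add(row);
        } else {
            deadLetters.add(row);
        }
    }

    public List<T> accepted() { return accepted; }
    public List<T> deadLetters() { return deadLetters; }
}
```

In a real connector the dead-letter side would likely be persisted (e.g. to a side topic or table) rather than held in memory; this sketch only shows the routing decision.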
## Expected Benefits
1. **Better Performance**: The native SDK can deliver 10-100x better
throughput than JDBC for large-scale data ingestion
2. **Cost Efficiency**: Optimized bulk loading via Cloud Storage reduces
costs
3. **Feature Completeness**: Access to BigQuery-specific features like
streaming inserts and schema evolution
4. **Enterprise Adoption**: Enables SeaTunnel to compete in GCP-based data
integration scenarios
## Technical Considerations
- **Dependencies**: Add `google-cloud-bigquery` SDK
- **Authentication**: Support service account JSON, application default
credentials, and workload identity
- **Error Handling**: Implement robust retry logic and error reporting
- **Testing**: Requires integration tests against a BigQuery emulator or a
dedicated test project
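
As a hedged illustration of the retry item above, exponential backoff can be implemented with a doubling, capped delay. The `Backoff` helper below is a minimal self-contained sketch (names are hypothetical); a production connector would also add jitter and distinguish retryable from non-retryable errors:

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper: exponential backoff with a maximum delay cap.
public class Backoff {
    // Delay for a 0-based attempt number: base * 2^attempt, capped at maxMs.
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 30); // clamp shift to avoid overflow
        return Math.min(delay, maxMs);
    }

    // Run the task up to maxAttempts times, sleeping between failures;
    // rethrow the last exception if every attempt fails.
    public static <T> T retry(Callable<T> task, int maxAttempts,
                              long baseMs, long maxMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(delayMs(attempt, baseMs, maxMs));
            }
        }
        throw last;
    }
}
```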
## References
- [BigQuery Java Client
Library](https://cloud.google.com/java/docs/reference/google-cloud-bigquery/latest/overview)
- [BigQuery Storage Write
API](https://cloud.google.com/bigquery/docs/write-api)
- [Best Practices for Loading
Data](https://cloud.google.com/bigquery/docs/best-practices-performance-input)
## Community Impact
This connector will:
- Expand SeaTunnel's cloud ecosystem support
- Attract GCP users to the SeaTunnel community
- Enable enterprises to build modern data pipelines on Google Cloud
---
**Priority**: High
**Estimated Effort**: Medium
**Target Release**: 2.3.14 or 3.0.0