[I] [API Update]: AlloyDBVectorWriterConfig Changes [beam]

via GitHub Tue, 10 Jun 2025 09:32:03 -0700


claudevdm opened a new issue, #35225:
URL: https://github.com/apache/beam/issues/35225


   ### What needs to happen?
   
   We've updated the `AlloyDBVectorWriterConfig` to make it more flexible and 
align it with our new `PostgresVectorWriter` transform. Here’s a quick guide to 
help you update your code.
   
   ### Here's a summary of the key changes:
   
   - Simplified connection config: The `AlloyDBConnectionConfig` wrapper is no 
longer needed, so you no longer wrap your connector options. Your username, 
password, database, and instance URI now go directly into 
`AlloyDBLanguageConnectorConfig`.
   - New JDBC `WriteConfig`: Parameters like `autosharding` and 
`write_batch_size` have been moved out of the connection configuration and into 
a new `WriteConfig` from `jdbc_common`.
   - Moved Imports: The `ColumnSpecsBuilder` and `ConflictResolution` utilities 
have been moved from `alloydb` to a more general `postgres_common` module.
   
   ### Follow these steps to update your code
   #### Update your imports
   First, adjust your import statements. Some have been removed, and others now 
point to `postgres_common`.
   
   Old imports
   ```
   from apache_beam.ml.rag.ingestion.alloydb import AlloyDBConnectionConfig
   from apache_beam.ml.rag.ingestion.alloydb import AlloyDBLanguageConnectorConfig
   from apache_beam.ml.rag.ingestion.alloydb import AlloyDBVectorWriterConfig
   from apache_beam.ml.rag.ingestion.alloydb import ColumnSpec
   from apache_beam.ml.rag.ingestion.alloydb import ColumnSpecsBuilder
   from apache_beam.ml.rag.ingestion.alloydb import ConflictResolution
   ```
   
   New imports
   ```
   # New imports for JDBC and Postgres utilities
   from apache_beam.ml.rag.ingestion.jdbc_common import WriteConfig
   from apache_beam.ml.rag.ingestion.postgres_common import ColumnSpecsBuilder, ConflictResolution
   
   # Existing AlloyDB imports (no more AlloyDBConnectionConfig)
   from apache_beam.ml.rag.ingestion.alloydb import AlloyDBLanguageConnectorConfig, AlloyDBVectorWriterConfig
   ```
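
To find the imports that need changing, a small scanner along these lines may help. This is a minimal sketch, not part of Beam: `find_stale_imports`, `MOVED`, and `REMOVED` are hypothetical names, and it only handles single-line `from ... import ...` statements. `ColumnSpec` is left out on purpose because this issue does not say where it lives now.

```python
import re

# Names this issue says moved from `alloydb` to `postgres_common`.
MOVED = {
    "ColumnSpecsBuilder": "apache_beam.ml.rag.ingestion.postgres_common",
    "ConflictResolution": "apache_beam.ml.rag.ingestion.postgres_common",
}
# Names this issue says are no longer imported at all.
REMOVED = {"AlloyDBConnectionConfig"}

OLD_MODULE = "apache_beam.ml.rag.ingestion.alloydb"
_IMPORT_RE = re.compile(
    r"^\s*from\s+" + re.escape(OLD_MODULE) + r"\s+import\s+(.+)$"
)


def find_stale_imports(source):
    """Return (line_number, name, advice) for each outdated import in source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _IMPORT_RE.match(line)
        if not match:
            continue
        for part in match.group(1).strip("() ").split(","):
            tokens = part.split()
            if not tokens:
                continue
            name = tokens[0]  # ignore any trailing "as alias"
            if name in MOVED:
                findings.append((lineno, name, "import from " + MOVED[name]))
            elif name in REMOVED:
                findings.append((lineno, name, "removed; drop this import"))
    return findings
```

Running it over the text of a module (for example `find_stale_imports(open(path).read())`) lists each line that still uses an old import, together with a short hint.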
   
   #### Simplify Connection and optionally add WriteConfig
   Next, update how you configure your connection. Pass credentials directly 
to `AlloyDBLanguageConnectorConfig`. Then, if you need write settings such as 
`autosharding` or `write_batch_size`, create a `WriteConfig` object.
   
   Old configuration
   ```
   # Connector options were wrapped in AlloyDBConnectionConfig
   connector_options = AlloyDBLanguageConnectorConfig(
       database_name="<database_name>",
       instance_name="<instance_name>",
       autosharding=True,
       write_batch_size=1
   )
   
   connection_config = AlloyDBConnectionConfig.with_language_connector(
       connector_options=connector_options,
       username="<username>",
       password="<password>"
   )
   ```
   
   New configuration
   ```
   # Simplified connection: credentials go directly here
   connection_config = AlloyDBLanguageConnectorConfig(
       username="<username>",
       password="<password>",
       database_name="<database_name>",
       instance_name="<instance_name>"
   )
   
   # New config for write-specific parameters
   jdbc_write_config = WriteConfig(
       autosharding=True,
       write_batch_size=1
   )
   ```
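
If your old settings are held in a dictionary, the split described above can be sketched as a small helper. This is an illustration under assumptions rather than a Beam API: `split_legacy_options` and `WRITE_KEYS` are hypothetical names, and the only write-specific keys it knows about are the two named in this issue.

```python
# Keys this issue says now belong on WriteConfig instead of the connection config.
WRITE_KEYS = ("autosharding", "write_batch_size")


def split_legacy_options(connector_options, username, password):
    """Split old-style arguments into (connection_kwargs, write_kwargs)."""
    # Write-specific settings move to the new WriteConfig.
    write_kwargs = {
        key: connector_options[key]
        for key in WRITE_KEYS
        if key in connector_options
    }
    # Everything else stays on the connection, now joined by the credentials.
    connection_kwargs = {
        key: value
        for key, value in connector_options.items()
        if key not in WRITE_KEYS
    }
    connection_kwargs.update(username=username, password=password)
    return connection_kwargs, write_kwargs
```

Assuming the constructors accept the keyword arguments shown in the new configuration above, the two results could then be passed as `AlloyDBLanguageConnectorConfig(**connection_kwargs)` and `WriteConfig(**write_kwargs)`.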
   
   #### Update the VectorDatabaseWriteTransform
   Finally, pass the new `WriteConfig` object as the `write_config` argument 
when you instantiate `AlloyDBVectorWriterConfig` in your pipeline.
   
   Old Transform
   ```
   | VectorDatabaseWriteTransform(
       AlloyDBVectorWriterConfig(
           connection_config=connection_config,
           table_name=self.default_table_name,
           column_specs=specs,
           conflict_resolution=conflict_resolution
       )
   )
   ```
   
   New Transform
   
   ```
   | VectorDatabaseWriteTransform(
       AlloyDBVectorWriterConfig(
           connection_config=connection_config,
           table_name=self.default_table_name,
           write_config=jdbc_write_config,  # <-- Add the new WriteConfig here
           column_specs=specs,
           conflict_resolution=conflict_resolution
       )
   )
   ```
   
   ### Issue Priority
   
   Priority: 2 (default / most normal work should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

