iwannagotobed opened a new pull request, #67298:
URL: https://github.com/apache/airflow/pull/67298

   ## Summary
   
   Fixes tenant-aware ingestion for Weaviate operators.
   
   `WeaviateIngestOperator` and `WeaviateDocumentIngestOperator` already accepted a
   `tenant` argument, but the value was not consistently applied to the underlying
   hook operations.
   
   As a result, users could configure `tenant="..."` on the operator while some
   create, query, replace, or delete operations still ran against the base
   collection instead of the tenant-scoped collection.
   
   ## Why this matters
   
   Weaviate multi-tenancy isolates objects by tenant within a collection. For a
   multi-tenant collection, object operations must be performed through:
   
   ```python
   collection.with_tenant("<tenant>")
   ```
   
   If the Airflow operator accepts a tenant parameter but does not apply it to the
   actual Weaviate collection operation, the provider does not honor the user's
   multi-tenancy boundary.
   
   This can lead to confusing and risky behavior:
   
   - A Dag author sets `tenant` on the operator and expects data to be written into
     that tenant.
   - The ingest task appears successful, but tenant-scoped verification cannot find
     the object.
   - Document replacement logic may query or delete objects outside the intended
     tenant scope.
   
   ## Reproduction
   
   I reproduced the issue with a small Airflow UI Dag.
   
   The collection is multi-tenant, the ingest operator receives
   `tenant="tenant-a"`, and the verification task reads from the tenant-scoped
   collection.
   
   ```python
   COLLECTION_NAME = "AirflowTenantRepro"
   TENANT_NAME = "tenant-a"
   
   
   @task
   def create_collection():
       hook.create_collection(
           COLLECTION_NAME,
           vectorizer_config=None,
           multi_tenancy_config=Configure.multi_tenancy(
               enabled=True,
               auto_tenant_creation=True,
           ),
       )
   
   
   ingest_with_tenant = WeaviateIngestOperator(
       task_id="ingest_with_tenant",
       conn_id="weaviate_default",
       collection_name=COLLECTION_NAME,
       input_data=SAMPLE_DATA,
       tenant=TENANT_NAME,
   )
   
   
   @task
   def verify_tenant_data():
       collection = hook.get_collection(COLLECTION_NAME).with_tenant(TENANT_NAME)
       response = collection.query.fetch_objects(limit=10)
   
       if not response.objects:
           raise RuntimeError("Expected object was not found in tenant")
   ```
   
   Before this fix, the ingest task completed successfully, but the tenant-scoped
   verification task failed because the expected object was not found in
   `tenant-a`.
   
   <img width="793" height="391" alt="Screenshot 2026-05-22 at 1.01.33 AM" src="https://github.com/user-attachments/assets/acf49a28-1eef-4516-96b3-9e82a230927a" />
   
   ## Changes
   
   This change makes the configured tenant flow through the provider consistently:
   
   - Passes `tenant` from `WeaviateIngestOperator` to
     `WeaviateHook.batch_data()`.
   - Passes `tenant` from `WeaviateDocumentIngestOperator` to
     `WeaviateHook.create_or_replace_document_objects()`.
   - Applies `collection.with_tenant(tenant)` inside `WeaviateHook.batch_data()`
     before batch insertion.
   - Applies tenant scoping to document ingestion paths, including
     existing-document lookup, replace/delete, final batch insert, rollback delete,
     and verbose aggregate query.
   - Adds optional `tenant` support to `delete_object()` and `_delete_objects()` so
     cleanup and rollback operations stay within the same tenant scope.
   - Adds unit test coverage for operator-to-hook tenant handoff and hook-level
     tenant collection usage.
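
The scoping pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: `FakeCollection` and `scoped_collection` are hypothetical names introduced here, and the stand-in class only mimics the `with_tenant()` call shown earlier so that the example runs without a Weaviate instance.

```python
from __future__ import annotations


class FakeCollection:
    """Hypothetical stand-in for a collection handle; records its tenant scope."""

    def __init__(self, name: str, tenant: str | None = None) -> None:
        self.name = name
        self.tenant = tenant

    def with_tenant(self, tenant: str) -> FakeCollection:
        # Return a new handle scoped to the tenant; the base handle is unchanged.
        return FakeCollection(self.name, tenant)


def scoped_collection(collection, tenant: str | None = None):
    """Return the tenant-scoped handle when a tenant is given, else the base handle.

    Hypothetical helper: every create, query, replace, and delete step would go
    through the handle returned here, so none of them can fall back to the
    base collection by accident.
    """
    return collection.with_tenant(tenant) if tenant else collection


base = FakeCollection("AirflowTenantRepro")
tenant_handle = scoped_collection(base, "tenant-a")
unscoped_handle = scoped_collection(base)
```

Resolving the handle once and reusing it for insert, lookup, and rollback is one way to keep all steps of a single ingest inside the same tenant; the real hook may structure this differently.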
   
   ## Result
   
   After the fix, the same Airflow UI reproduction Dag succeeds end to end:
   
   - `create_collection`: success
   - `ingest_with_tenant`: success
   - `verify_tenant_data`: success
   - `cleanup_collection`: success
   
   <img width="774" height="268" alt="Screenshot 2026-05-22 at 2.05.54 AM" src="https://github.com/user-attachments/assets/1c1b15d8-51f8-4ac0-9dc5-9ad1e3400b94" />
   
   
   ## Tests
   
   I ran the relevant Weaviate provider tests with Breeze:
   
   ```bash
   breeze run pytest \
     providers/weaviate/tests/unit/weaviate/operators/test_weaviate.py \
     providers/weaviate/tests/unit/weaviate/hooks/test_weaviate.py -xvs
   ```
   
   Result:
   
   ```text
   54 passed
   ```
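
For readers unfamiliar with this kind of check, an operator-to-hook handoff test can look roughly like the sketch below. It is not taken from the PR's test files: `TinyIngestOperator` is a hypothetical stand-in for the real operator, the hook is a `MagicMock`, and the keyword names passed to `batch_data` are assumptions made for illustration.

```python
from unittest.mock import MagicMock


class TinyIngestOperator:
    """Hypothetical stand-in for an ingest operator; not the provider's class."""

    def __init__(self, hook, collection_name, input_data, tenant=None):
        self.hook = hook
        self.collection_name = collection_name
        self.input_data = input_data
        self.tenant = tenant

    def execute(self):
        # Forward the configured tenant so the hook can scope the collection.
        # The keyword names used here are assumed, not the confirmed signature.
        return self.hook.batch_data(
            collection_name=self.collection_name,
            data=self.input_data,
            tenant=self.tenant,
        )


def check_tenant_handoff():
    """Run the stand-in operator and verify the tenant reached the hook."""
    hook = MagicMock()
    operator = TinyIngestOperator(
        hook, "AirflowTenantRepro", [{"title": "a"}], tenant="tenant-a"
    )
    operator.execute()
    # Raises AssertionError if the tenant was dropped or altered on the way.
    hook.batch_data.assert_called_once_with(
        collection_name="AirflowTenantRepro",
        data=[{"title": "a"}],
        tenant="tenant-a",
    )
    return True
```

Asserting on the exact keyword arguments means the check fails if the operator stops forwarding `tenant`, which is the regression this PR addresses.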
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes - Codex (GPT-5)

