kaxil opened a new pull request, #67791: URL: https://github.com/apache/airflow/pull/67791
common.ai's curated toolsets (`SQLToolset`, `HookToolset`, `MCPToolset`) are pydantic-ai `AbstractToolset`s and already work natively with `AgentOperator`. This adds the reverse direction: `airflow_toolset_to_langchain_tools(toolset)` converts any of them into LangChain `StructuredTool` objects, so a LangChain agent or chain running inside an Airflow task can call Airflow's connection-managed, validated tools.

## Why this lives in common.ai, not a separate langchain provider

A toolset bridge is tool interop, not an agent runtime. common.ai already ships the `langchain` optional extra, a `LangChainHook` for model access, and LangChain example DAGs, so the dependency boundary is already here (langchain is imported lazily and gated by the extra).

The forward direction (LangChain tools into `AgentOperator`) is already covered by pydantic-ai's upstream [`pydantic_ai.ext.langchain.LangChainToolset`](https://ai.pydantic.dev/toolsets/), so keeping only the reverse bridge in a separate provider would split the two halves of one feature. A dedicated provider for a single converter function is disproportionate overhead.

This PR does not add a LangChain agent backend or a `LangChainOperator`. Frameworks that want to be the agent runtime (LangGraph Platform, LangChain Runnable operators) still belong in their own provider.

## Usage

```python
from langchain.agents import create_agent

from airflow.providers.common.ai.hooks.langchain import LangChainHook
from airflow.providers.common.ai.toolsets import airflow_toolset_to_langchain_tools
from airflow.providers.common.ai.toolsets.sql import SQLToolset

tools = airflow_toolset_to_langchain_tools(SQLToolset(db_conn_id="sql_default"))
model = LangChainHook(llm_conn_id="langchain_default", llm_model="openai:gpt-4o").get_chat_model()
agent = create_agent(model, tools=tools, system_prompt="You are a SQL analyst.")
```

For the forward direction, no Airflow code is needed: put `LangChainToolset([my_tool])` into `AgentOperator(toolsets=[...])`.

## Notes and tradeoffs

- Outside an agent run there is no live `RunContext`, so the bridge builds a minimal one with an inert placeholder model. The bundled toolsets ignore the context; a custom toolset that reads `ctx.model`, `ctx.messages`, or `ctx.usage` will not behave correctly when bridged standalone. This is documented on the function and in the toolsets guide.
- `get_tools` is invoked eagerly at conversion time, so for `MCPToolset` a connection is opened then.
- The toolset's own args validator runs before each call, so argument coercion (for example a string into an int) matches what the tool would get inside `AgentOperator`.
- Requires the `langchain` extra (`apache-airflow-providers-common-ai[langchain]`).
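
For readers who want a feel for two of the notes above (eager tool discovery at conversion time, and validation before every call), here is a minimal stdlib-only sketch. It is not the provider's implementation and does not use pydantic-ai or LangChain: `ToolDef`, `FakeToolset`, and `bridge` are hypothetical names invented for illustration, and the "validator" is just a per-argument type cast standing in for the toolset's real args validator.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolDef:
    """Hypothetical stand-in for one tool exposed by a toolset."""

    name: str
    func: Callable[..., Any]
    arg_types: dict[str, type]  # stand-in for the toolset's args validator

    def validate(self, raw: dict[str, Any]) -> dict[str, Any]:
        # Coerce each argument to its declared type, e.g. "3" -> 3.
        return {key: self.arg_types[key](value) for key, value in raw.items()}


class FakeToolset:
    """Hypothetical toolset that records when tool discovery happens."""

    def __init__(self) -> None:
        self.discovered = False

    def get_tools(self) -> list[ToolDef]:
        # A connection-backed toolset would open its connection here.
        self.discovered = True
        return [
            ToolDef(
                name="repeat",
                func=lambda text, times: text * times,
                arg_types={"text": str, "times": int},
            )
        ]


def _wrap(tool: ToolDef) -> Callable[..., Any]:
    """Return a callable that validates its arguments, then invokes the tool."""

    def call(**raw: Any) -> Any:
        return tool.func(**tool.validate(raw))

    return call


def bridge(toolset: FakeToolset) -> dict[str, Callable[..., Any]]:
    """Convert every tool into a plain callable; discovery runs eagerly here."""
    return {tool.name: _wrap(tool) for tool in toolset.get_tools()}


toolset = FakeToolset()
tools = bridge(toolset)  # get_tools() has already run at this point
result = tools["repeat"](text="ab", times="3")  # "3" is coerced to an int first
```

In this toy version the cost of discovery is paid once inside `bridge`, which mirrors why converting an `MCPToolset` opens its connection at conversion time and not on first tool call.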
