potiuk commented on PR #51059:
URL: https://github.com/apache/airflow/pull/51059#issuecomment-2979827579
> I feel pandas, numpy, datetime are similar to this case: they are
common objects that may be used by multiple providers, or by tasks to pass
objects in XComs. This approach may help avoid implementing similar things in
different providers.
I am not quite sure. I'd say it might also end up as a "common.dataframe"
provider (alongside polars and likely other things - see a discussion we had
some time ago; nothing materialized from it, but it could:
https://lists.apache.org/thread/qx3yh6h0l6jb0kh3fz9q95b3x5b4001l). Then, any
provider that needs it could simply depend on it.
There is conceptually no difference between "common.sql" and "common.dataframe" -
yet somehow we seem to want "dataframe"-related code in Airflow core,
while we keep "sql"-related code in a provider.
> However, I think pydantic is more of a core thing; it does not essentially
belong to a provider, and it can be consumed and used in core without additional
dependencies. So there's no stopping anybody from returning a pydantic
dataclass object as an XCom.
I think Pydantic involves **two** different layers:
* serialization code that handles the various Pydantic models that can be
registered with it - this can indeed live in "airflow-core", especially since
we already use Pydantic for other things (namely FastAPI)
* but serialization/deserialization of concrete Pydantic classes should be
"injected" by the respective provider. If Cohere defines its own Pydantic
model, and another provider wants to use that Pydantic model "deserialized",
it is perfectly fine for the other provider to depend on Cohere to be able to
instantiate the concrete Cohere model. Basically, you need to have the Cohere
provider installed in order to use its model. And installing the Cohere
provider should "register" the Cohere model with the common Pydantic serializer.
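The two layers above could be sketched roughly like this - a minimal, hypothetical registry where the "common" serializer knows nothing about concrete models, and each provider registers its own class at install/load time. All names here (`register_pydantic_model`, `_PYDANTIC_REGISTRY`, etc.) are illustrative, not Airflow's actual API:

```python
from typing import Any, Callable

# Hypothetical registry mapping fully-qualified class names to
# (serialize, deserialize) callables. The "common" layer owns this dict;
# it has no imports of any concrete provider model.
_PYDANTIC_REGISTRY: dict[str, tuple[Callable, Callable]] = {}


def register_pydantic_model(cls: type) -> None:
    """Called by a provider when it is installed/loaded, to make its model serializable."""
    qualname = f"{cls.__module__}.{cls.__qualname__}"
    _PYDANTIC_REGISTRY[qualname] = (
        # Pydantic v2 method names; a stub with the same interface also works.
        lambda obj: obj.model_dump(),
        lambda data: cls.model_validate(data),
    )


def serialize(obj: Any) -> dict:
    """Common-layer serialization: looks up the registered handler by class name."""
    qualname = f"{type(obj).__module__}.{type(obj).__qualname__}"
    ser, _ = _PYDANTIC_REGISTRY[qualname]
    return {"__classname__": qualname, "data": ser(obj)}


def deserialize(payload: dict) -> Any:
    """Fails (KeyError) unless the provider owning the class is installed and registered."""
    _, deser = _PYDANTIC_REGISTRY[payload["__classname__"]]
    return deser(payload["data"])
```

The point of the lookup-by-name design is exactly the dependency direction described above: deserializing a Cohere model only works if the Cohere provider is installed and has registered it, while the common code itself stays dependency-free.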
This is the "classic" dependency injection that we need to implement (and we
already do a lot of it in common.sql, for example - but also in
`common.messaging`). You implement the "common" code in a common place, and then
the specific code that implements the "external entity variant" of it lives in a
provider, which registers that "specific code" when it gets installed.
See https://github.com/apache/airflow/pull/50057 which "fixed" the way this
is done for "common.messaging": the providers manager loads common.messaging
and the "common" code that you can then use to listen for incoming messages.
Common.messaging provides that "common implementation". Then the "amazon"
provider provides the SQS implementation of it and registers it (the providers
manager takes care of discovering the "queue classes" and injects them into
common.messaging - so that the common.messaging class can "know" about the
"sqs" class).
No core class needs to know about or import "amazon" classes in this case. The
core does not know about them at all. It's the amazon provider that "knows"
about common.messaging, exposes its specific "amazon" messaging implementation,
and allows the providers manager to register it when the provider is installed
(via entrypoints).
This is IMHO how our whole serialization should be implemented. Airflow
core should not care about pandas, polars, iceberg, etc. It should be
"pure, no dependencies" code. Providers - when installed - should provide all
the code that is specific to the "external entity" (amazon, iceberg, whatever)
and make it usable by the core. But importing "iceberg" anywhere in
"airflow-core" is just **wrong**.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]