It might help to start off this separate discussion thread with a simple /
concise problem statement. (that might sound snarky but it ain't :) ) And I
say *might help* cus you still might get crickets even if you clarify... But
as for me, I didn't really follow the main thread, and when I looked at this
one, which was a continuation of a convo, the context wasn't super clear, so
I moved on.

On Mon, Dec 11, 2023 at 11:23 AM Jarek Potiuk <[email protected]> wrote:

> Tried to get others some opportunity to comment, but I see it's mostly me
> <> Bolke.
>
> > Yep. I rectified the implicitness by storing this as an explicit
> > decision record in the ADR proposal. I think it's been - similarly as
> > lack of
>
> <snip>
>
> Happy to help here - since there is a good "vibe" on the ADR kind of
> approach, I am happy to "lead by example" and propose a PR with ADRs for
> serialization (learning from the whole discussions - and that's a good
> opportunity for me to learn some of the nitty-gritty details).
>
> Would that work for you, and can I count on some reviews?
>
> > It was added to the DAG serializer at the time; I mentioned that that
> > looked weird given the fact that it did not seem connected to DAG
> > serialization (to me) and serde should be considered. So no, I wasn't
> > insisting on making it part of serialization imho - it de facto was
> > already.
>
> Let's straighten it out after writing the ADRs and rounding up the
> serialization case. We are aiming to enable AIP-44 and Pydantic in 2.9, so
> that's a good time to see and make some measurements there.
>
> > > Going through either of the serializations of ours here is not needed
> > > overhead at all - we can just purely rely on Rust and get ORM on one
> > > side and Pydantic object on the other without touching any of our
> > > serialization code.
> >
> > That's fine to me.
>
> Yeah. We'll sort it out.
> Luckily it's a very localized place, and we can simply open a draft PR
> showing how it will look and see if it looks simpler / whether we lose
> any of the properties that
>
> > I don't think you want to ;-).
>
> Happy also to take a look at helping to straighten up DAG serialization,
> possibly once I get a bit more familiar with the context.
>
> > We could. It's probably just a three-pass serializer then.
>
> Interesting :).
>
> > Exactly not. By adding it to Airflow 2.8.0 it would have made every
> > previous version of this provider suddenly work without any adjustments
> > against Airflow 2.8.0. Remember that the original bug report said
> > "cannot serialize value of type PyODBC.Row".
>
> I was more talking about compatibility of the ODBC provider. If we want
> to directly return ODBC.Row, it will only be serializable in Airflow
> 2.8.0 (currently the ODBC provider has >= Airflow 2.6.0). What I want to
> achieve is to implement a fix where you can upgrade just the provider and
> get serializable output even if the user decides to stay with Airflow
> 2.8.0. This w
>
> > You can argue both ways here architecturally.
> >
> > 1) Airflow 2.8 with a new feature that allows PyODBC.Row to be
> > serialized, affecting past and future providers at large and (TaskFlow)
> > returned variables.
>
> This requires users to upgrade to Airflow 2.8.0. Many of our users delay
> such upgrades for months or years. So we are talking about making it
> possible for them to upgrade "easily", where selective upgrades to
> providers are the easy way. This also works better for some
> Airflow-as-a-service offerings where 2.8 might only be available months
> from now.
>
> Migration to a new Airflow version is way more involved, and uptake of
> new versions (especially .0) is slow.
>
> We are basically talking about having something available now - for
> Airflow 2.6, 2.7, (and 2.8) - vs. many users having it months or years
> from now.
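The provider-side fix under discussion boils down to converting driver-specific
row objects into plain Python structures before they leave the hook, so the
result is serializable on any Airflow version. A minimal sketch of that idea -
not the actual provider code; the function name and shape here are
illustrative, and it works from the generic DB-API ``cursor.description``
rather than anything pyodbc-specific:

```python
def rows_to_serializable(description, rows):
    """Convert DB-API rows into plain dicts that any serializer can handle.

    ``description`` is the DB-API 2.0 ``cursor.description`` sequence;
    the first element of each entry is the column name. ``rows`` is any
    iterable of row-like sequences (tuples, pyodbc Rows, ...).
    """
    columns = [col[0] for col in description]
    return [dict(zip(columns, row)) for row in rows]


# Example with fake cursor data (no database driver needed for the sketch):
description = [("id", int), ("name", str)]
rows = [(1, "alpha"), (2, "beta")]
print(rows_to_serializable(description, rows))
# [{'id': 1, 'name': 'alpha'}, {'id': 2, 'name': 'beta'}]
```

Because the conversion happens inside the provider, a user who upgrades only
the provider package gets serializable output without touching Airflow itself.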
> And releasing it now fixes an actual issue people have and raised to us.
> And implementing it in the provider has an added benefit: IF we also
> implement a generic serializer as part of ODBC and have a way for serde
> to discover serializers provided by providers, then we could implement it
> in the provider in a way that either uses internal "serialization" or
> allows serde to discover it.
>
> I see really no benefit of having Airflow know anything about ODBC. Are
> we going to implement all 80 providers' specific code, put it in airflow,
> and add all their dependencies (even optional) as airflow dependencies,
> now that we made an enormous effort in separating these?
>
> Imagine ODBC code depends on odbc_library. What happens if Airflow has an
> optional dependency on odbc_library <3 and the provider has a dependency
> on odbc_library >=3? Imagine that the provider returns ODBCRow from
> version 3, which that serializer won't be able to handle.
>
> How do you install these two together? How do you make them work
> together?
>
> It really makes very little sense to have airflow depend on the library
> and code that the provider already depends on, because this will create a
> whole new class of potential conflicts.
>
> I think the *only* possible way to make ODBC Row handled by serde is for
> serde to discover serializers from installed providers and use them from
> there.
>
> This solves all the "dependency conflict" cases nicely and is long-term
> sustainable and maintainable.
>
> > This is exactly the intention with the serializers in
> > airflow.serialization.serializers; however, it can be tough, as not all
> > expose the information in public APIs (e.g. deltalake) that allow you
> > to reinstate the object.
> >
> > But yes: loose, and pragmatic public API / SemVer.
>
> Yep.
