+1 (non-binding) Really look forward to it! Howie On Wed, Aug 10, 2022 at 11:52 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> Sure I can explain - the main reasons are: > > - 1) binary representation performance - impact of this is rather limited > because our API calls are doing rather heavy processing compared to the > data being transmitted. But I believe it's not negligible. > - 2) automated tools to automatically generate strongly typed Python code > (that's the ProtoBuf part). The strongly typed Python code is what > convinced me (see my POC). The tooling we got for that is excellent. Far > more superior than dealing with json-encoded data even with schema. > - 2) built-in "remote procedure" layer - where we have request/response > semantics optimisations (for multiple calls over the same chanel) and > exception handling done for us (This is basically what "Remote Procedure" > interface provide us) > - 3) built-in server that efficiently distributes the method called from > multiple client into a multi-threaded/multi-threaded execution (all > individual calls are stateless so multi-processing can be added on top > regardless from the "transport" chosen). > > BTW. If you look at my POC code, there is nothing that strongly "binds" us > to gRPC. The nice thing is that once it is implemented, it can be swapped > out very easily. The only Proto/gRPC code that "leaks" to "generic" Airflow > code is mapping of some (most complex) parameters to Proto . And this is > only for most complex cases - literally only few of our types require > custom serialisation - most of the mapping is handled automatically by > generated protobuf code. And we can easily put the "complex" mapping in a > separate package. Plus there is an "if" statement for each of the ~ 30 or > so methods that we will have to turn into remotely-callable. We can even > (as I proposed it as an option) add a little python magic and add a simple > decorator to handle the "ifs". Then the decorator "if" can be swapped with > some other "remote call" implementation. > > The actual bulk of the implementation is to make sure that all the places > are covered (that's the testing harness). > > > > On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin > <andrew.god...@astronomer.io.invalid> wrote: > >> Hi Jarek, >> >> Apologies - I was as not involved as I wanted to be in the AIP-44 >> process, and obviously my vote is non-binding anyway - but having done a >> lot of Python API development over the past decade or so I wanted to know >> why the decision was made to go with gRPC over just plain HTTP+JSON (with a >> schema, of course). >> >> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree with >> - but does not go into the option of using a standard Python HTTP server >> with JSON schema enforcement, such as FastAPI. In my experience, the tools >> for load balancing, debugging, testing and monitoring JSON/HTTP are >> superior and more numerous than those for gRPC, and in addition the >> asynchronous support for gRPC servers is lacking compared to their plain >> HTTP counterparts, and the fact that you can interact and play with the >> APIs in prototyping stages without having to handle obtaining correct >> protobuf versions for the Airflow version you're using. >> >> I wouldn't go so far as to suggest a veto, but I do want to see the AIP >> address why gRPC would win over this option. Apologies again for the late >> realisation that gRPC got chosen and was being voted on - it's been a very >> busy summer. >> >> Thanks, >> Andrew >> >> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote: >> >>> Just let me express my rather strong dissatisfaction with the way >>> this "last minute" is raised. >>> >>> It is very late to come up with such a statement - not that it comes at >>> all, but when it comes when everyone had a chance to take a look and >>> comment, including taking a look at the POC and result of checks. This has >>> never been raised even 4 months ago where the only choices were Thrift and >>> gRPc). >>> >>> I REALLY hope the arguments are very strong and backed by real examples >>> and data why it is a bad choice rather than opinions. >>> >>> J,. >>> >>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <a...@apache.org> >>> wrote: >>> >>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just >>>> JSON, so -1 to that specific choice. Everything else I'm happy with. >>>> >>>> I (or Andrew G) will follow up with more details shortly. >>>> >>>> -ash >>>> >>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk < >>>> ja...@potiuk.com> wrote: >>>> >>>> Oh yeah :) >>>> >>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pin...@umich.edu> wrote: >>>> >>>>> ah, good call. >>>>> >>>>> I guess the email template can be updated: >>>>> >>>>> Only votes from PMC members are binding, but members of the community >>>>>> are encouraged to check the AIP and vote with "(non-binding)". >>>>> >>>>> >>>>> >>>>> +1 (binding) >>>>> >>>>> Thanks, >>>>> >>>>> Ping >>>>> >>>>> >>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com> >>>>> wrote: >>>>> >>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes >>>>>> are binding. See >>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy >>>>>> :D >>>>>> >>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pin...@umich.edu> wrote: >>>>>> >>>>>>> +1 (non-binding) >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Ping >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hey everyone, >>>>>>>> >>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API". >>>>>>>> >>>>>>>> The AIP-44 is here: >>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API >>>>>>>> >>>>>>>> Discussion thread: >>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3 >>>>>>>> >>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00 >>>>>>>> CEST), and until at least 3 binding votes have been cast. >>>>>>>> >>>>>>>> Please vote accordingly: >>>>>>>> >>>>>>>> [ ] + 1 approve >>>>>>>> [ ] + 0 no opinion >>>>>>>> [ ] - 1 disapprove with the reason >>>>>>>> >>>>>>>> Only votes from PMC members are binding, but members of the >>>>>>>> community are encouraged to check the AIP and vote with >>>>>>>> "(non-binding)". >>>>>>>> >>>>>>>> ---- >>>>>>>> >>>>>>>> Just a summary of where we are: >>>>>>>> >>>>>>>> It's been long in the making, but I think it might be a great >>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the >>>>>>>> proposal I >>>>>>>> have is quite well thought out and discussed a lot in the past: >>>>>>>> >>>>>>>> * we have a working POC for implementation used for performance >>>>>>>> testing: https://github.com/apache/airflow/pull/25094 >>>>>>>> * it is based on on industry-standard open-source gRPC (which is >>>>>>>> already our dependency) which fits better the RPC "model" we need than >>>>>>>> our >>>>>>>> public REST API. >>>>>>>> * it has moderate performance impact and rather good >>>>>>>> maintainability features (very little impact on regular development >>>>>>>> effort) >>>>>>>> * it is fully backwards compatible - the new DB isolation will be >>>>>>>> an optional feature >>>>>>>> * has a solid plan for full test coverage in our CI >>>>>>>> * has a backing and plans for more extensive complete testing in >>>>>>>> "real" environment with Google Composer team support >>>>>>>> * allows for further extensions as part of AIP-1 (I am planning to >>>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one >>>>>>>> is >>>>>>>> well in progress). >>>>>>>> >>>>>>>> >>>>>>>> J. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>