+1 (non-binding) Really look forward to it! Howie

On Wed, Aug 10, 2022 at 11:52 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Sure I can explain - the main reasons are:
>
> - 1) binary representation performance - impact of this is rather limited
> because our API calls are doing rather heavy processing compared to the
> data being transmitted. But I believe it's not negligible.
> - 2) automated tools to automatically generate strongly typed Python code
> (that's the ProtoBuf part). The strongly typed Python code is what
> convinced me (see my POC). The tooling we got for that is excellent. Far
> more superior than dealing with json-encoded data even with schema.
> - 2) built-in "remote procedure" layer - where we have request/response
> semantics optimisations (for multiple calls over the same chanel) and
> exception handling done for us (This is basically what "Remote Procedure"
> interface provide us)
> - 3) built-in server that efficiently distributes the method called from
> multiple client into a multi-threaded/multi-threaded execution (all
> individual calls are stateless so multi-processing can be added on top
> regardless from the "transport" chosen).
>
> BTW. If you look at my POC code, there is nothing that strongly "binds" us
> to gRPC. The nice thing is that once it is implemented, it can be swapped
> out very easily. The only Proto/gRPC code that "leaks" to "generic" Airflow
> code is mapping of some (most complex) parameters to Proto . And this is
> only for most complex cases - literally only few of our types require
> custom serialisation - most of the mapping is handled automatically by
> generated protobuf code. And we can easily put the "complex" mapping in a
> separate package. Plus there is an "if" statement for each of the ~ 30 or
> so methods that we will have to turn into remotely-callable. We can even
> (as I proposed it as an option) add a little python magic and add a simple
> decorator to handle the "ifs". Then the decorator "if" can be swapped with
> some other "remote call" implementation.
>
> The actual bulk of the implementation is to make sure that all the places
> are covered (that's the testing harness).
>
>
>
> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
> <andrew.god...@astronomer.io.invalid> wrote:
>
>> Hi Jarek,
>>
>> Apologies - I was as not involved as I wanted to be in the AIP-44
>> process, and obviously my vote is non-binding anyway - but having done a
>> lot of Python API development over the past decade or so I wanted to know
>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>> schema, of course).
>>
>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree with
>> - but does not go into the option of using a standard Python HTTP server
>> with JSON schema enforcement, such as FastAPI. In my experience, the tools
>> for load balancing, debugging, testing and monitoring JSON/HTTP are
>> superior and more numerous than those for gRPC, and in addition the
>> asynchronous support for gRPC servers is lacking compared to their plain
>> HTTP counterparts, and the fact that you can interact and play with the
>> APIs in prototyping stages without having to handle obtaining correct
>> protobuf versions for the Airflow version you're using.
>>
>> I wouldn't go so far as to suggest a veto, but I do want to see the AIP
>> address why gRPC would win over this option. Apologies again for the late
>> realisation that gRPC got chosen and was being voted on - it's been a very
>> busy summer.
>>
>> Thanks,
>> Andrew
>>
>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Just let me express my rather strong dissatisfaction with the way
>>> this "last minute" is raised.
>>>
>>> It is very late to come up with such a statement - not that it comes at
>>> all, but when it comes when everyone had a chance to take a look and
>>> comment, including taking a look at the POC and result of checks. This has
>>> never been raised even 4 months ago where the only choices were Thrift and
>>> gRPc).
>>>
>>> I REALLY hope the arguments are very strong and backed by real examples
>>> and data why it is a bad choice rather than opinions.
>>>
>>> J,.
>>>
>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <a...@apache.org>
>>> wrote:
>>>
>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>
>>>> I (or Andrew G) will follow up with more details shortly.
>>>>
>>>> -ash
>>>>
>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>> ja...@potiuk.com> wrote:
>>>>
>>>> Oh yeah :)
>>>>
>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pin...@umich.edu> wrote:
>>>>
>>>>> ah, good call.
>>>>>
>>>>> I guess the email template can be updated:
>>>>>
>>>>> Only votes from PMC members are binding, but members of the community
>>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>>
>>>>>
>>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ping
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's votes
>>>>>> are binding. See
>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>> :D
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pin...@umich.edu> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ping
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>
>>>>>>>> The AIP-44 is here:
>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>
>>>>>>>> Discussion thread:
>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>
>>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>>
>>>>>>>> Please vote accordingly:
>>>>>>>>
>>>>>>>> [ ] + 1 approve
>>>>>>>> [ ] + 0 no opinion
>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>
>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>> "(non-binding)".
>>>>>>>>
>>>>>>>> ----
>>>>>>>>
>>>>>>>> Just a summary of where we are:
>>>>>>>>
>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the 
>>>>>>>> proposal I
>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>
>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>>> already our dependency) which fits better the RPC "model" we need than 
>>>>>>>> our
>>>>>>>> public REST API.
>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>> maintainability features (very little impact on regular development 
>>>>>>>> effort)
>>>>>>>> * it is fully backwards compatible - the new DB isolation will be
>>>>>>>> an optional feature
>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>>> "real" environment with Google Composer team support
>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one 
>>>>>>>> is
>>>>>>>> well in progress).
>>>>>>>>
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>

Reply via email to