Hey guys. I just pushed the integration tests for the Qdrant provider, along with the CI configuration as requested, and refactored the Qdrant hook to remove some redundancy.

On Tue, 16 Jan 2024 at 19:58, Jarek Potiuk <ja...@potiuk.com> wrote:

> > self hosted image and the cloud offering, comprehensive integration tests
> > with the Docker image should suffice.
>
> I am fine with that.
>
> On Tue, Jan 16, 2024 at 3:00 PM Anush Shetty <anush.she...@qdrant.com> wrote:
>
> > P.S. The updates to Qdrant and the clients are backwards compatible,
> > further reducing any maintenance overhead.
> >
> > On Tue, 16 Jan, 2024, 7:29 pm Anush Shetty, <anush.she...@qdrant.com> wrote:
> >
> > > We'd gladly add the integration tests along with the mock tests that are
> > > currently in place, and since there's no difference in running a self
> > > hosted image and the cloud offering, comprehensive integration tests with
> > > the Docker image should suffice.
> > >
> > > As you said, would appreciate any thoughts from the community.
> > >
> > > On Tue, 16 Jan, 2024, 7:11 pm Jarek Potiuk, <ja...@potiuk.com> wrote:
> > >
> > >> BTW: Dashboard links here:
> > >>
> > >> https://airflow.apache.org/ecosystem/#airflow-provider-system-test-dashboards
> > >>
> > >> On Tue, Jan 16, 2024 at 2:39 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > >>
> > >> > I'd love to hear from others in the community who already use Qdrant
> > >> > what they think :) ?
> > >> >
> > >> > Few comments to Anush:
> > >> >
> > >> > I did a bit of review of the links and did some usual research.
> > >> >
> > >> > 1) Re: requirements - it does not introduce any big issues. Urllib3 < 2
> > >> > is a bit strange (but we are anyhow limited by botocore now, so not a
> > >> > big issue; I hope it can be removed in the future).
> > >> >
> > >> > Requires-Dist: fastembed (==0.1.1) ; (python_version < "3.12") and (extra == "fastembed")
> > >> > Requires-Dist: grpcio (>=1.41.0)
> > >> > Requires-Dist: grpcio-tools (>=1.41.0)
> > >> > Requires-Dist: httpx[http2] (>=0.14.0)
> > >> > Requires-Dist: numpy (<1.21) ; python_version < "3.8"
> > >> > Requires-Dist: numpy (>=1.21) ; python_version >= "3.8" and python_version < "3.12"
> > >> > Requires-Dist: numpy (>=1.26) ; python_version >= "3.12"
> > >> > Requires-Dist: portalocker (>=2.7.0,<3.0.0)
> > >> > Requires-Dist: pydantic (>=1.10.8)
> > >> > Requires-Dist: urllib3 (>=1.26.14,<2.0.0)
> > >> >
> > >> > 2) Open source version seems to be fully supported and alive. This
> > >> > looks pretty cool after looking at the information provided. The code
> > >> > is small and literally calling the library QdrantClient, so it does not
> > >> > seem like something that might require a lot of maintenance.
> > >> >
> > >> > My concerns are with testability and future-proof maintenance. This is
> > >> > a fast-paced area. There will be breaking changes. Yes. There are unit
> > >> > tests and system tests there. But we have no time/possibility to run
> > >> > our tests against a real Qdrant server, and especially against one run
> > >> > in the cloud, "by hand".
> > >> >
> > >> > So, two points:
> > >> >
> > >> > 1) Open-source version: Similar to Kafka provider - seems Qdrant has a
> > >> > nicely dockerized version that can be installed from officially
> > >> > released images (https://qdrant.tech/documentation/quick-start/) -
> > >> > seems like a perfect candidate to run integration tests with it on our
> > >> > CI. If that is there, this means that we can both - easily make sure it
> > >> > continues to work, but also - equally easily bump the version of Qdrant
> > >> > when a new major/minor release is out and have our tests run
> > >> > automatically in our CI. And it will nicely run in Breeze with
> > >> > `breeze --integration qdrant` when someone wants to run the integration
> > >> > tests locally: See
> > >> > https://github.com/apache/airflow/tree/main/tests/integration/providers/apache/kafka
> > >> > and
> > >> > https://github.com/apache/airflow/blob/main/scripts/ci/docker-compose/integration-kafka.yml
> > >> > - I think that should be a condition of approving it.
> > >> >
> > >> > 2) Cloud version: It would also help (especially if you want to run the
> > >> > system tests against your cloud) if you could set up similar dashboards
> > >> > to those we have for Amazon and other LLM providers (maintained by
> > >> > Astronomer), which would show the status of system tests you run with
> > >> > the main version.
> > >> >
> > >> > Are you ok with extending the PR and adding integration tests and
> > >> > committing to maintaining such a dashboard?
> > >> >
> > >> > If there are voices from the community "yeah it's useful" - and the
> > >> > points 1) and 2) are addressed, I am quite positive about accepting the
> > >> > provider :)
> > >> >
> > >> > J
> > >> >
> > >> > On Tue, Jan 16, 2024 at 1:41 PM Anush Shetty <anush.she...@qdrant.com> wrote:
> > >> >
> > >> >> Hello, Airflow community,
> > >> >>
> > >> >> I am Anush - an Integrations engineer at Qdrant. This discussion
> > >> >> proposes to include Qdrant as a supported provider for Airflow.
> > >> >> Following up on
> > >> >> https://lists.apache.org/list.html?dev@airflow.apache.org.
> > >> >>
> > >> >> Qdrant - https://github.com/qdrant/qdrant, is an open-source vector
> > >> >> search engine and database, governed by the Apache-2.0 license,
> > >> >> allowing complete freedom for commercial usage and redistribution.
> > >> >>
> > >> >> Proposed provider PR: https://github.com/apache/airflow/pull/36805
> > >> >>
> > >> >> Qdrant ranks amongst the most performant and most used vector
> > >> >> databases available today.
> > >> >> - https://qdrant.tech/benchmarks/
> > >> >> - https://ossinsight.io/collections/vector-search-engine/
> > >> >>
> > >> >> We believe Qdrant would be a valuable addition for Airflow users to
> > >> >> have as an option when building DAGs.
> > >> >>
> > >> >> Qdrant can be deployed by users on their own or via Qdrant's cloud
> > >> >> offering.
> > >> >>
> > >> >> The proposed provider supports interfacing with Qdrant instances
> > >> >> through both REST and GRPC interfaces without any restrictions on the
> > >> >> mode of deployment used.
> > >> >>
> > >> >> As part of our commitment, the Qdrant team is willing to undertake the
> > >> >> responsibility of maintaining and updating the provider as per user
> > >> >> requests or any identified needs.
> > >> >>
> > >> >> Anush