xinyiZzz commented on issue #35777: URL: https://github.com/apache/arrow/issues/35777#issuecomment-1719115860
> Great! I'd be interested in seeing how this develops.

Thanks for your attention, @lidavidm. I'm trying to implement an Arrow Flight SQL server in Apache Doris to support ADBC, and I will publish the design document to the Doris community soon. Here is a brief explanation:

Motivation:

The immediate need is to speed up returning Doris query results to Python, for data science and machine learning workloads. In the future, it may also be able to replace the interfaces that other systems, such as Spark, use to read from Doris.

Doris is also a column-oriented database, so returning results through the MySQL protocol is very expensive: the `column data` in Doris has to be converted into `row data` and then back into `column data`. Currently, reading data from Doris with Python's mysql-client is 10-20 times slower than ClickHouse. Previously, I tried using JDBC to take over query execution in a Doris Arrow Flight server. Compared with mysql-client, performance improved by 4-10 times, but it was still about twice as slow as ClickHouse.

Implementation:

I referred to the Arrow Flight examples and to Dremio. Doris also has two roles, `Frontend` and `Backend`: the `Frontend` is responsible for generating the query plan and scheduling, and the `Backend` is responsible for query execution.

ADBC connection process:

- The ADBC client connects to the `Frontend`. The `Frontend` sends the query plan to the `Backend` and returns the endpoint of the `Backend` that holds the result to the ADBC client.
- The ADBC client connects to the `Backend` in that endpoint to pull the data.

In a simple test, performance was already several times faster than ClickHouse. I will continue developing it and look forward to the final results.

Related PRs:

- https://github.com/apache/doris/pull/23765
- https://github.com/apache/doris/pull/24314
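The two-hop connection process above follows the usual Arrow Flight pattern: the client calls `GetFlightInfo` on one node, receives endpoints (a location plus an opaque ticket), then calls `DoGet` on each endpoint's location. Below is a minimal toy sketch of that control flow, assuming an in-memory setup. It is not Doris code and not the `pyarrow.flight` API: `Frontend`, `Backend`, `Endpoint`, `execute_fragment`, and `fetch` are hypothetical names, a Python predicate stands in for the query plan, and a dict lookup stands in for opening a network connection.

```python
# Toy, in-memory model of the two-hop ADBC/Flight flow (Frontend -> Backend).
# All names are hypothetical; this is neither Doris code nor the pyarrow.flight API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]
Predicate = Callable[[Row], bool]  # stands in for a real query plan


@dataclass(frozen=True)
class Endpoint:
    location: str  # address of the Backend holding one slice of the result
    ticket: bytes  # opaque handle that Backend uses to find the slice


@dataclass
class Backend:
    location: str
    local_rows: List[Row]  # data stored on this Backend
    _results: Dict[bytes, List[Row]] = field(default_factory=dict, init=False, repr=False)

    def execute_fragment(self, ticket: bytes, predicate: Predicate) -> None:
        # Run the fragment sent by the Frontend and buffer its result under the ticket.
        self._results[ticket] = [r for r in self.local_rows if predicate(r)]

    def do_get(self, ticket: bytes) -> List[Row]:
        # Hop 2: the client pulls its slice directly from this Backend.
        return self._results.pop(ticket)


class Frontend:
    def __init__(self, backends: List[Backend]) -> None:
        self._backends = backends
        self._next_query_id = 0

    def get_flight_info(self, predicate: Predicate) -> List[Endpoint]:
        # Hop 1: "plan" the query, dispatch one fragment per Backend, and
        # return endpoints (location + ticket) instead of the data itself.
        query_id = self._next_query_id
        self._next_query_id += 1
        endpoints = []
        for be in self._backends:
            ticket = f"query-{query_id}@{be.location}".encode()
            be.execute_fragment(ticket, predicate)
            endpoints.append(Endpoint(be.location, ticket))
        return endpoints


def fetch(frontend: Frontend, network: Dict[str, Backend], predicate: Predicate) -> List[Row]:
    # Client side: ask the Frontend for endpoints, then pull from each Backend.
    rows: List[Row] = []
    for ep in frontend.get_flight_info(predicate):
        backend = network[ep.location]  # dict lookup stands in for opening a connection
        rows.extend(backend.do_get(ep.ticket))
    return rows


if __name__ == "__main__":
    be1 = Backend("be-1", [(1, "a"), (2, "b")])
    be2 = Backend("be-2", [(3, "c"), (4, "d")])
    fe = Frontend([be1, be2])
    network = {be.location: be for be in (be1, be2)}
    print(fetch(fe, network, lambda r: r[0] % 2 == 0))  # [(2, 'b'), (4, 'd')]
```

In this sketch the Frontend only hands out endpoints, so result rows never pass through it; each Backend serves its own slice straight to the client. That separation is presumably what lets the pull side scale with the number of Backends, though the real Doris implementation may differ in how fragments are dispatched and buffered.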