xinyiZzz commented on issue #35777:
URL: https://github.com/apache/arrow/issues/35777#issuecomment-1719115860

   > Great! I'd be interested in seeing how this develops.
   
   Thanks for your attention @lidavidm 
   
   I'm implementing an Arrow Flight SQL server in Apache Doris to support 
ADBC.
   
   I will release the design document to the Doris community soon.
   
   
   Brief explanation:
   
   Motivation:
   The immediate need is to speed up returning Doris query results to Python, 
for data science and machine learning workloads. In the future, it may also 
replace the interfaces that other systems such as Spark use to read from Doris.
   
   Doris is also a columnar database. Converting the `column data` in Doris 
into `row data` for the MySQL protocol, and then back to `column data` on the 
client, is very expensive.
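   
   To illustrate the round trip described above, here is a minimal 
pure-Python sketch. The sample data and column names are made up, and it does 
not model the MySQL wire format itself; it only shows that the data is 
transposed twice and ends up where it started.

```python
# Hypothetical sample data; the column names are illustrative only.
columns = {"id": [1, 2, 3], "score": [0.5, 0.7, 0.9]}

# Columnar -> rows: the shape a row-oriented protocol sends over the wire.
names = list(columns)
rows = list(zip(*(columns[name] for name in names)))

# Rows -> columnar: the shape the client rebuilds for analysis.
rebuilt = {name: [row[i] for row in rows] for i, name in enumerate(names)}
```

   Both transposes touch every value, and `rebuilt` equals `columns`, so a 
columnar transport can skip them entirely.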
   
   Currently, reading data from Doris with Python's mysql-client is 10-20 
times slower than reading from ClickHouse. Previously I tried using JDBC to 
run the query inside the Doris Arrow Flight Server. Compared with mysql-client, 
performance improved by 4-10 times, but it was still roughly twice as slow as 
ClickHouse.
   
   Implementation:
   I used the Arrow Flight examples and Dremio as references.
   Doris also has two roles: `Frontend` and `Backend`. `Frontend` is 
responsible for generating the query plan and scheduling, and `Backend` is 
responsible for query execution.
   ADBC connection process:
   - The ADBC Client connects to `Frontend`. `Frontend` sends the query plan 
to `Backend` and returns the endpoint of the `Backend` that holds the result to 
the ADBC Client.
   - The ADBC Client connects to the `Backend` named in the endpoint and pulls 
the data.
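   
   The two steps above can be sketched as a small in-process model. This is 
only an illustration of the flow, not Doris or ADBC code: the `Endpoint`, 
`Frontend`, `Backend` and `run_query` names, the ticket format, and the 
`be-1:8050` address are all assumptions made for the example.

```python
from dataclasses import dataclass

# All names below are hypothetical and only model the flow described above;
# they are not Doris, Arrow Flight, or ADBC APIs.


@dataclass(frozen=True)
class Endpoint:
    backend_address: str  # which Backend holds the result
    ticket: str           # opaque handle identifying the result set


class Backend:
    """Executes query plans and serves results by ticket."""

    def __init__(self, address):
        self.address = address
        self._results = {}

    def execute(self, plan):
        # Pretend execution: store a columnar result under a fresh ticket.
        ticket = f"ticket-{len(self._results)}"
        self._results[ticket] = {"id": [1, 2, 3], "plan": [plan] * 3}
        return ticket

    def pull(self, ticket):
        # Hand the result to the client and forget it.
        return self._results.pop(ticket)


class Frontend:
    """Plans and schedules queries, then tells the client where to fetch."""

    def __init__(self, backends):
        self.backends = backends

    def submit(self, sql):
        plan = f"PLAN({sql})"
        backend = next(iter(self.backends.values()))  # trivial scheduling
        ticket = backend.execute(plan)
        return Endpoint(backend.address, ticket)


def run_query(frontend, backends, sql):
    endpoint = frontend.submit(sql)               # step 1: ask the Frontend
    backend = backends[endpoint.backend_address]  # step 2: go to the Backend
    return backend.pull(endpoint.ticket)


backends = {"be-1:8050": Backend("be-1:8050")}
frontend = Frontend(backends)
result = run_query(frontend, backends, "SELECT 1")
```

   The point of the split is that the result never passes through 
`Frontend`; the client reads it in columnar form directly from the `Backend` 
that produced it.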
   
   In a simple test, performance was already several times faster than 
ClickHouse. I will continue developing it and look forward to the final 
results.
   
   Related PRs:
   https://github.com/apache/doris/pull/23765
   https://github.com/apache/doris/pull/24314


