Hi all,

I’m working on applying our ORC-support patch to the latest code base (IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>). Since our patch is based on cdh-5.7.3-release, which was released a year ago, there is a lot of work to do to merge it.

One of the biggest changes I have noticed since cdh-5.7.3-release is the new scan node & scanner model introduced in IMPALA-3902 <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it was inspired by the investigation task in IMPALA-2849 <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot find any performance report in that issue.

Could you share any reports on this multi-threading refactor? I’m wondering how much it can improve performance, since the old model (a single-threaded scan node with multi-threaded scanners) already provided concurrent I/O for reads, and most OLAP queries are I/O-bound.

Thanks,
Quanlong