Guangyuan Feng created KYLIN-5571:
-------------------------------------

             Summary: Calculating the data size during query pushdown takes too 
long, which makes the queries unstoppable. 
                 Key: KYLIN-5571
                 URL: https://issues.apache.org/jira/browse/KYLIN-5571
             Project: Kylin
          Issue Type: Improvement
          Components: Query Engine
    Affects Versions: 5.0-alpha
            Reporter: Guangyuan Feng
            Assignee: Guangyuan Feng
             Fix For: 5.0-alpha


When a query is pushed down, KE tries to calculate the size of the data involved 
in order to set the number of Spark partitions, but if there are too many files 
on HDFS, this calculation takes a long time to complete.

To improve this situation, the following changes will be made:
 # Use a bounded thread pool to calculate the data size
 # Add a timeout to the calculation, so that the query can be stopped as soon as possible

After these changes, we can expect the query to complete within a bounded duration.
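
The two changes listed above could be combined roughly as in the following sketch. 
It is not the actual KYLIN-5571 patch: the class name BoundedSizeCalculator, the 
method totalSize, and its parameters are hypothetical, and each size lookup is 
modelled as a plain Callable<Long> rather than a real HDFS call. It only shows a 
fixed-size pool doing the work, with one overall timeout after which unfinished 
lookups are cancelled.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only (not Kylin code).
final class BoundedSizeCalculator {

    private BoundedSizeCalculator() {
    }

    /**
     * Sums the results of the given size lookups using at most poolSize threads.
     * Throws TimeoutException if the lookups do not all finish within timeoutMs.
     */
    static long totalSize(List<Callable<Long>> sizeTasks, int poolSize, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        // Bounded pool: never more than poolSize lookups run concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // invokeAll returns when all tasks are done or the timeout expires;
            // tasks that have not completed by then are cancelled.
            List<Future<Long>> futures =
                    pool.invokeAll(sizeTasks, timeoutMs, TimeUnit.MILLISECONDS);
            long total = 0L;
            for (Future<Long> future : futures) {
                if (future.isCancelled()) {
                    throw new TimeoutException(
                            "Data size calculation exceeded " + timeoutMs + " ms");
                }
                total += future.get(); // already completed, does not block
            }
            return total;
        } finally {
            // Interrupt any worker that is still running so the query can stop.
            pool.shutdownNow();
        }
    }
}
```

A caller would catch the TimeoutException and either fail the query or fall back 
to a default partition count; which of the two is appropriate is left open here.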



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
