Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 )
Change subject: IMPALA-5429: Multi threaded block metadata loading
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@39
PS6, Line 39: (Advanced) Number of threads used to load block metadata for HDFS based partitioned "
            : "tables. Due to HDFS architectural limitations, it is unlikely to get a linear "
            : "speed up beyond 5 threads.
> When multiple tables are loaded, should I think about the total number of t

Unfortunately not. With the current design, this queue is actually unbounded. num_metadata_loading_threads applies only to loads that go through the TableLoadingMgr class; other loads happen via DDLs (CatalogOpExecutor), such as REFRESH and ADD PARTITION. The original plan was to route all of these loads through the TableLoadingMgr queue, but we didn't do that because end users would see it as a regression: DDLs could end up waiting in the queue much longer than they do today. Ideally we would do that and raise num_metadata_loading_threads to a large value to mimic the present behavior.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783
PS5, Line 783: numPaths) throws Ca
> CONF is the default configuration and its loaded once upfront for the lifet

By default, getFileSystem() uses a cache underneath unless we disable it via CONF.setBoolean("fs.hdfs.impl.disable.cache", true). But I'm not sure there is a way to optimize beyond that point for each partition.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801
PS5, Line 801:
> yes, that answers it.
> might be useful to try a workload that has the same n

That'd be a good experiment. We are definitely only as fast as the slowest partition load. Since we use a thread pool, smaller partitions release their executing thread much sooner, and that thread is then used by the partitions still queued in partitions_to_load.


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@818
PS6, Line 818: for (Future task: pendingMdLoadTasks)
> just for my own info-- since this work is triggered by an end-user, how is

Currently we don't support cancelling queries while they are in planning (IMPALA-915). That is a bigger query life-cycle change and needs to be done separately.


-- 
To view, visit http://gerrit.cloudera.org:8080/8235

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Oct 2017 22:28:54 +0000
Gerrit-HasComments: Yes
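The PS5, Line 801 reply says a table load is only as fast as its slowest partition, and that small partitions hand their pool thread back quickly so queued partitions can proceed. Below is a minimal sketch of that pattern using a plain java.util.concurrent fixed thread pool. It is not Impala's HdfsTable code: the class name, loadPartition(), loadAll() and the "about 1 ms per file" sleep are hypothetical stand-ins, and only the name pendingMdLoadTasks is borrowed from the review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPartitionLoadSketch {

  // Hypothetical stand-in for loading one partition's file and block metadata.
  // Cost is simulated as ~1 ms per file; returns the number of files "loaded".
  static int loadPartition(int numFiles) throws InterruptedException {
    Thread.sleep(numFiles);
    return numFiles;
  }

  // Submits one task per partition to a fixed-size pool, then blocks on every
  // Future. A worker that finishes a small partition takes the next queued task,
  // so the elapsed time can never be shorter than the slowest single partition.
  static int loadAll(int[] filesPerPartition, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    List<Future<Integer>> pendingMdLoadTasks = new ArrayList<>();
    try {
      for (int numFiles : filesPerPartition) {
        Callable<Integer> loadTask = () -> loadPartition(numFiles);
        pendingMdLoadTasks.add(pool.submit(loadTask));
      }
      int totalFiles = 0;
      for (Future<Integer> task : pendingMdLoadTasks) {
        totalFiles += task.get(); // a failed task surfaces as ExecutionException
      }
      return totalFiles;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for partition loads", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("partition metadata load failed", e.getCause());
    } finally {
      pool.shutdownNow(); // also interrupts tasks still running after a failure
    }
  }

  public static void main(String[] args) {
    // One large partition next to several small ones, loaded with two threads.
    int[] filesPerPartition = {200, 5, 5, 5, 5, 5};
    long start = System.nanoTime();
    int total = loadAll(filesPerPartition, 2);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("loaded " + total + " files in " + elapsedMs + " ms");
  }
}
```

With two threads, one worker spends roughly 200 ms on the large partition while the other drains the five small ones, so the elapsed time should land near the largest partition's cost rather than the 225 ms sum (actual timings vary by machine).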