Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@39
PS6, Line 39: (Advanced) Number of threads used to load block metadata for HDFS based partitioned "
:     "tables. Due to HDFS architectural limitations, it is unlikely to get a linear "
:     "speed up beyond 5 threads.
When multiple tables are loaded, should I think about the total number of threads as num_metadata_loading_threads * max_hdfs_parts_parallel_load? If so, does the scaling limit of 5 apply to the total number of threads hitting the namenode, or is the effective total 5 * 16 (per the default settings)?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783
PS5, Line 783: numPaths) throws Ca
> Correct. This is one of the overheads as noticed in the perf runs and unfor
CONF is the default configuration, and it is loaded once upfront for the lifetime of this class (L201). I suspect only a few filesystems are specified; we may even get lucky and find there is only one. Is there potentially a way to make this method cheaper for such cases?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801
PS5, Line 801: 
> Yes, each partition can have its own no. of files, so the work definitely v
Yes, that answers it. It might be useful to try a workload that has the same number of blocks as your current workload, but distributed non-uniformly across partitions and files.

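A minimal sketch of how such a skewed workload could be sized, keeping the total block count fixed while varying how unevenly it is spread over partitions. The class name `SkewedWorkload`, the method `distribute`, and the Zipf-like weighting are all assumptions made for illustration; none of them come from the patch or from Impala.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the patch: computes per-partition block counts.
public class SkewedWorkload {

  /**
   * Splits totalBlocks across numPartitions using Zipf-like weights 1/(i+1)^skew.
   * skew = 0 gives a uniform split; larger values put more blocks in the first partitions.
   */
  static int[] distribute(int totalBlocks, int numPartitions, double skew) {
    if (totalBlocks < 0 || numPartitions <= 0) {
      throw new IllegalArgumentException("need totalBlocks >= 0 and numPartitions > 0");
    }
    double[] weights = new double[numPartitions];
    double weightSum = 0;
    for (int i = 0; i < numPartitions; i++) {
      weights[i] = 1.0 / Math.pow(i + 1, skew);
      weightSum += weights[i];
    }
    // Give each partition the rounded-down share of the total.
    int[] blocks = new int[numPartitions];
    int assigned = 0;
    for (int i = 0; i < numPartitions; i++) {
      blocks[i] = (int) Math.floor(totalBlocks * weights[i] / weightSum);
      assigned += blocks[i];
    }
    // Hand out the rounding remainder one block at a time, heaviest partitions first.
    for (int i = 0; assigned < totalBlocks; i = (i + 1) % numPartitions) {
      blocks[i]++;
      assigned++;
    }
    return blocks;
  }

  public static void main(String[] args) {
    // Same total number of blocks, two different distributions.
    System.out.println("uniform: " + Arrays.toString(distribute(1000, 10, 0.0)));
    System.out.println("skewed:  " + Arrays.toString(distribute(1000, 10, 1.5)));
  }
}
```

Comparing load times for the two layouts at the same total would show whether a few large partitions dominate the wall-clock time when the work is split per partition.
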

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@818
PS6, Line 818: for (Future task: pendingMdLoadTasks)
Just for my own info: since this work is triggered by an end user, how is cancellation dealt with?


-- 
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Oct 2017 20:35:10 +0000
Gerrit-HasComments: Yes