Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@39
PS6, Line 39: (Advanced) Number of threads used to load block metadata for HDFS based partitioned "
            :     "tables. Due to HDFS architectural limitations, it is unlikely to get a linear "
            :     "speed up beyond 5 threads.
When multiple tables are loaded, should I think of the total number of 
threads as num_metadata_loading_threads * max_hdfs_parts_parallel_load? If so, 
does the scaling limit of 5 apply to the total number of threads hitting the 
namenode, or is the total 5 * 16 (per default settings)?
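
To make the question concrete, here is a rough sketch of the arithmetic I have 
in mind. It assumes the two settings simply multiply, which is exactly what I 
am asking about; the class and method names are hypothetical, and the values 
16 and 5 are only the defaults mentioned above.

```java
// Hypothetical helper, not part of the patch: upper bound on concurrent
// loader threads if the two settings multiply (an assumption, see above).
final class LoaderThreadEstimate {
  static int worstCaseThreads(int tableLoadingThreads, int perTableThreads) {
    // multiplyExact throws ArithmeticException on int overflow.
    return Math.multiplyExact(tableLoadingThreads, perTableThreads);
  }

  public static void main(String[] args) {
    // With the default values quoted in this comment (16 and 5).
    System.out.println(worstCaseThreads(16, 5));
  }
}
```

If that product is the right model, the flag description should say whether 
the limit of 5 refers to this total or to a single table.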


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783
PS5, Line 783: numPaths) throws Ca
> Correct. This is one of the overheads as noticed in the perf runs and unfor
CONF is the default configuration and it's loaded once upfront for the lifetime 
of this class (L201).
I suspect few filesystems are specified; we may even get lucky and there is 
only one. Is there potentially a way to make this method cheaper for such cases?


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801
PS5, Line 801:
> Yes, each partition can have its own no. of files, so the work definitely v
Yes, that answers it. It might be useful to try a workload that has the same 
number of blocks as your current workload, but distributed non-uniformly across 
partitions and files.
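
One possible way to derive such a layout, as a minimal sketch: keep the total 
block count fixed and give partition i a share proportional to 1/(i+1). The 
Zipf-like weighting and all names here are my own assumptions, not anything 
from the patch or the perf runs.

```java
// Hypothetical test-data helper: splits a fixed number of blocks across
// partitions non-uniformly while preserving the total.
final class SkewedBlockLayout {
  static long[] distribute(long totalBlocks, int numPartitions) {
    if (totalBlocks < 0 || numPartitions <= 0) {
      throw new IllegalArgumentException("need totalBlocks >= 0 and numPartitions > 0");
    }
    double[] weights = new double[numPartitions];
    double weightSum = 0.0;
    for (int i = 0; i < numPartitions; i++) {
      weights[i] = 1.0 / (i + 1);  // partition 0 is the heaviest
      weightSum += weights[i];
    }
    long[] blocks = new long[numPartitions];
    long assigned = 0;
    for (int i = 0; i < numPartitions; i++) {
      blocks[i] = (long) Math.floor(totalBlocks * weights[i] / weightSum);
      assigned += blocks[i];
    }
    // Rounding down leaves a small remainder; give it to the first partition
    // so the total matches the uniform workload exactly.
    blocks[0] += totalBlocks - assigned;
    return blocks;
  }
}
```

Comparing load times for this layout against the uniform one with the same 
total would show how sensitive the speedup is to skew.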


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@818
PS6, Line 818: for (Future task: pendingMdLoadTasks)
Just for my own information: since this work is triggered by an end user, how is 
cancellation handled?
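
For reference, the generic java.util.concurrent pattern I had in mind looks 
roughly like the sketch below. This is not a claim about what the patch does; 
the class and method names are hypothetical, and the tasks stand in for 
per-partition metadata loads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: wait on a list of futures and cancel the remaining
// ones if any task fails or the waiting thread is interrupted.
final class CancellableLoadSketch {
  static int loadAll(List<Callable<Integer>> tasks, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    List<Future<Integer>> pending = new ArrayList<>();
    try {
      for (Callable<Integer> task : tasks) pending.add(pool.submit(task));
      int total = 0;
      for (Future<Integer> f : pending) total += f.get();
      return total;
    } catch (InterruptedException e) {
      cancelAll(pending);
      Thread.currentThread().interrupt();  // preserve the interrupt status
      throw new IllegalStateException("load interrupted", e);
    } catch (ExecutionException e) {
      cancelAll(pending);
      throw new IllegalStateException("load failed", e.getCause());
    } finally {
      pool.shutdownNow();  // interrupts any task that is still running
    }
  }

  private static void cancelAll(List<Future<Integer>> pending) {
    // cancel(true) is a no-op for tasks that have already completed.
    for (Future<Integer> f : pending) f.cancel(true);
  }
}
```

Note that cancel(true) only interrupts the worker thread, so a task blocked in 
a call that ignores interrupts would still run to completion.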



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Oct 2017 20:35:10 +0000
Gerrit-HasComments: Yes
