Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8235 )

Change subject: IMPALA-5429: Multi threaded block metadata loading
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc
File be/src/catalog/catalog.cc:

http://gerrit.cloudera.org:8080/#/c/8235/6/be/src/catalog/catalog.cc@39
PS6, Line 39: (Advanced) Number of threads used to load block metadata for HDFS based partitioned "
            :     "tables. Due to HDFS architectural limitations, it is unlikely to get a linear "
            :     "speed up beyond 5 threads.
> When multiple tables are loaded, should I think about the total number of t
Unfortunately not. With the current design, this queue is actually unbounded. 
num_metadata_loading_threads only applies to loads that go through the 
TableLoadingMgr class; other loads happen via DDLs (CatalogOpExecutor), such 
as REFRESH and ADD PARTITION. The original plan was to queue all of these loads 
in the TableLoadingMgr, but we didn't end up doing it because it shows up as a 
regression for end users (the DDLs can potentially wait in the queue much 
longer than before). Ideally we would do that and raise 
num_metadata_loading_threads to a large value to mimic the present behavior.
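
As a rough illustration of the difference between a bounded worker count and 
an unbounded queue, here is a minimal sketch using plain java.util.concurrent. 
The class and method names are hypothetical and this is not the 
TableLoadingMgr code; it only assumes a fixed-size pool backed by an unbounded 
LinkedBlockingQueue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a load manager; NOT Impala's TableLoadingMgr.
class UnboundedLoadQueueSketch {
  // Submits numRequests "loads" that all block, then reports how many of
  // them are sitting in the queue while numThreads workers are busy.
  static int queuedRequests(int numThreads, int numRequests) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        numThreads, numThreads, 0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>());  // no capacity limit
    int expectedRunning = Math.min(numThreads, numRequests);
    CountDownLatch started = new CountDownLatch(expectedRunning);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < numRequests; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await();  // simulate a load that has not finished yet
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      started.await();  // every worker thread is now occupied
      return pool.getQueue().size();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown();  // let the simulated loads complete
      pool.shutdown();
    }
  }
}
```

With 4 threads and 100 requests, 4 requests run and the other 96 wait in the 
queue; the thread count caps concurrency but not the backlog.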


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@783
PS5, Line 783: numPaths) throws Ca
> CONF is the default configuration and its loaded once upfront for the lifet
getFileSystem() by default uses a cache underneath unless we disable it via

CONF.setBoolean("fs.hdfs.impl.disable.cache", true);

But I'm not sure there is a way to optimize beyond that point for each 
partition.


http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@801
PS5, Line 801:
> yes, that answers it. might be useful to try a workload that has the same n
That'd be a good experiment. We are definitely only as fast as the slowest 
partition load. Since we are using a thread pool, smaller partitions release 
their executing thread much sooner, and that thread is then picked up by the 
queued partitions_to_load.
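
To make the "as fast as the slowest partition load" point concrete, here is a 
small model of a fixed-size pool that hands each queued load to whichever 
thread frees up first. It is a simulation under assumed, made-up load costs 
(abstract time units), not a measurement and not the HdfsTable code; the names 
are hypothetical.

```java
import java.util.PriorityQueue;

// Hypothetical model of FIFO scheduling on a fixed-size thread pool.
class PartitionLoadMakespanSketch {
  // Returns the time at which the last load finishes, given per-partition
  // load costs in submission order.
  static long makespan(int numThreads, long[] loadCosts) {
    // Each entry is the time at which one pool thread becomes free.
    PriorityQueue<Long> threadFreeAt = new PriorityQueue<>();
    for (int i = 0; i < numThreads; i++) {
      threadFreeAt.add(0L);
    }
    long finish = 0;
    for (long cost : loadCosts) {
      long start = threadFreeAt.poll();  // earliest available thread
      long end = start + cost;
      threadFreeAt.add(end);             // thread is busy until 'end'
      finish = Math.max(finish, end);
    }
    return finish;
  }
}
```

For costs {10, 1, 1, 1, 1, 1, 1} on 2 threads the result is 10: one thread is 
held by the large partition while the other cycles through the six small ones, 
so the total is set by the slowest load.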


http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8235/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@818
PS6, Line 818: for (Future task: pendingMdLoadTasks)
> just for my own info-- since this work is triggered by an end-user, how is
Currently we don't support cancellation for queries that are still in planning 
(IMPALA-915). That is a bigger query life-cycle change and needs to be done 
separately.
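
For reference, the general shape of a loop that waits on pending tasks looks 
roughly like the sketch below. This is an assumption about the pattern, not 
the patch itself: the class, method, and the per-partition file counts are 
hypothetical. Note that the caller simply blocks in get() until every task is 
done, with no cancellation hook.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not the HdfsTable implementation.
class PendingLoadWaitSketch {
  // Submits one task per partition and blocks until all have finished.
  // Each task just returns its (made-up) file count.
  static int loadAll(int numThreads, int[] filesPerPartition) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<Integer>> pendingTasks = new ArrayList<>();
      for (int files : filesPerPartition) {
        final int n = files;
        Callable<Integer> load = () -> n;  // stand-in for a metadata load
        pendingTasks.add(pool.submit(load));
      }
      int total = 0;
      for (Future<Integer> task : pendingTasks) {
        total += task.get();  // blocks until this task completes
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```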



--
To view, visit http://gerrit.cloudera.org:8080/8235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481
Gerrit-Change-Number: 8235
Gerrit-PatchSet: 6
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Oct 2017 22:28:54 +0000
Gerrit-HasComments: Yes
