John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 )

Change subject: IMPALA-4623: [DOCS] Document file handle caching
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967
PS1, Line 967: although the encryption layer
             :         adds overhead that might lessen the benefit of the caching.
> I'm not familiar with this overhead. What is this referring to?
I had written in the notes from our conversation: "HDFS encryption adds 
overhead". That was from when we were thinking about all the other complicating 
factors, like Sentry GRANT/REVOKE.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@973
PS1, Line 973: 20 thousand
> Just curious: How do you decide to use "20 thousand" vs "20,000"?
For big numbers, I try to stick with either spelled-out forms or obvious powers 
of 2. (For example, I would write 65536 with no comma.) There are so many other 
separator conventions internationally 
(https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html) that I 
don't want to be too US-centric.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@991
PS1, Line 991: evict any stale file handles from the cache
> The file handles won't actually be evicted directly. The new metadata will
Done


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@995
PS1, Line 995: To evaluate the effectiveness of file handle caching for a particular workload, issue the
             :         <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
             :         profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
             :         (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
             :         Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
             :         because the first time each data file is accessed is always recorded as a cache miss.
> I'm not sure this belongs here, but information about the cache across the
Let's be inclusive for this first iteration and then fine-tune later if needed. 
We tend to be skimpy with such information, which is a weakness IMO.
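
For anyone who wants to check the hit/miss ratio described in the quoted paragraph without eyeballing the profile, here is a rough sketch. It is an illustration only, not something from the patch. The two counter names come from the doc text; everything else is an assumption: the helper name file_handle_hit_ratio is made up, the sample profile is invented, and the parser assumes each counter appears on a line shaped like "CounterName: 1.50K (1500)" or "CounterName: 4". Real profile output may differ, so adjust the pattern as needed.

```python
import re

# ASSUMPTION: counters appear as "Name: <display> (<raw>)" or "Name: <integer>".
# The counter names are from the doc text; the line shape is a guess.
_COUNTER = re.compile(
    r"(CachedFileHandlesHitCount|CachedFileHandlesMissCount)"
    r":\s*([^\s(]+)(?:\s*\((\d+)\))?"
)


def file_handle_hit_ratio(profile_text):
    """Sum both counters over the whole profile and return (hits, misses, ratio).

    ratio is hits / (hits + misses), or None if neither counter was found.
    """
    totals = {"CachedFileHandlesHitCount": 0, "CachedFileHandlesMissCount": 0}
    for name, shown, raw in _COUNTER.findall(profile_text):
        if raw:
            # Prefer the exact value in parentheses when it is present.
            totals[name] += int(raw)
        elif shown.isdigit():
            # Otherwise accept a plain integer; skip anything abbreviated.
            totals[name] += int(shown)
    hits = totals["CachedFileHandlesHitCount"]
    misses = totals["CachedFileHandlesMissCount"]
    accesses = hits + misses
    ratio = hits / accesses if accesses else None
    return hits, misses, ratio


# Hypothetical profile excerpt, invented purely for illustration.
SAMPLE_PROFILE = """
  HDFS_SCAN_NODE (id=0):
     - CachedFileHandlesHitCount: 90 (90)
     - CachedFileHandlesMissCount: 10 (10)
  HDFS_SCAN_NODE (id=1):
     - CachedFileHandlesHitCount: 1.50K (1500)
     - CachedFileHandlesMissCount: 4
"""

if __name__ == "__main__":
    print(file_handle_hit_ratio(SAMPLE_PROFILE))
```

As the doc text says, run some representative queries first to warm up the cache; otherwise the first access to each data file counts as a miss and drags the ratio down.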



--
To view, visit http://gerrit.cloudera.org:8080/8200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f
Gerrit-Change-Number: 8200
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 20:48:03 +0000
Gerrit-HasComments: Yes
