John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 )
Change subject: IMPALA-4623: [DOCS] Document file handle caching
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967
PS1, Line 967: although the encryption layer
             : adds overhead that might lessen the benefit of the caching.
> I'm not familiar with this overhead.  What is this referring to?

I had written in the notes from our conversation "HDFS encryption adds overhead". That was from when we were thinking about all the other complicating factors, like Sentry GRANT/REVOKE.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@973
PS1, Line 973: 20 thousand
> Just curious: How do you decide to use "20 thousand" vs "20,000"?

For big numbers, I try to stick with either spelled-out forms or obvious powers of 2. (For example, I would write 65536 with no comma.) There are so many other separator conventions internationally (https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html) that I don't want to be too US-centric.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@991
PS1, Line 991: evict any stale file handles from the cache
> The file handles won't actually be evicted directly. The new metadata will

Done


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@995
PS1, Line 995: To evaluate the effectiveness of file handle caching for a particular workload, issue the
             : <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
             : profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
             : (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
             : Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
             : because the first time each data file is accessed is always recorded as a cache miss.
> I'm not sure this belongs here, but information about the cache across the

Let's be inclusive for this first iteration and then fine-tune later if needed. We tend to be skimpy with such information, which is a weakness IMO.


--
To view, visit http://gerrit.cloudera.org:8080/8200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f
Gerrit-Change-Number: 8200
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: John Russell <jruss...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 20:48:03 +0000
Gerrit-HasComments: Yes