Hello Bharath Vissapragada, Tianyi Wang, Vuk Ercegovac, I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/11232 to review the following change. Change subject: IMPALA-7406. Avoid memory overhead of FlatBuffer object wrappers for FileDescriptor ...................................................................... IMPALA-7406. Avoid memory overhead of FlatBuffer object wrappers for FileDescriptor This patch saves approximately 100 bytes per file in the catalog and impalad heap by changing HdfsPartition to store raw byte[] arrays instead of FileDescriptor wrapper objects. The outward-facing APIs of HdfsPartition remain the same: the patch uses Guava's lazy list transformation utility to return a List which is backed by the underlying byte[] and wraps on the fly. No new tests are added since existing tests cover these code paths from an end-to-end basis. I tested the memory usage by running a catalogd with background loading enabled, waiting until loading finished, and then running 'jmap -histo:live' a few times on the catalogd until the result stabilized. I then used a quick python script[1] to diff the histograms and look for any changes with high magnitude. I found the following significant diffs: Object counts: Negative (good): java.nio.HeapByteBuffer: 12131 -> 35 (-12096) org.apache.impala.fb.FbFileDesc: 12090 -> 0 (-12090) org.apache.impala.catalog.HdfsPartition$FileDescriptor: 12090 -> 0 (-12090) java.util.ArrayList: 37825 -> 28028 (-9797) [Ljava.lang.Object;: 24393 -> 15035 (-9358) Positive (bad): com.google.common.collect.RegularImmutableList: 1694 -> 2019 (+325) com.google.common.collect.SingletonImmutableList: 8560 -> 19142 (+10582) Net: -55431 + 10934 = -44497 Heap size: Negative (good): java.nio.HeapByteBuffer: 582288 -> 1680 (-580608) [Ljava.lang.Object;: 3560960 -> 3026448 (-534512) org.apache.impala.fb.FbFileDesc: 290160 -> 0 (-290160) java.util.ArrayList: 907800 -> 672672 (-235128) org.apache.impala.catalog.HdfsPartition$FileDescriptor: 193440 -> 0 (-193440) Positive (bad): com.google.common.collect.RegularImmutableList: 54208 -> 64608 (+10400) com.google.common.collect.SingletonImmutableList: 205440 -> 459408 (+253968) Net: -1833848 + 264368 = -1569480 To summarize, the measurements confirm that this patch saved about 130 bytes and 3.7 objects per file descriptor. The savings are a bit more than estimated because switching from ArrayList to ImmutableList inside HdfsPartition seems to have also yielded some savings as it optimizes for the singleton and empty case. In terms of performance, it would be very unlikely that this patch causes a regression. The short-lived allocations of the wrapper objects will all come out of the TLAB, and even loading a partition with a million partitions will only allocate ~100MB of short-lived objects, which should only take a few milliseconds to collect (young-generation collections take time relative to the number of live objects, not dead objects). The other overheads associated with any operation on a million-partition table would surely dominate. [1] https://gist.github.com/cb9c52907bbc416b055ffdaa34aaaa77 Change-Id: I986dd5c05ac8904898d01a6ef3487bd7225e0500 --- M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java 1 file changed, 55 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/11232/1 -- To view, visit http://gerrit.cloudera.org:8080/11232 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I986dd5c05ac8904898d01a6ef3487bd7225e0500 Gerrit-Change-Number: 11232 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>