Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10982 )
Change subject: IMPALA-7219. 7.5% of Catalog Server heap wasted by empty HashMaps and ArrayLists ...................................................................... Patch Set 2: (5 comments) See suggestion about changing IncompleteTable to not inherit from Table -- maybe now with the 'FeTable' interface we could avoid that? http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/StructType.java File fe/src/main/java/org/apache/impala/catalog/StructType.java: http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/StructType.java@37 PS2, Line 37: public class StructType extends Type { agreed with Vuk -- not clear to me why we have so many empty structs. I thought StructType is only used for describing the schema of a table, and a table with no fields seems uncommon. Do you have handy the "paths to GC root" for these empty fields? http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/StructType.java@39 PS2, Line 39: // Made non-final so that it can be initialized lazily to save memory when this map is not used. nit: this comment makes sense when looking at this as a patch, but when someone sees the code later, the "Made non-final" thing isn't super clear. Perhaps better to just describe the current treatment of the member, like: "Map of field name to field. Null is used to indicate empty to save memory." http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/StructType.java@84 PS2, Line 84: public StructField getField(String fieldName) { I wonder if we could do even better here and only lazily construct a field map at places where method is called. it seems this is the only usage site of the field map, and best I can tell, this method is only called during query analysis, and never by the catalogd itself. Given that, it doesn't really make sense to keep around these members, when it would be pretty trivial to reconstruct the map as necessary during path resolution. Then we could fully remove all these maps (even the non-empty ones). Would that save significant memory? Another thought here: it seems like the most common case where we have an empty struct woudl be a reference from IncompleteTable. I'm guessing you were looking at a catalogd dump from a cluster with a lot of tables? Maybe we could change IncompleteTable to not inherit from 'Table' so that it can avoid all of these members entirely? http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/Table.java@320 PS2, Line 320: { : return; : } > nit: single line same as below http://gerrit.cloudera.org:8080/#/c/10982/2/fe/src/main/java/org/apache/impala/catalog/Table.java@447 PS2, Line 447: public Column getColumn(String name) { I think it would actually be an error to call getColumn before the colsByName_ map was initted. Such a call would indicate that columns haven't been loaded. -- To view, visit http://gerrit.cloudera.org:8080/10982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9c75f65ecb3ba3f2c739fa483a84dc052f471c6 Gerrit-Change-Number: 10982 Gerrit-PatchSet: 2 Gerrit-Owner: Misha Dmitriev <count...@gmail.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Fri, 20 Jul 2018 18:26:02 +0000 Gerrit-HasComments: Yes