Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11158 )
Change subject: IMPALA-7540. Intern most repetitive strings and network addresses in catalog ...................................................................... Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java File fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java: http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@62 PS2, Line 62: public static void internFieldsInPlace( nit: for each method, should we list the set of fields that we selectively intern? (here and below internFieldsInPlace()? http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@88 PS2, Line 88: internFieldsInPlace(table.getClustering_columns()); null check for table http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@113 PS2, Line 113: if (sd.isSetCols()) { null check for sd. http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@131 PS2, Line 131: if (fs.isSetName()) { null check http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@155 PS2, Line 155: "-1".equals(val) || : "0".equals(val) || : val.isEmpty() || : "true".equalsIgnoreCase(val) || : "false".equalsIgnoreCase(val) || : "TASK".equals(val) || : val.startsWith("impala_")) Can we create a static HashSet out of these (lowercased) and use something like if (hashSet.contains(val.lower()))) Easier to extend I think http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@172 PS2, Line 172: e.getKey().intern() Maybe move everything to guava interner to be consistent? Native interning is slow anyway (as per the blog post) http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@174 PS2, Line 174: return ret; Preconditions.checkState(ret.size() == parameters.size())? http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@188 PS2, Line 188: internColumnFamily can we rename it to something like internString() that we can reuse for all Strings? http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@198 PS2, Line 198: // but it's likely someone would trip over one of these. This looks risky if someone updates the thrift def for TNetworkAddress :-) http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java File fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java: http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java@35 PS2, Line 35: columnQualifier just curious, any reason why not this? http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@853 PS2, Line 853: hmsParameters_ = CatalogInterners.internParameters( : msPartition.getParameters()); Shouldn't we do it after we remove the incremental stats ? (or) blacklist incremental stats from getting interned. http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1203 PS2, Line 1203: intern just curious, why not intern everything via Guava? -- To view, visit http://gerrit.cloudera.org:8080/11158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib3121aefa4391bcb1477d9dba0a49440d7000d26 Gerrit-Change-Number: 11158 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Fri, 07 Sep 2018 03:11:29 +0000 Gerrit-HasComments: Yes