Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata ......................................................................
Patch Set 1: (15 comments) http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG Commit Message: Line 11: serialization library. > Any benchmark numbers? Just curious. Also I think it is good to add them he Done http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS1, Line 24: > Trailing white spaces at multiple places in this file. Done Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b) > Can this be moved outside the loop like JAVA_FE_ARGS Done http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: PS1, Line 20: > extraneous white spaces. Done PS1, Line 54: file_name_prefix_idx > Didn't understand what this prefix means till I read HdfsTable.fileNamePref Done Line 70: root_type FbFileDesc; > What is the significance of this? I'm reading about FlatBuffers for the fir Yeah, it is mostly used if you read from JSON. Removed. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 83: storageIdGenerator.put(host, new Short(diskId)); > Drive-by comment: using 'new' here is pessimising memory consumption. If yo Good point thanks. Done http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: PS1, Line 318: REPLICA_HOST_IDX_MASK > Looks like it used for UNMASKing, rename may be? It's a bit mask, so I'd rather keep the name as such :) Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF; > Document these? Done PS1, Line 357: getHostIdx > how about just making hostIdx_ a short rather than typecasting everytime? Done PS1, Line 360: REPLICA_HOST_CACHE_MASK > Shouldn't this be short rather than int? May be it works fine this way, jus For convenience. Java short is signed. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 315: Preconditions.checkNotNull(fd); > I don't think this FileDescriptor.create() can ever return null? If yes, wo Done Line 338: * Computes the common prefix between 'fileName' and 'prevFileName' and returns a > I like the idea but I'm wondering if this kind of optimization is a little That's not entirely true. Multiple filenames from a single write do have a common prefix. Line 420: prefixCompressFileName(currentFileName, prevFileName); > Also I think it may not be optimal to construct the prefixes this way by co In principal you're right, this is not the best way to compress strings. However, due to the way these file names are generated (random) this is not meant to be a generic file name compressor. Here we take advantage of the pattern that file names of a single write exhibit to avoid storing the common prefix multiple times. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 96: protected static EnumSet<TableType> SUPPORTED_TABLE_TYPES = EnumSet.of( > May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by Not sure how this ended up here. Removed. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes