Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
......................................................................


Patch Set 5:

(23 comments)

when staffan did the origin analysis, he found problems with both the total 
amount of memory needed as well as the rate of object creation. it would be 
good to try to reduce the latter in your current code even more, by accessing 
the fb structures directly instead of wrapping them with objects.

http://gerrit.cloudera.org:8080/#/c/6406/5/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

Line 31:   LZ4
zlib?


Line 38:   offset: long (id: 0);
why not specify a default, given that most will be 0?


Line 51:   disk_ids: [ushort] (id: 3);
is_cached?


http://gerrit.cloudera.org:8080/#/c/6406/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 217:   // File descriptor metadata serialized into a FlatBuffer.
reference which one exactly


http://gerrit.cloudera.org:8080/#/c/6406/5/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

Line 217:   // File descriptor metadata serialized into a FlatBuffer.
mention where to find the fb definition


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 46:     private ConcurrentHashMap<String, Short> storageUuidToDiskId =
add trailing _ to member vars


Line 78:           diskId = (short) (storageIdGenerator.get(host) + 1);
at least assert that you don't get an overflow here


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java:

Line 78:   public byte toFlatBuffer() {
should we generally settle on toFb for functions like these? brevity is a 
virtue...


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 115:       int[] blockOffsets = createBlockMetadata(fbb, blockLocations, 
hostIndex, stats);
i couldn't tell from the name that this creates an array of fbs. call it 
something like createFbFileBlocks or createFbFileBlockVector to indicate how 
this relates to the fb structures (and the fact that it outputs fbs)?


Line 126:     public static FileDescriptor 
createWithSynthesizedBlockMetadata(FileStatus fileStatus,
feel free to generally abbreviate metadata as md, to make this very long name a 
bit shorter.


PS5, Line 141: blockOffsets
would also be good to name this so it's immediately clear what it is (ie, embed 
the fb struct name).

you can also apply that to other variable names, ie, use a deterministic naming 
algorithm to make it immediately clear what the things are. i used that 
approach in my proof-of-concept test code, i think it made the code reasonably 
easy to follow.


Line 161:      * Synthesizes the block metadata of a file represented by 
'fileStatus' that resides
you don't say what that block metadata contains.


Line 182:         List<BlockReplica> replicas = Lists.newArrayList(
same comment about intermediate objects as below


Line 208:         List<BlockReplica> replicas = 
Lists.newArrayListWithExpectedSize(
hm, it would be nice not to have to construct these intermediate objects. pass 
all the params that you need for construction into fileblock.createfbfileblock 
and then populate the fbfileblock directly?


Line 224:     public THdfsCompression getFileCompression() {
does this need to return a thrift struct?


Line 231:     public FileBlock getFileBlock(int idx) {
why not just return the fbfileblock and save yourself the object construction?


Line 329:      * Loads the metadata of a file block from its block location 
metadata 'location' and
what do you mean by 'load'? 'construct an FbFileBlock with the provided fbb'?


Line 346:         replicaIdx = replica.isCached() ?
please describe this encoding in the .fb file.


Line 360:       FbFileBlock.addOffset(fbb, location.getOffset());
why not just pass in the offset/length directly and avoid constructing a 
BlockLocation?


Line 402:     public boolean hasCachedReplica() { return hasCachedReplica_; }
instead of adding member vars, why not make all functions be static and take an 
FbFileBlock parameter. that way, you save yourself more object construction.


Line 449:   public static class FileDescStats {
since this only keeps track of a single number, consider using a Long instead, 
that's less indirect.


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 134:   // if the table is stored in the catalog server.
what does that mean? i thought all tables are stored in the catalog server.


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
is this part of your changes?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to