Alex Behm has posted comments on this change.

Change subject: IMPALA-4768: Improve logging of table loading.
......................................................................


Patch Set 3:

(2 comments)

I'm not set on implementing the logging this way, but shipping the current code 
as is will be a supportability nightmare. Bharath and I have felt the impact of 
the lack of log messages while debugging a problem in an internal cluster.

I'm happy to implement these changes in a different way if you have a 
suggestion.

http://gerrit.cloudera.org:8080/#/c/5709/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS3, Line 241: BlockMetadataLoadStats
> I don't see any block-metadata-specific stats, so I am wondering if the nam
The file counts are specific to the block metadata loading. For full loads of 
partitioned tables, the common case is that we recursively iterate over *all* 
files in the table's root directory and then map the files to partitions based 
on the their containing directory. This means that there could be files in the 
table's root dir that don't belong to any partition. So the number of files 
scanned and the number of files actually belonging to the table could be 
different, and a large discrepancy could explain loading perf regressions (if 
any).

I'm ok with removing this change if you prefer.


PS3, Line 244: // Number of files that were found to belong to the table being 
loaded.
             :     public long tableFileCount = 0;
> Not sure I understand what this field represents? Can you please explain ?
See above comment, hope it explains it.

Let me know if you want to keep or remove this. If you prefer to keep, I'll 
expand the comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/5709
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8de96d0cb6d09b2272b1925d42cb059367fe7196
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to