Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 )
Change subject: IMPALA-8341: Data cache for remote reads ...................................................................... Patch Set 4: (31 comments) Mostly looks good to me. Please feel free to defer some of the comments to a subsequent change to make faster progress with this one. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/exec/hdfs-scan-node-base.cc@365 PS4, Line 365: "DataCacheHitCount", TUnit::UNIT); Do we have tests for these to make sure they show up in the profiles and work as expected? http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/data-cache-test.cc@330 PS1, Line 330: for (int i = 0; i < NUM_THREADS; ++i) { > Prefer to keep this test as stand-alone to make development easier. wfm http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc File be/src/runtime/io/data-cache-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@138 PS4, Line 138: const char* fname_ = "foobar"; Move to the other test constants (TEMP_BUFFER_SIZE etc)? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@226 PS4, Line 226: const int64_t cache_size = 4 * 1024 * 1024; nit: This could now be 4 * FLAGS_data_cache_file_max_size ? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@349 PS4, Line 349: This second part Can they be to separate tests? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache-test.cc@410 PS4, Line 410: beyone nit: typo http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h File be/src/runtime/io/data-cache.h: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@83 PS4, Line 83: concurrency of one thread That's controlled by --data_cache_write_concurrency, right? Mention here? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@87 PS4, Line 87: /// Future work: Do we plan to look into picking partitions on faster disks with higher probability? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@88 PS4, Line 88: /// - bound number of backing files per partition by consolidating the content of very Should we just delete old files if we have reached a configurable high number, 1000 or so, and take the resulting cache hits instead of compacting? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@145 PS4, Line 145: nit: formatting http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@197 PS4, Line 197: /// to be called in a single threaded environment after all IO threads exited. Should we also delete the files when we close them? There's a distinction in the implementation and it's not clear to me why it's there. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@203 PS4, Line 203: const kudu::Slice& key Should we pass const CacheKey& here and convert it in the implementation? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.h@224 PS4, Line 224: VerifyFilesSizes nit: VerifyFileSizes http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@72 PS4, Line 72: const char* DataCache::Partition::CACHE_FILE_PREFIX = "impala-cache-file-"; static const char*? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@75 PS4, Line 75: struct DataCache::CacheFile { Should this be a class, given it has a c'tor and d'tor? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@173 PS4, Line 173: make_unique<CacheFile>(path, fd); : cache_files_.emplace_back(std::move I think you can merge these two lines, which also reduces the risk that someone accidentally uses the now empty cache_file in a future change. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@187 PS4, Line 187: vector nit: missing include, but we might generally omit this one. http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@186 PS4, Line 186: // Delete all existing backing files left over from previous runs. : vector<string> entries; : RETURN_IF_ERROR(FileSystemUtil::Directory::GetEntryNames(path_, &entries, 0, : FileSystemUtil::Directory::EntryType::DIR_ENTRY_REG)); : for (const string& entry : entries) { : if (entry.find_first_of(CACHE_FILE_PREFIX) == 0) { : const string file_path = JoinPathSegments(path_, entry); : int res; : RETRY_ON_EINTR(res, unlink(file_path.c_str())); : LOG(INFO) << Substitute("$0 old cache file $1 $2", : res == 0 ? "Deleted" : "Failed to delete", file_path, : res == 0 ? "" : GetStrErrMsg()); : } : } This might be a good candidate for a separate method, but I don't feel strongly about it http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@235 PS4, Line 235: total_size Wrap in PrintBytes, or just put "bytes" into the message http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@316 PS4, Line 316: meta_cache_->Free(pending_handle); nit: can be on a single line http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@391 PS4, Line 391: while (cache_file->current_offset + charge_len > FLAGS_data_cache_file_max_size && What are the cases where this loop runs more than once? Would a single if-stmt be sufficient? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@402 PS4, Line 402: // Set up a scoped exit to always remove entry from the pending insertion set. Does this need to go to line 387 before the return statement in the file loop? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@459 PS4, Line 459: 0 Is a capacity of 1 really supported? Should we require something that technically can work (1 page) or even something reasonable (1M?)? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/data-cache.cc@482 PS4, Line 482: if (partitions_.empty()) return 0; Is this a valid thing to happen or should we DCHECK? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/disk-io-mgr.cc@61 PS4, Line 61: should "must have"? Such a configuration is not supported anyways, but if people try, the daemons might just wipe each others cache files on startup. http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/hdfs-file-reader.h File be/src/runtime/io/hdfs-file-reader.h: http://gerrit.cloudera.org:8080/#/c/12987/1/be/src/runtime/io/hdfs-file-reader.h@48 PS1, Line 48: /// Please note that this interface is only effective for local HDFS reads as it > This is a common path shared with local-file-reader.h too. Not sure if it m I agree. ReadCachedFile would be another alternative. It seems unconventional to have a method read data and not include "Read" somewhere in its name. I don't feel strongly though and am ok with keeping it unchanged http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/hdfs-file-reader.cc File be/src/runtime/io/hdfs-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/runtime/io/hdfs-file-reader.cc@153 PS4, Line 153: VLOG(2) << "Reopening file " << scan_range_->file_string() VLOG_FILE? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/util/filesystem-util-test.cc File be/src/util/filesystem-util-test.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/util/filesystem-util-test.cc@147 PS4, Line 147: directory file type entries? http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/util/filesystem-util.cc File be/src/util/filesystem-util.cc: http://gerrit.cloudera.org:8080/#/c/12987/4/be/src/util/filesystem-util.cc@339 PS4, Line 339: RETRY_ON_EINTR(r, fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, Kudu's RWFile encapsulates this. Should we use it instead? http://gerrit.cloudera.org:8080/#/c/12987/4/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/12987/4/bin/start-impala-cluster.py@118 PS4, Line 118: None "_size" implies that the default should be something like 0? http://gerrit.cloudera.org:8080/#/c/12987/4/common/thrift/metrics.json File common/thrift/metrics.json: http://gerrit.cloudera.org:8080/#/c/12987/4/common/thrift/metrics.json@433 PS4, Line 433: "description": "Total number of insertion skipped in remote data cache due to concurrency limit.", Do we have a test for the new metrics to make sure they work as expected? -- To view, visit http://gerrit.cloudera.org:8080/12987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc Gerrit-Change-Number: 12987 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: David Rorke <dro...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Tue, 23 Apr 2019 20:48:31 +0000 Gerrit-HasComments: Yes