Alex Behm has posted comments on this change. Change subject: IMPALA-3905: Add single-threaded scan node. ......................................................................
Patch Set 6: (22 comments) http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/exec-node.cc File be/src/exec/exec-node.cc: Line 287: if (state->query_options().mt_num_cores > 1) { > i thought we wanted to have this be >= 1 (and change the default to 0) Wasn't sure if you wanted to have those changes permanently. Done. http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 154: Status HdfsScanNodeBase::Prepare(RuntimeState* state) { > this is insanely long. leave todo: break this up. Done Line 373: for (auto& filter: filter_ctxs_) RETURN_IF_ERROR(filter.expr->Open(state)); > don't use auto here Done Line 682: /// This function may be called from multiple threads, and we expect each > this also sounds like it's specific to the old scan node. That's correct, but the value_ctxs come from the shared descriptor table, so they could be called from a table sink in the same fragment (or somewhere else), so it's better to have a clone. The previous code was not obviously safe to me in that respect, so I don't think this is totally specific to scanner threads. I think this check should remain even after removing the old scan node. Line 763: lock_guard<SpinLock> l(file_type_counts_lock_); > this seems specific to having multiple scanner threads. I moved this lock into the subclass. http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 95: /// into a roe batch queue (via HdfsScanner::ProcessSplit()). Those specifics > RowBatch Done Line 125: virtual bool IsMultiThreaded() const = 0; > can we pick a different name for that? i'm worried about causing confusion Done Line 216: /// Called by scanners when a range is complete. Used to trigger done_ and > done_ has migrated into the old scanner, and you explicitly reference a row Reworded. Line 242: typedef boost::unordered_map<int32_t, std::pair<int, int64_t>> PerVolumnStats; > std:: We rely on a bunch of utility functions like FindOrInsert() that only work on std::unordered_map now. I'd prefer to separate those changes. Line 266: const std::vector<FilterContext> filter_ctxs() const { return filter_ctxs_; } > const& ? Done Line 304: boost::unordered_set<int64_t> partition_ids_; > std:: Done Line 405: SpinLock file_type_counts_lock_; > isn't this specific to having multiple scanner threads? Yes. Moved the lock into subclass. Line 447: /// This must be called on Close() to unregister counters. > mention synchronization requirement? Done http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-mt.cc File be/src/exec/hdfs-scan-node-mt.cc: Line 73: // TODO: This is probably not worth splitting the organisational cost of splitting > intentional duplication? I think we can actually move this call into Open(). The comment doesn't seem to apply here. Line 74: // initialisation across two places. Move to before the scanner threads start. > reference to scanner threads Done (see above comment). Line 84: RETURN_IF_ERROR(runtime_state_->io_mgr()->GetNextRange( > single line? if not, break before call/after macro? Done http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: Line 716 > where did this go? I had removed it because most of it seemed wrong/misleading. Reworded comment. Line 92: RETURN_IF_ERROR(IssueInitialScanRanges(state)); > much better indeed http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: Line 44: /// HDFS-serialised data. > mention that this is not used on the non-mt path. Done Line 61: /// recycled. > todo: remove once mt is fully functional Done http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: Line 116: /// including the final one in Close(). > this comment seems unrelated to the c'tor. Moved into class comment. http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: Line 299: /// Always returns false if the scan_node_ is not multi-threaded. > this also deserves a clean-up todo. Done -- To view, visit http://gerrit.cloudera.org:8080/4113 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I98cc7f970e1575dd83875609985e1877ada3d5e0 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes