Alex Behm has posted comments on this change.

Change subject: IMPALA-3905: Add single-threaded scan node.
......................................................................


Patch Set 6:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

Line 287:       if (state->query_options().mt_num_cores > 1) {
> i thought we wanted to have this be >= 1 (and change the default to 0)
Wasn't sure if you wanted to have those changes permanently. Done.


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 154: Status HdfsScanNodeBase::Prepare(RuntimeState* state) {
> this is insanely long. leave todo: break this up.
Done


Line 373:   for (auto& filter: filter_ctxs_) 
RETURN_IF_ERROR(filter.expr->Open(state));
> don't use auto here
Done


Line 682:     /// This function may be called from multiple threads, and we 
expect each
> this also sounds like it's specific to the old scan node.
That's correct, but the value_ctxs come from the shared descriptor table, so 
they could be called from a table sink in the same fragment (or somewhere 
else), so it's better to have a clone. The previous code was not obviously safe 
to me in that respect, so I don't think this is totally specific to scanner 
threads.

I think this check should remain even after removing the old scan node.


Line 763:     lock_guard<SpinLock> l(file_type_counts_lock_);
> this seems specific to having multiple scanner threads.
I moved this lock into the subclass.


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 95: /// into a roe batch queue (via HdfsScanner::ProcessSplit()). Those 
specifics
> RowBatch
Done


Line 125:   virtual bool IsMultiThreaded() const = 0;
> can we pick a different name for that? i'm worried about causing confusion 
Done


Line 216:   /// Called by scanners when a range is complete.  Used to trigger 
done_ and
> done_ has migrated into the old scanner, and you explicitly reference a row
Reworded.


Line 242:   typedef boost::unordered_map<int32_t, std::pair<int, int64_t>> 
PerVolumnStats;
> std::
We rely on a bunch of utility functions like FindOrInsert() that only work on 
std::unordered_map now. I'd prefer to separate those changes.


Line 266:   const std::vector<FilterContext> filter_ctxs() const { return 
filter_ctxs_; }
> const& ?
Done


Line 304:   boost::unordered_set<int64_t> partition_ids_;
> std::
Done


Line 405:   SpinLock file_type_counts_lock_;
> isn't this specific to having multiple scanner threads?
Yes. Moved the lock into subclass.


Line 447:   /// This must be called on Close() to unregister counters.
> mention synchronization requirement?
Done


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node-mt.cc
File be/src/exec/hdfs-scan-node-mt.cc:

Line 73:     // TODO: This is probably not worth splitting the organisational 
cost of splitting
> intentional duplication?
I think we can actually move this call into Open(). The comment doesn't seem to 
apply here.


Line 74:     // initialisation across two places. Move to before the scanner 
threads start.
> reference to scanner threads
Done (see above comment).


Line 84:     RETURN_IF_ERROR(runtime_state_->io_mgr()->GetNextRange(
> single line? if not, break before call/after macro?
Done


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

Line 716
> where did this go?
I had removed it because most of it seemed wrong/misleading. Reworded comment.


Line 92:     RETURN_IF_ERROR(IssueInitialScanRanges(state));
> much better
indeed


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

Line 44: /// HDFS-serialised data.
> mention that this is not used on the non-mt path.
Done


Line 61: /// recycled.
> todo: remove once mt is fully functional
Done


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

Line 116:   /// including the final one in Close().
> this comment seems unrelated to the c'tor.
Moved into class comment.


http://gerrit.cloudera.org:8080/#/c/4113/6/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

Line 299:   /// Always returns false if the scan_node_ is not multi-threaded.
> this also deserves a clean-up todo.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I98cc7f970e1575dd83875609985e1877ada3d5e0
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to