Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3823: Add timer to measure Parquet footer reads
......................................................................


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4371/4/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS4, Line 196: RuntimeProfile::Counter 
single_footer_process_timer(TUnit::TIME_NS);
> this is unnecessary, SCOPED_TIMER below just creates a ScopedTimer which cr
Done


http://gerrit.cloudera.org:8080/#/c/4371/4/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

PS4, Line 49: ADD_MIN_MAX_AVG_VALUE_TIMER
> please rename
Sorry, forgot to get this one. Done.


PS4, Line 70: ADD_MIN_MAX_AVG_VALUE_TIMER
> same
Done


PS4, Line 147: .
> can you add:
When you say 'child counters', do you mean it in the terms of the child 
counters of counters in the RuntimeProfile? If so, I think it wouldn't be best 
to assume how it would be used (even though that's the only use for it now).

Or do you mean every counter that the AveragedCounter keeps a track of is what 
we are calling a 'child counter'?


PS4, Line 204: /// Contrary to the AveragedCounter, this only keeps a track of 
values and not 'Counter'
             : /// objects themselves.
> this is a bit misleading because it doesn't really keep values, it keeps th
Good call. I rephrased it, but I left out the bit about 'child counters'. I'll 
add it if we reach consensus on the above comment.


PS4, Line 249:  /// Current sums of values seen so far and min/max values seen 
so far.
             :   int64_t current_min_;
             :   int64_t current_max_;
             :   int64_t current_sum_;
> current is unnecessary
Done


http://gerrit.cloudera.org:8080/#/c/4371/4/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS4, Line 301:       const TSummaryStatsCounter& c = 
node.summary_stats_counters[i];
             :       SummaryStatsCounterMap::iterator it = 
summary_stats_map_.find(c.name);
             :       if (it == summary_stats_map_.end()) {
             :         summary_stats_map_[c.name] =
             :             pool_->Add(new SummaryStatsCounter(
             :                 c.unit, c.total_num_values, c.min_value, 
c.max_value, c.sum));
             :       } else {
             :         it->second->SetStats(c);
             :       }
> it's hard to know if this gets exercised in your test case.
Done


PS4, Line 667:  
> super nit: extra space
Done


Line 668:     for (const SummaryStatsCounterMap::value_type& v: 
summary_stats_map_) {
> I think we need a separate case to handle when total_num_values is 0 becaus
Done


PS4, Line 669: (
> this brace isn't closed at the end
Done


PS4, Line 676: v.second->unit()
> this isn't in the same Counter's units (which is time for the use case you 
Oops, yes. Done.


PS4, Line 1017:  else if
> this isn't an else if, you need to execute both branches
Done


http://gerrit.cloudera.org:8080/#/c/4371/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS4, Line 336: MinMaxAvgValueCounter
> update
Done


PS4, Line 337: averages the time taken to
             :     # scan and process the Parquet footer across scan ranges and 
also records the min and
             :     # the max time.
> keeps statistics for the time taken to scan and Process the Parquet footer 
Done


PS4, Line 340:     if ranges_per_node > 1:
             :       num_equal = 0
             :       for min_max_time in footer_processing_time_list:
             :         num_equal += (min_max_time[0] == min_max_time[1])
             :       # The chance that the times taken by different scan ranges 
to process the footer,
             :       # are equal, is very unlikely. However, it could happen 
which could make the min and
             :       # max time to process a footer equal. Assert that at a 
time, no more than 1 node has
             :       # equal min and max time.
             :       assert num_equal < 2
> following up with my previous comment, I still think this is weird and was 
I've removed the multi scan range test and just added a simple verify assert to 
see if all the values are the same if we have only one range per node.

I've added BE unit tests to verify that the counter works properly, so we don't 
need to do that again here.


-- 
To view, visit http://gerrit.cloudera.org:8080/4371
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to