[ https://issues.apache.org/jira/browse/IMPALA-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-3357 stopped by Norbert Luksa. --------------------------------------------- > Sorter's quicksort implementation is very suboptimal for duplicate keys > ----------------------------------------------------------------------- > > Key: IMPALA-3357 > URL: https://issues.apache.org/jira/browse/IMPALA-3357 > Project: IMPALA > Issue Type: Improvement > Components: Backend > Affects Versions: Impala 2.6.0 > Reporter: Tim Armstrong > Assignee: Norbert Luksa > Priority: Minor > Labels: perf, ramp-up > > {code} > void Sorter::TupleSorter::SortHelper(TupleIterator first, TupleIterator last) > { > if (UNLIKELY(state_->is_cancelled())) return; > // Use insertion sort for smaller sequences. > while (last.index_ - first.index_ > INSERTION_THRESHOLD) { > TupleIterator iter(this, first.index_ + (last.index_ - first.index_) / 2); > DCHECK(iter.current_tuple_ != NULL); > // Partition() splits the tuples in [first, last) into two groups (<= > pivot > // and >= pivot) in-place. 'cut' is the index of the first tuple in the > second group. > TupleIterator cut = Partition(first, last, > reinterpret_cast<Tuple*>(iter.current_tuple_)); > SortHelper(cut, last); > last = cut; > if (UNLIKELY(state_->is_cancelled())) return; > } > InsertionSort(first, last); > } > {code} > The quicksort implementation in the sorter is based on dividing the input > into two partitions: <= pivot and >= pivot. > If all of the input values in a partition are equal, then it will still > recursively divide and do insertion sort on the values. We could change the > sorter to partition the input into three partitions: <, == and >. Then it > doesn't need to recurse on the middle partition. This would mean it could > sort a partition full of duplicate values in a single pass over the input. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org