[
https://issues.apache.org/jira/browse/ORC-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018604#comment-16018604
]
ASF GitHub Bot commented on ORC-184:
------------------------------------
Github user wgtmac commented on a diff in the pull request:
https://github.com/apache/orc/pull/123#discussion_r117616712
--- Diff: c++/src/Statistics.hh ---
@@ -106,15 +106,50 @@ namespace orc {
void setMinimum(T min) { _minimum = min; }
- // GET / SET valueCount_
+ // GET / SET _valueCount
uint64_t getNumberOfValues() const { return _valueCount; }
void setNumberOfValues(uint64_t numValues) { _valueCount = numValues; }
- // GET / SET hasNullValue_
+ // GET / SET _hasNullValue
bool hasNull() const { return _hasNull; }
void setHasNull(bool hasNull) { _hasNull = hasNull; }
+
+ void reset() {
+ _hasNull = false;
+ _hasMinimum = false;
+ _hasMaximum = false;
+ _hasSum = false;
+ _hasTotalLength = false;
+ _totalLength = 0;
+ _valueCount = 0;
+ }
+
+ // sum is not merged here as we need to check overflow
+ void merge(const InternalStatisticsImpl& other) {
+ _hasNull = _hasNull || other._hasNull;
+ _valueCount += other._valueCount;
+
+ if (other._hasMinimum) {
+ if (!_hasMinimum) {
+ _hasMinimum = _hasMaximum = true;
+ _minimum = other._minimum;
+ _maximum = other._maximum;
+ } else {
+ // all template types should support operator<
+ if (_maximum < other._maximum) {
--- End diff --
@majetideepak I agree with you about not putting operator< into the public API.
The reason for implementing operator< is to simplify the logic here; otherwise we
have to specialize InternalStatisticsImpl<Decimal>::merge(const
InternalStatisticsImpl& other). Now there are two options:
1. add Decimal::compare() and use template specialization for Decimal;
2. move operator< of Decimal here to simplify the merge logic.
Which do you prefer, or do you have other ideas? Thanks!
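
To make the two options concrete, here is a rough sketch of how each might look. It is not the code in the pull request: ToyStats, DecimalWithCompare, PlainDecimal and makeStats are hypothetical stand-ins for illustration, the decimal types hold a plain int64_t instead of Int128 plus scale, and a single hasMinMax flag replaces _hasMinimum/_hasMaximum.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for InternalStatisticsImpl<T>; only min/max are kept.
template <typename T>
struct ToyStats {
  bool hasMinMax = false;
  T minimum{};
  T maximum{};

  // Generic merge: every T used here must support operator<.
  void merge(const ToyStats<T>& other) {
    if (!other.hasMinMax) return;
    if (!hasMinMax) {
      hasMinMax = true;
      minimum = other.minimum;
      maximum = other.maximum;
      return;
    }
    if (maximum < other.maximum) maximum = other.maximum;
    if (other.minimum < minimum) minimum = other.minimum;
  }
};

// Option 1: the decimal type only exposes compare(), so merge is specialized.
struct DecimalWithCompare {
  int64_t value;
  int compare(const DecimalWithCompare& o) const {
    return value < o.value ? -1 : (o.value < value ? 1 : 0);
  }
};

template <>
inline void ToyStats<DecimalWithCompare>::merge(
    const ToyStats<DecimalWithCompare>& other) {
  if (!other.hasMinMax) return;
  if (!hasMinMax) {
    hasMinMax = true;
    minimum = other.minimum;
    maximum = other.maximum;
    return;
  }
  if (maximum.compare(other.maximum) < 0) maximum = other.maximum;
  if (other.minimum.compare(minimum) < 0) minimum = other.minimum;
}

// Option 2: operator< is defined next to the statistics code (an internal
// header), not in the public decimal header, and the generic merge is reused.
struct PlainDecimal {
  int64_t value;
};
inline bool operator<(const PlainDecimal& a, const PlainDecimal& b) {
  return a.value < b.value;
}

// Hypothetical helper that builds statistics with a known range.
template <typename T>
ToyStats<T> makeStats(const T& lo, const T& hi) {
  ToyStats<T> s;
  s.hasMinMax = true;
  s.minimum = lo;
  s.maximum = hi;
  return s;
}

// Usage: both options widen the range in the same way.
inline void mergeExample() {
  ToyStats<DecimalWithCompare> a =
      makeStats(DecimalWithCompare{1}, DecimalWithCompare{5});
  a.merge(makeStats(DecimalWithCompare{-3}, DecimalWithCompare{9}));
  assert(a.minimum.value == -3 && a.maximum.value == 9);

  ToyStats<PlainDecimal> b = makeStats(PlainDecimal{1}, PlainDecimal{5});
  b.merge(makeStats(PlainDecimal{2}, PlainDecimal{4}));
  assert(b.minimum.value == 1 && b.maximum.value == 5);
}
```

In this sketch option 1 duplicates the whole merge body for the decimal type, while option 2 keeps a single merge at the cost of an operator< that is visible wherever the internal header is included.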
> [C++] Refactor ColumnStatistics classes for writer
> --------------------------------------------------
>
> Key: ORC-184
> URL: https://issues.apache.org/jira/browse/ORC-184
> Project: ORC
> Issue Type: Sub-task
> Components: C++
> Reporter: Gang Wu
> Assignee: Gang Wu
>
> 1. Add setter functions to ColumnStatistics.
> 2. Refactor ColumnStatistics to reduce duplicate code.
> 3. Add more functions in Int128 and Decimal classes for
> DecimalColumnStatistics.