[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..

IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

Currently, DictEncoder uses the default hash function for
TimestampValue, which means it is hashing the entire
TimestampValue struct. This can be inconsistent, because
TimestampValue contains some padding that may not be zero
in some cases. For TimestampValues that are part of a Tuple,
the padding is zero, so this is mainly present in test cases.

This was discovered when fixing a Clang Tidy performance-for-range-copy
warning by iterating with a const reference rather than
making a copy of the value. DictTest.TestTimestamps became
flaky with that change, because the hash was no longer
consistent. The copy must have had consistent content for
the padding through the iteration, but the const reference
did not.

This adds a template specialization of the Hash function
for TimestampValue. The specialization uses TimestampValue::Hash(),
which hashes only the non-padding pieces of the struct. This
also includes the change to dict-test.cc that uncovered the
issue. This fix is mostly to unblock IMPALA-12390.

Testing:
 - Ran dict-test in a loop for a few hundred iterations
 - Hand tested inserting many timestamps into a Parquet table
   with dictionary encoding and verified that the performance didn't
   change.

Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Reviewed-on: http://gerrit.cloudera.org:8080/20396
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
Reviewed-by: Michael Smith 
---
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
2 files changed, 8 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Daniel Becker: Looks good to me, but someone else must approve
  Michael Smith: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG@14
PS1, Line 14: the padding is zero, so this is mainly present in test cases.
> I was putting together an initial change for IMPALA-12390, but my GVO run f
Done



--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:30:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-24 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 24 Aug 2023 16:26:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-23 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 2: Code-Review+1

LGTM


--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Wed, 23 Aug 2023 12:03:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Wed, 23 Aug 2023 05:01:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-22 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 2:

With subsequent investigation, it looks like this is a test-only issue, so I 
updated the commit message to make that clear. I also double-checked that 
performance isn't worse with TimestampValue::Hash().


--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Wed, 23 Aug 2023 00:44:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-22 Thread Joe McDonnell (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20396

to look at the new patch set (#2).

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..

IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

Currently, DictEncoder uses the default hash function for
TimestampValue, which means it is hashing the entire
TimestampValue struct. This can be inconsistent, because
TimestampValue contains some padding that may not be zero
in some cases. For TimestampValues that are part of a Tuple,
the padding is zero, so this is mainly present in test cases.

This was discovered when fixing a Clang Tidy performance-for-range-copy
warning by iterating with a const reference rather than
making a copy of the value. DictTest.TestTimestamps became
flaky with that change, because the hash was no longer
consistent. The copy must have had consistent content for
the padding through the iteration, but the const reference
did not.

This adds a template specialization of the Hash function
for TimestampValue. The specialization uses TimestampValue::Hash(),
which hashes only the non-padding pieces of the struct. This
also includes the change to dict-test.cc that uncovered the
issue. This fix is mostly to unblock IMPALA-12390.

Testing:
 - Ran dict-test in a loop for a few hundred iterations
 - Hand tested inserting many timestamps into a Parquet table
   with dictionary encoding and verified that the performance didn't
   change.

Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
---
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
2 files changed, 8 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/20396/2
--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9621/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Wed, 23 Aug 2023 00:44:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-22 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG@14
PS1, Line 14: This was discovered when fixing a Clang Tidy 
performance-for-range-copy
> Was there a separate ticket for the warning? Or just looking into it lead t
I was putting together an initial change for IMPALA-12390, but my GVO run 
failed on the DictTest.TestTimestamps test. This is what popped out.



--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 22 Aug 2023 18:32:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13811/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 22 Aug 2023 18:33:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-22 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20396 )

Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20396/1//COMMIT_MSG@14
PS1, Line 14: This was discovered when fixing a Clang Tidy 
performance-for-range-copy
Was there a separate ticket for the warning? Or just looking into it lead to 
this issue.



--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 22 Aug 2023 18:11:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

2023-08-22 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20396


Change subject: IMPALA-12393: Fix inconsistent hash for TimestampValue in 
DictEncoder
..

IMPALA-12393: Fix inconsistent hash for TimestampValue in DictEncoder

Currently, DictEncoder uses the default hash policy for TimestampValue,
which means it is hashing the entire TimestampValue struct. This is
inconsistent, because TimestampValue contains some padding that is
not guaranteed to be zero.

This was discovered when fixing a Clang Tidy performance-for-range-copy
warning by iterating with a const reference rather than making a copy of
the value. DictTest.TestTimestamps became flaky with that change, because
the hash was no longer consistent. The copy must have had consistent content
for the padding through the iteration, but the const reference did not.

This adds a template specialization of the Hash function for TimestampValue.
The specialization uses TimestampValue::Hash(), which hashes only the 
non-padding
pieces of the struct. This includes the change to dict-test.cc that
uncovered the issue.

Testing:
 - Ran dict-test in a loop for a few hundred iterations

Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
---
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
2 files changed, 8 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/20396/1
--
To view, visit http://gerrit.cloudera.org:8080/20396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iad86e9b0f645311c3389cf2804dcc1a346ff10a9
Gerrit-Change-Number: 20396
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell