leerho commented on a change in pull request #324:
URL:
https://github.com/apache/incubator-datasketches-java/pull/324#discussion_r453001280
##########
File path: src/test/java/org/apache/datasketches/theta/UnionImplTest.java
##########
@@ -54,14 +54,14 @@ public void checkUpdateWithSketch() {
assertEquals(union.getResult().getEstimate(), k, 0.0);
}
- @Test(expectedExceptions = SketchesArgumentException.class)
- public void checkCorruptedCompactFlag() {
+ @Test
+ public void checkUnorderedCompactFlag() {
int k = 16;
WritableMemory mem = WritableMemory.wrap(new byte[(k*8) + 24]);
UpdateSketch sketch =
Sketches.updateSketchBuilder().setNominalEntries(k).build();
for (int i=0; i<k; i++) { sketch.update(i); }
CompactSketch sketchInDirectOrd = sketch.compact(true, mem);
- sketch.compact(false, mem); //corrupt memory
+ sketch.compact(false, mem); //change the order bit
Review comment:
In an UpdateSketch the hashes are stored in a Knuth Open Address Double Hash
(OADH) array whose size is a power of 2; essentially a hash table, but a very
efficient one. The placement of hashes in a hash table is effectively random,
so the hashes are never in sorted order. The CompactSketch (CS) has all the
same hashes, but all the empty slots of the hash table have been removed. When
creating the CS, the user has the option to have the hashes sorted or not.
Sorting costs a little more time when creating the CS, but when merging CSs
into a union we can take advantage of "early stop", which results in a dramatic
improvement in merge speed. We never "search" the hashes in a CS. There is no
need to do that.

The CS is the simplest representation of a sketch. It contains primarily
the hashes and Theta, that's it! It is immutable and has no concept of "K" or
"Nominal Entries". Yet it can be used as input to all set operations.
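
To make the compaction and "early stop" points concrete, here is a minimal, self-contained sketch in plain Java. It is not the library's code: the class `EarlyStopSketch`, the methods `compact` and `retainBelowTheta`, the `examined` counter, and the convention that an empty table slot holds 0 are all assumptions made for illustration. The idea it shows is that when the hashes are sorted, the first hash that is not below the union's theta ends the scan, because every later hash is at least as large.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not the DataSketches API.
public class EarlyStopSketch {

  // Number of hashes looked at by the most recent retainBelowTheta call.
  static int examined;

  // Drops empty slots (assumed to hold 0) from a hash table and
  // optionally sorts what remains, mimicking ordered vs. unordered compaction.
  static long[] compact(long[] table, boolean ordered) {
    long[] out = Arrays.stream(table).filter(h -> h != 0L).toArray();
    if (ordered) { Arrays.sort(out); }
    return out;
  }

  // Returns the hashes strictly below theta. If the input is ordered,
  // the loop stops at the first hash >= theta ("early stop").
  static long[] retainBelowTheta(long[] hashes, long theta, boolean ordered) {
    long[] out = new long[hashes.length];
    int n = 0;
    examined = 0;
    for (long h : hashes) {
      examined++;
      if (h >= theta) {
        if (ordered) { break; } // all remaining hashes are >= theta too
        continue;               // unordered: must keep scanning
      }
      out[n++] = h;
    }
    return Arrays.copyOf(out, n);
  }

  public static void main(String[] args) {
    // A toy "hash table" with empty slots (0) in arbitrary positions.
    long[] table = {0L, 70L, 10L, 0L, 90L, 30L, 50L, 20L};
    long theta = 40L; // assumed theta of the receiving union

    long[] kept = retainBelowTheta(compact(table, true), theta, true);
    System.out.println("ordered:   kept=" + kept.length + " examined=" + examined);

    kept = retainBelowTheta(compact(table, false), theta, false);
    System.out.println("unordered: kept=" + kept.length + " examined=" + examined);
  }
}
```

Both calls keep the same three hashes, but the ordered input is abandoned after four comparisons while the unordered input has to be scanned in full. With realistic sketch sizes and a small theta, that difference is where the merge-speed gain would come from.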