leerho commented on a change in pull request #324:
URL: https://github.com/apache/incubator-datasketches-java/pull/324#discussion_r453001280



##########
File path: src/test/java/org/apache/datasketches/theta/UnionImplTest.java
##########
@@ -54,14 +54,14 @@ public void checkUpdateWithSketch() {
     assertEquals(union.getResult().getEstimate(), k, 0.0);
   }
 
-  @Test(expectedExceptions = SketchesArgumentException.class)
-  public void checkCorruptedCompactFlag() {
+  @Test
+  public void checkUnorderedCompactFlag() {
     int k = 16;
     WritableMemory mem = WritableMemory.wrap(new byte[(k*8) + 24]);
     UpdateSketch sketch = Sketches.updateSketchBuilder().setNominalEntries(k).build();
     for (int i=0; i<k; i++) { sketch.update(i); }
     CompactSketch sketchInDirectOrd = sketch.compact(true, mem);
-    sketch.compact(false, mem); //corrupt memory
+    sketch.compact(false, mem); //change the order bit

Review comment:
       In an UpdateSketch the hashes are stored in a Knuth Open Address Double
Hash (OADH) array whose size is a power of 2; essentially a hash table, but a
very efficient one.  The placement of hashes in a hash table is effectively
random, so the hashes are never in sorted order. The CompactSketch (CS) has
all the same hashes, but with all the empty slots of the hash table removed.
When creating the CS, the user has the option to have the hashes sorted or
not. Sorting costs a little more time when creating the CS, but when merging
CSs into a union we can take advantage of "early stop": once a sorted hash is
at or above the union's current Theta, all the remaining hashes can be
skipped, which dramatically improves merge speed.  We never "search" the
hashes in a CS; there is no need to do that.
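
   To make the "early stop" idea concrete, here is a minimal, self-contained
sketch. It is not the DataSketches implementation: the class EarlyStopDemo,
the method merge, and the use of a TreeSet as the union's accumulator are all
hypothetical, chosen only to show why a sorted hash array lets the merge loop
exit at the first hash that is not below Theta.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only; not the DataSketches implementation.
final class EarlyStopDemo {

  // Adds every hash below theta to the accumulator and returns how many
  // entries of the input array had to be examined.
  static int merge(long[] hashes, boolean ordered, long theta, TreeSet<Long> acc) {
    int examined = 0;
    for (long h : hashes) {
      examined++;
      if (h >= theta) {
        if (ordered) { break; } // sorted input: every later hash is also >= theta
        continue;               // unsorted input: must keep scanning
      }
      acc.add(h);
    }
    return examined;
  }

  public static void main(String[] args) {
    long[] unordered = {90L, 10L, 70L, 30L, 50L, 20L};
    long[] ordered = unordered.clone();
    Arrays.sort(ordered);
    long theta = 40L;

    TreeSet<Long> fromOrdered = new TreeSet<>();
    TreeSet<Long> fromUnordered = new TreeSet<>();
    System.out.println(merge(ordered, true, theta, fromOrdered));      // 4
    System.out.println(merge(unordered, false, theta, fromUnordered)); // 6
    System.out.println(fromOrdered.equals(fromUnordered));             // true
  }
}
```

   Both calls keep the same three hashes; the ordered input simply stops
after examining four entries instead of all six.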
   
   The CS is the simplest representation of a sketch.  It contains primarily
the hashes and Theta, that's it!  It is immutable and has no concept of "K" or
"Nominal Entries".  Yet it can be used as input to all set operations.
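
   As a rough model of that description, the sketch below compacts a hash
table into "hashes plus Theta" and nothing else. The class CompactModel, the
convention that 0 marks an empty slot, and the representation of Theta as a
long scaled to Long.MAX_VALUE are assumptions made for this example, not the
library's classes. The estimate shown is the usual theta-sketch estimator:
retained count divided by Theta as a fraction.

```java
import java.util.Arrays;

// Hypothetical model of a compact sketch; not the DataSketches classes.
final class CompactModel {
  final long[] hashes;   // retained hashes only, no empty slots
  final long thetaLong;  // Theta scaled to the range (0, Long.MAX_VALUE]

  CompactModel(long[] hashes, long thetaLong) {
    this.hashes = hashes;
    this.thetaLong = thetaLong;
  }

  // Builds the compact form from a hash table in which 0 marks an empty
  // slot (an assumption of this model). Note that no "K" is carried over.
  static CompactModel compact(long[] table, long thetaLong, boolean ordered) {
    long[] kept = Arrays.stream(table).filter(h -> h != 0L).toArray();
    if (ordered) { Arrays.sort(kept); }
    return new CompactModel(kept, thetaLong);
  }

  // Estimate = retained count divided by Theta expressed as a fraction.
  double estimate() {
    double theta = thetaLong / (double) Long.MAX_VALUE;
    return hashes.length / theta;
  }

  public static void main(String[] args) {
    long[] table = {0L, 42L, 0L, 7L, 0L, 0L, 19L, 0L};
    CompactModel cs = compact(table, Long.MAX_VALUE, true);
    System.out.println(Arrays.toString(cs.hashes)); // [7, 19, 42]
    System.out.println(cs.estimate());              // 3.0
  }
}
```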






