leerho commented on issue #599:
URL: 
https://github.com/apache/datasketches-java/issues/599#issuecomment-2380266808

   You misunderstand the function of Theta Sketches.  A Theta Sketch estimates 
the **number** of unique identifiers within a stream (or set).  The result you 
get from a Theta Sketch is a count estimate, not the identifiers themselves.  
If you were to modify Step 2 above to:
   
   >  Step 2: find **the number of unique** users in some_table with date=today 
which were NOT in yesterday's set of users.
   
   Then a Theta Sketch could be used to give you a count estimate, computed 
with the set operation A-Not-B.  But it would **not** give you the list of names!
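
To make the "count, not names" point concrete, here is a rough sketch of the A-Not-B idea in plain Java. `ToyThetaSketch` is a hypothetical, simplified KMV-style stand-in written for this illustration; it is not the DataSketches implementation, and the hash function, the value of `k`, and all method names are assumptions. In practice you would use the library's own Theta sketch classes and its `AnotB` set operation.

```java
import java.util.TreeSet;

/** Toy KMV-style sketch for illustration only; NOT the DataSketches implementation. */
final class ToyThetaSketch {
    private final int k;                                  // max number of retained hashes
    private final TreeSet<Long> hashes = new TreeSet<>(); // retained hashes, all < theta
    private long theta = Long.MAX_VALUE;                  // sampling threshold

    ToyThetaSketch(int k) {
        this.k = k;
    }

    // Illustrative 63-bit hash: FNV-1a over the chars, then a 64-bit mixing step.
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 30; h *= 0xbf58476d1ce4e5b9L;
        h ^= h >>> 27; h *= 0x94d049bb133111ebL;
        h ^= h >>> 31;
        return h >>> 1; // drop the sign bit so the value is non-negative
    }

    void update(String item) {
        long h = hash(item);
        if (h >= theta) return;        // outside the sampled range
        hashes.add(h);                 // duplicates collapse in the set
        if (hashes.size() > k) {
            theta = hashes.pollLast(); // largest hash becomes the new threshold
        }
    }

    // Retained count scaled by the sampled fraction of the hash space.
    double estimate() {
        return hashes.size() / (theta / (double) Long.MAX_VALUE);
    }

    // Estimated number of unique items that are in A but not in B.
    static double aNotBEstimate(ToyThetaSketch a, ToyThetaSketch b) {
        long t = Math.min(a.theta, b.theta);
        long survivors = 0;
        for (long h : a.hashes.headSet(t)) {  // hashes of A strictly below t
            if (!b.hashes.contains(h)) survivors++;
        }
        return survivors / (t / (double) Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        ToyThetaSketch yesterday = new ToyThetaSketch(1024);
        ToyThetaSketch today = new ToyThetaSketch(1024);
        for (String u : new String[] {"alice", "bob", "carol"}) yesterday.update(u);
        for (String u : new String[] {"bob", "carol", "dave", "erin"}) today.update(u);
        // Prints an estimated count of users seen today but not yesterday.
        System.out.println(aNotBEstimate(today, yesterday));
    }
}
```

Note that the sketch stores only hash values, and only a sample of them once more than `k` unique items have been seen, so the user names cannot be read back out of it.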
   
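   To do literally what you asked, you could use a Bloom or Quotient Filter as 
Alex suggested:
   - Construct a Filter with all the users from yesterday's table.
   - Iterate over all the users in today's table and test each one against 
yesterday's Filter.  Any user the Filter rejects is definitely new; because 
these Filters can return false positives, a small fraction of new users may 
be missed.  Voila!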
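
The two steps above can be sketched roughly as follows. `ToyBloomFilter` is a hypothetical plain-Java class written for this illustration, not the DataSketches Bloom filter; the bit-array size, the number of hash functions, and the hashing scheme are all assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Toy Bloom filter for illustration only; NOT the DataSketches implementation. */
final class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit index from two base hashes (double hashing).
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.reverse(h1) * 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    // false means "definitely not added"; true means "probably added".
    boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) return false;
        }
        return true;
    }

    // Step 1: load yesterday's users. Step 2: test each of today's users.
    static List<String> newUsers(List<String> yesterday, List<String> today) {
        ToyBloomFilter filter = new ToyBloomFilter(1 << 16, 3);
        for (String user : yesterday) {
            filter.add(user);
        }
        List<String> result = new ArrayList<>();
        for (String user : today) {
            if (!filter.mightContain(user)) result.add(user);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> yesterday = Arrays.asList("alice", "bob", "carol");
        List<String> today = Arrays.asList("bob", "carol", "dave", "erin");
        // Prints the users from today that the filter reports as unseen.
        System.out.println(newUsers(yesterday, today));
    }
}
```

Unlike the sketch-based count, this pass walks today's actual user names, which is why it can return a list.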
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]