GitHub user mmiklavc reopened a pull request:
https://github.com/apache/incubator-metron/pull/397
METRON-627: Add HyperLogLogPlus implementation to Stellar
This PR addresses https://issues.apache.org/jira/browse/METRON-627
Leverages the HLLP implementation from
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/cardinality/HyperLogLogPlus.java
Four new Stellar functions have been added. They let a user initialize a
cardinality estimator, add items, merge estimators, and calculate cardinality
estimates.
### `HLLP_CARDINALITY`
* Description: Returns HyperLogLogPlus-estimated cardinality for this set
* Input:
    * hyperLogLogPlus - the hllp set
* Returns: Long value representing the cardinality for this set
### `HLLP_INIT`
* Description: Initializes the set
* Input:
    * p (required) - the precision value for the normal set
    * sp - the precision value for the sparse set. If sp is not specified
      the sparse set will be disabled.
* Returns: A new HyperLogLogPlus set
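
As a rough guide to choosing `p`, the classic HyperLogLog analysis uses `2^p` registers and gives a relative standard error of about `1.04 / sqrt(2^p)`. The Python sketch below tabulates that relationship. The helper name `hll_precision_profile` is hypothetical, and the figures are the textbook approximation, not measured behavior of the stream-lib implementation.

```python
import math


def hll_precision_profile(p):
    """Return (register_count, approx_relative_std_error) for precision p.

    Assumes the textbook HyperLogLog relationship: m = 2^p registers and
    a relative standard error of roughly 1.04 / sqrt(m).
    """
    m = 1 << p
    return m, 1.04 / math.sqrt(m)


# Compare a few precision values, including p=5 from the REPL example.
for p in (5, 10, 14):
    m, err = hll_precision_profile(p)
    print(f"p={p}: {m} registers, ~{err:.1%} relative standard error")
```

Each extra bit of precision doubles the register count while reducing the expected error by a factor of about 1.4.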
### `HLLP_MERGE`
* Description: Merge hllp sets together
* Input:
    * hllp1 - first hllp set
    * hllp2 - second hllp set
    * hllpn - additional sets to merge
* Returns: A new merged HyperLogLogPlus estimator set
### `HLLP_OFFER`
* Description: Add value to the set
* Input:
    * hyperLogLogPlus - the hllp set
    * o - Object to add to the set
* Returns: The HyperLogLogPlus set with a new object added
**Note:** Added new library to metron-common pom and added 3 new items to
dependencies_with_url.csv.
**Testing**
Spun up the Stellar REPL in quick-dev and verified that the function
composition works as expected and returns correct cardinality estimates
for simple sparse set cases. For example:
```
[Stellar]>>> HLLP_CARDINALITY(HLLP_MERGE(
HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "runnings"), "cool"),
HLLP_OFFER(HLLP_OFFER(HLLP_INIT(5, 6), "bobsled"), "team")))
4
```
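
To illustrate what the four functions do conceptually, here is a minimal, self-contained Python sketch of a dense-only HyperLogLog. It is not the stream-lib HyperLogLogPlus code used by this PR: it has no sparse mode (so there is no `sp` argument) and no HLL++ bias correction. The class name `SimpleHLL`, its method names, and the SHA-1 based hash are assumptions chosen for illustration.

```python
import hashlib
import math


class SimpleHLL:
    """Dense-only HyperLogLog sketch (illustrative, not stream-lib's HLL++)."""

    def __init__(self, p):
        # Roughly analogous to HLLP_INIT(p): 2^p registers, all zero.
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16 in this sketch")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def offer(self, value):
        # Roughly analogous to HLLP_OFFER: hash the value to 64 bits.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)
        return self

    def merge(self, *others):
        # Roughly analogous to HLLP_MERGE: register-wise maximum, new sketch.
        merged = SimpleHLL(self.p)
        merged.registers = list(self.registers)
        for other in others:
            if other.p != self.p:
                raise ValueError("cannot merge sketches with different p")
            merged.registers = [
                max(a, b) for a, b in zip(merged.registers, other.registers)
            ]
        return merged

    def cardinality(self):
        # Roughly analogous to HLLP_CARDINALITY.
        m = self.m
        alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(
            m, 0.7213 / (1 + 1.079 / m)
        )
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(m * math.log(m / zeros))
        return round(raw)


# Mirror the REPL example: two sketches, two words each, then merge.
a = SimpleHLL(5).offer("runnings").offer("cool")
b = SimpleHLL(5).offer("bobsled").offer("team")
print(a.merge(b).cardinality())
```

Because this sketch hashes values differently from stream-lib and omits the sparse representation, its estimate for the four-word example is close to, but not guaranteed to equal, the 4 returned by the Stellar functions. The merge step is the part that carries over directly: taking the register-wise maximum of two sketches gives the same state as offering every item to a single sketch.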
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mmiklavc/incubator-metron hyperloglog
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/397.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #397
----
commit afce30539f6996a607e85d3fd35aac5fcb5c19aa
Author: Michael Miklavcic <[email protected]>
Date: 2016-12-15T20:55:39Z
METRON-627: Add HyperLogLogPlus implementation to Stellar
commit 414a3a98976b98a253ab9921720f02c8a7431da2
Author: Michael Miklavcic <[email protected]>
Date: 2017-01-09T17:00:08Z
work in progress commit
commit c7f57a4acbb0ef357c1af9eaa263afea7bc83d9a
Author: Michael Miklavcic <[email protected]>
Date: 2017-01-11T16:58:58Z
Merge with master
commit 90d9659f415404c6c4682289c7bde669c352f517
Author: Michael Miklavcic <[email protected]>
Date: 2017-01-12T20:33:10Z
Refactor, fix statistics output
commit 261e69651d4ae0b99e88e0e4a2c4e7568aa23fcb
Author: Michael Miklavcic <[email protected]>
Date: 2017-01-12T23:17:13Z
METRON-627: Updated with sensible default precision values
commit 9078094dd720d89f64ecf45506ab0c5077aa58a7
Author: Michael Miklavcic <[email protected]>
Date: 2017-01-13T19:17:37Z
METRON-627: Add default init for HLLP_ADD(null, 'val')
commit 9e1ff937fe51841ac2fa3235bf87964cba8a1ae8
Author: Michael Miklavcic <[email protected]>
Date: 2017-01-17T20:09:26Z
Merge branch 'master' into hyperloglog
commit d392f044e330fe273cb0f0b4ff820b4ef1a3595d
Author: Michael Miklavcic <[email protected]>
Date: 2017-01-17T20:33:11Z
METRON-627: Fix Stellar lexer to handle newline at end
----