Hey All!

Over the past few months I have been working with Jesus Camacho Rodriguez on
integrating DataSketches more tightly with Hive [1].

So, starting with Hive 4.0, almost all of the DataSketches functions will be available by default. To do this I had to come up with a naming convention (ds_{sketchType}_{functionName}) to register all the ds functions. I will contribute some of these changes back, although so far I have been able to avoid changing datasketches-hive itself. I have noticed that some "simple" functions are missing; they should be there just for completeness (IIRC mostly the toString function and probably a few more).
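
To make the naming convention concrete, here is a minimal Python sketch of how such names are composed. Only the ds_{sketchType}_{functionName} pattern comes from the text above; the helper name, the lowercasing and the example arguments are assumptions made for illustration.

```python
def ds_function_name(sketch_type: str, function_name: str) -> str:
    """Compose a name following the ds_{sketchType}_{functionName} pattern.

    Hypothetical helper: lowercasing both parts is an assumption,
    not something taken from the Hive sources.
    """
    return f"ds_{sketch_type.lower()}_{function_name.lower()}"


# Example arguments are illustrative only.
print(ds_function_name("hll", "estimate"))
```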

Probably the most interesting part for you is that, by utilizing Calcite, a set of rules can transparently rewrite COUNT(DISTINCT)/PERCENTILE_DISC/CUME_DIST/RANK/NTILE to use sketch functions! :)
Materialized views are also supported, so sketches can be stored
precomputed (and rolled up).
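
As a rough picture of what such a rewrite does to a query, here is a toy Python sketch for the COUNT(DISTINCT) case. It is plain text substitution, not the Calcite rule itself (the real rewrite works on the query plan), and the names ds_hll_sketch and ds_hll_estimate are assumed from the naming convention, so please check them against your Hive build.

```python
import re

# Matches COUNT(DISTINCT col) for a single plain column name.
_COUNT_DISTINCT = re.compile(
    r"COUNT\s*\(\s*DISTINCT\s+(\w+)\s*\)", re.IGNORECASE
)


def rewrite_count_distinct(sql: str) -> str:
    """Replace COUNT(DISTINCT col) with an estimate over an HLL sketch.

    The target function names are assumptions for illustration.
    """
    return _COUNT_DISTINCT.sub(r"ds_hll_estimate(ds_hll_sketch(\1))", sql)


print(rewrite_count_distinct("SELECT COUNT(DISTINCT id) FROM t"))
```

The result is an approximate count, which is the trade-off behind the rewrite: the sketch can be stored and merged later, while an exact distinct count cannot be rolled up.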

If you would like a quick look at what it does, the test for rewriting rank
[2] shows a few statements.

Thank you for this great library!

cheers,
Zoltan

[1] https://issues.apache.org/jira/browse/HIVE-22939
[2] https://github.com/apache/hive/blob/e4256fc91fe2c123428400f3737883a83208d29e/ql/src/test/queries/clientpositive/sketches_rewrite_rank.q#L15
