You might be interested in the current Java PR #349
<https://github.com/apache/datasketches-java/pull/349>, which adds
Jaccard similarity to the Tuple sketches and can also compute
jaccard(Tuple, Theta).
This doesn't immediately solve the problem for Hive, but when it appears in
a release version, we would want to leverage it in the Hive adaptor.

Lee.


On Mon, Feb 22, 2021 at 1:28 PM Sebastian Klemke
<[email protected]> wrote:

> Hey,
>
> great, will do exactly that :-)
>
> Best regards,
>
> Sebastian
>
>
> On Mon, 2021-02-22 at 08:46 -0800, Alexander Saydakov wrote:
> > Sebastian,
> > Yes, a pull request is the way to go.
> >
> > On Sun, Feb 21, 2021 at 9:28 PM leerho <[email protected]> wrote:
> >
> > > Thanks for offering a contribution.  The person best able to handle this
> > > has been out.  He will be back this coming week.
> > > Cheers.
> > > Lee.
> > >
> > > On Sat, Feb 20, 2021 at 10:07 AM Sebastian Klemke
> > > <[email protected]> wrote:
> > >
> > > > Hi!
> > > >
> > > > Thanks for providing the datasketches library. It's a really powerful
> > > > tool that I use in several projects. Lately, I have been using the
> > > > Jaccard similarity estimator and found it would be easier to use if it
> > > > were available as a Hive UDF. I created such a Hive UDF here:
> > > >
> > > >
> > > >
> > > > https://github.com/packet23/datasketches-hive/commit/9c0d72537ed5cede45d6b5282789af01a158af35
> > > >
> > > > but I'm unclear how to proceed with the contribution: should I just make a
> > > > pull request on GitHub, or do you prefer other means?
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Sebastian
> > > >
> > > >
> > > > --
> > > > Sebastian Klemke
> > > > [email protected]
> > > >            147EEC173170C3F1A19F200244741CA8D4106FE9 @ keys.openpgp.org
> > > >
> > >
>
> --
> Sebastian Klemke                                    [email protected]
>            147EEC173170C3F1A19F200244741CA8D4106FE9 @ keys.openpgp.org
