Thank you for following up on that JIRA, James. After some more code
exploration, it looks like we should be able to replace the native
implementation of array_dot() with Eigen's dot() function. As you pointed
out, array_dot() currently takes `anyarray` arguments, while
cosine_similarity() takes double precision arrays.

- However, I was able to run cosine_similarity() on int[], float8[] and
double precision[] vector pairs without any issues.
- I also checked that the current array_dot() returns a float8, not the
type of the input arrays, while cosine_similarity() returns double
precision (float8 and double precision are the same type in PostgreSQL).
- Internally in MADlib, a few modules (GLM, SVM, SVD, matrix_ops, and
conjugate gradient) use array_dot(), and they too should be unaffected by
this change.

So it looks like replacing the native array_dot() with Eigen's dot()
should not break backward compatibility.
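
To illustrate the int[] and return-type points, here is a minimal sketch
in plain Python. It is not MADlib or Eigen code, and dot_as_double is a
hypothetical name. It assumes that an integer array passed to a function
declared on double precision[] has each element promoted to double
precision before the multiplication, so the result is a double regardless
of the input element type.

```python
from typing import Sequence, Union

Number = Union[int, float]


def dot_as_double(a: Sequence[Number], b: Sequence[Number]) -> float:
    """Hypothetical stand-in for a dot product computed in double precision."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    # Promote every element to a Python float (an IEEE 754 double) first,
    # mimicking an assumed cast of the input array to double precision[].
    # The 0.0 start value keeps the result a float even for empty input.
    return sum((float(x) * float(y) for x, y in zip(a, b)), 0.0)


# Integer and floating-point inputs give the same double-precision result.
int_result = dot_as_double([1, 2, 3], [4, 5, 6])
float_result = dot_as_double([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Both calls return a float, which mirrors the observation that the return
type stays float8 no matter what the input array type is.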

NJ

On Thu, Oct 19, 2017 at 11:05 AM, Frank McQuillan <fmcquil...@pivotal.io>
wrote:

> Thanks for the follow-up, James.
>
> I just chatted with Nandish @njayaram and he said he would have a closer
> look and follow up.  Be great to improve performance here, especially if
> implementation cost is low.
>
> Frank
>
> On Thu, Oct 19, 2017 at 6:15 AM, James Gregory <james....@gmail.com>
> wrote:
>
> > One possibility would be to inspect the data types of the input, and
> > then use eigen if the input is all double precision, otherwise defer
> > to the native implementation?
> >
> > I've also just noticed that 'float' in postgres means double
> > precision, so actually the return type wouldn't change after all, as
> > long as eigen is only used for double precision input.
> >
> > I've also just noticed the documentation for the existing dot_product
> > says "Return type is the same as the input type", but I think maybe
> > this is wrong.
> >
> > On 19 October 2017 at 12:21, James Gregory <james....@gmail.com> wrote:
> > > Continuing on from:
> > >
> > > > http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201702.mbox/%3CCAAKWcNHoxdVdEjRsAOjC9Zukjo2e5cLBP2XPFFC6f8XiXSMykA%40mail.gmail.com%3E
> > >
> > > and
> > >
> > > > https://issues.apache.org/jira/projects/MADLIB/issues/MADLIB-1067?filter=allopenissues
> > >
> > > > Cosine similarity is currently faster than array_dot, because it uses
> > > > Eigen rather than a native implementation. I was looking at how hard
> > > > it might be to change array_dot to use Eigen as well. I can see that
> > > > array_dot takes arguments of anyarray and returns float, whilst the
> > > > Eigen-based methods take DOUBLE PRECISION[] and return DOUBLE
> > > > PRECISION. Does this mean the method cannot be safely replaced without
> > > > breaking backward compatibility?
> > >
> > > Maybe I could add a function called fast_array_dot, though that seems
> > > a bit messy?
> > >
> > > Or if the function can be outright replaced, I noticed that the eigen
> > > functions are declared in
> > > src/ports/postgres/modules/linalg/linalg.sql_in whilst the native
> > > array functions are declared in
> > > methods/array_ops/src/pg_gp/array_ops.sql_in, so in that case should
> > > the declaration be moved?
> > >
> > > --
> > > James
> >
> >
> >
> > --
> > James
> >
>
