Thank you for following up on that JIRA, James.

Based on some more code exploration, it looks like we should be able to replace the native implementation of array_dot() with Eigen's dot() function. As you pointed out, array_dot() currently takes `anyarray` arguments, while cosine_similarity() takes double precision arrays.
- However, I was able to run cosine_similarity() on int[], float8[] and double precision[] vector pairs without any issues.
- I also checked that the current array_dot() returns a float8, not the type of the input arrays, while cosine_similarity() returns double precision (the same type as float8 in Postgres).
- Internally in MADlib, a few modules (GLM, SVM, SVD, matrix_ops, and conjugate gradient) use the array_dot() function, and they too should not be affected by this change.

So it looks like replacing the native array_dot() with Eigen's dot() might not introduce any backward-compatibility-breaking changes.

NJ

On Thu, Oct 19, 2017 at 11:05 AM, Frank McQuillan <fmcquil...@pivotal.io> wrote:

> Thanks for the follow up James.
>
> I just chatted with Nandish @njayaram and he said he would have a closer
> look and follow up. Be great to improve performance here, especially if
> implementation cost is low.
>
> Frank
>
> On Thu, Oct 19, 2017 at 6:15 AM, James Gregory <james....@gmail.com>
> wrote:
>
> > One possibility would be to inspect the data types of the input, and
> > then use eigen if the input is all double precision, otherwise defer
> > to the native implementation?
> >
> > I've also just noticed that 'float' in postgres means double
> > precision, so actually the return type wouldn't change after all, as
> > long as eigen is only used for double precision input.
> >
> > I've also just noticed the documentation for the existing dot_product
> > says "Return type is the same as the input type", but I think maybe
> > this is wrong.
> >
> > On 19 October 2017 at 12:21, James Gregory <james....@gmail.com> wrote:
> > > Continuing on from:
> > >
> > > http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201702.mbox/%3CCAAKWcNHoxdVdEjRsAOjC9Zukjo2e5cLBP2XPFFC6f8XiXSMykA%40mail.gmail.com%3E
> > >
> > > and
> > >
> > > https://issues.apache.org/jira/projects/MADLIB/issues/MADLIB-1067?filter=allopenissues
> > >
> > > Cosine similarity is currently faster than array_dot, because it uses
> > > Eigen rather than a native implementation. I was looking at how hard
> > > it might be to change array_dot to use eigen as well. I can see that
> > > array_dot takes arguments of anyarray and returns float, whilst the
> > > eigen-based methods take DOUBLE PRECISION[] and return DOUBLE
> > > PRECISION. Does this mean the method cannot be safely replaced without
> > > breaking backward compatibility?
> > >
> > > Maybe I could add a function called fast_array_dot, though that seems
> > > a bit messy?
> > >
> > > Or if the function can be outright replaced, I noticed that the eigen
> > > functions are declared in
> > > src/ports/postgres/modules/linalg/linalg.sql_in whilst the native
> > > array functions are declared in
> > > methods/array_ops/src/pg_gp/array_ops.sql_in, so in that case should
> > > the declaration be moved?
> > >
> > > --
> > > James
> > >
> > >
> > --
> > James
> >
>
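
To illustrate the type question discussed in this thread, here is a minimal C++ sketch of why a single double precision implementation could serve int[] as well as float8[] inputs: every element is converted to double before multiplying, so the result is a double regardless of the element type. This is not MADlib code, and it deliberately avoids Eigen so that it stays self-contained. The names dot_as_double and self_check are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of MADlib or Eigen.
// Computes a dot product over any numeric element types, always
// accumulating and returning a double (analogous to float8 in Postgres).
template <typename A, typename B>
double dot_as_double(const std::vector<A>& a, const std::vector<B>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("vectors must have the same length");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Convert each element to double before multiplying, so integer
        // and floating-point inputs follow the same code path.
        sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return sum;
}

// Usage sketch: integer and double inputs yield the same double result.
inline void self_check() {
    const std::vector<int> xi{1, 2, 3};
    const std::vector<int> yi{4, 5, 6};
    const std::vector<double> xd{1.0, 2.0, 3.0};
    const std::vector<double> yd{4.0, 5.0, 6.0};
    assert(dot_as_double(xi, yi) == dot_as_double(xd, yd));
}
```

In this sketch the return type never depends on the input element type, which is the behaviour observed above for the current array_dot(). Whether very large integer values lose precision when converted to double is a separate question that this example does not address.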