Thanks for the update, Esther.

Frank

On Wed, Apr 6, 2016 at 3:53 PM, Esther Vasiete <evasi...@pivotal.io> wrote:

> Upgrading to MADlib 1.8 solved the problem!
>
> Thanks,
> Esther
>
> On Tue, Apr 5, 2016 at 10:27 AM, Esther Vasiete <evasi...@pivotal.io> wrote:
>
>> Oh sorry, it is HAWQ 1.3.1.
>>
>> And the data engineer will upgrade to MADlib 1.8 tonight.
>>
>> Thanks,
>> Esther
>>
>> On Tue, Apr 5, 2016 at 9:26 AM, Frank McQuillan <fmcquil...@pivotal.io> wrote:
>>
>>> Please clarify the platform - do you mean GPDB 4.2.0?
>>>
>>> Would you be able to upgrade to MADlib 1.8? Then you are using the
>>> latest software and we can see if you still have a problem.
>>>
>>> Frank
>>>
>>> On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <evasi...@pivotal.io> wrote:
>>>
>>>> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>>>>
>>>> Thanks.
>>>>
>>>> On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <fmcquil...@pivotal.io> wrote:
>>>>
>>>>> Thanks for the question, Esther. What version of MADlib are you using
>>>>> and what database platform and version are you running on?
>>>>>
>>>>> It seems to be a MADlib version lower than 1.8, since the error message
>>>>> you report is different in the 1.8 release. (There was a bug fix in 1.8
>>>>> to allow user-specified column names in PCA.)
>>>>>
>>>>> Frank
>>>>>
>>>>> On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <evasi...@pivotal.io> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to use pca_train, but I am running into this error:
>>>>>>
>>>>>> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
>>>>>> Function "madlib.__matrix_densify_sfunc(double precision[],integer,integer,double precision)":
>>>>>> invalid argument - col should be in the range of [0, col_dim)
>>>>>> (seg35 awsaiuirl1178:40003 pid=104068) (plpython.c:4648)
>>>>>> SQL state: XX000
>>>>>> Context: Traceback (most recent call last):
>>>>>>   PL/Python function "pca_train", line 23, in <module>
>>>>>>     return pca.pca(**globals())
>>>>>>   PL/Python function "pca_train", line 404, in pca
>>>>>> PL/Python function "pca_train"
>>>>>>
>>>>>> My input table has 15472 rows and two columns: a row_id and an array
>>>>>> with 853 features. I am calling pca_train like this:
>>>>>>
>>>>>> DROP TABLE IF EXISTS ev.hci_subset_pca_output;
>>>>>> SELECT madlib.pca_train('ev.hci_subset_pca_input',
>>>>>>                         'ev.hci_subset_pca_output',
>>>>>>                         'row_id',
>>>>>>                         3);
>>>>>>
>>>>>> Unfortunately, I cannot share the data, but this is how it looks in
>>>>>> pgAdmin3. Note that pgAdmin3 won't display a feature_vector that is too
>>>>>> large, which is why it appears to be empty; as the second screenshot
>>>>>> shows, it isn't.
>>>>>>
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> [image: Inline image 3]
>>>>>>
>>>>>> I am not sure why I am getting this error. Please advise.
>>>>>>
>>>>>> Update: I have renamed feature_vector to "row_vec", and "row_id"
>>>>>> starts with 1. I am still getting the same error.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> *Esther Vasiete*
>>>>>> *Data Scientist | Pivotal*
>>>>>> evasi...@pivotal.io