Re: pca_train error

Frank McQuillan Tue, 05 Apr 2016 09:27:17 -0700

Please clarify the platform - do you mean GPDB 4.2.0?

Would you be able to upgrade to MADlib 1.8?  Then you would be on the latest
release, and we can see whether the problem persists.
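
In the meantime, since the error complains about a column index outside
[0, col_dim), it may be worth checking whether every array in the input has
the same length and whether row_id runs from 1 to N without gaps (which, as
far as I understand, is what the dense matrix input expects). A rough sketch,
assuming the array column is named row_vec as in your update; the table name
is taken from your call and the output aliases are only illustrative:

```sql
-- Assumption: the array column is named row_vec (per the update in the
-- original message); adjust if it is still called feature_vector.

-- 1. Do all rows have the same number of features? A consistent input
--    should give exactly one output row (853 features, 15472 rows,
--    going by the description of the table).
SELECT array_upper(row_vec, 1) AS n_features,
       count(*)                AS n_rows
FROM ev.hci_subset_pca_input
GROUP BY 1
ORDER BY 1;

-- 2. Is row_id a gap-free, duplicate-free 1..N sequence? If so, min_id
--    is 1 and max_id, n_rows and n_distinct_ids are all equal.
SELECT min(row_id)            AS min_id,
       max(row_id)            AS max_id,
       count(*)               AS n_rows,
       count(DISTINCT row_id) AS n_distinct_ids
FROM ev.hci_subset_pca_input;
```

If the first query returns more than one row (or a NULL length), or the
second shows gaps or duplicates in row_id, that would suggest the input
itself may be contributing to the error, independent of the MADlib version.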


Frank

On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <evasi...@pivotal.io> wrote:

> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>
> Thanks.
>
> On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
>> Thanks for the question, Esther.  What version of MADlib are you using
>> and what database platform and version are you running on?
>>
>> It seems to be a MADlib version earlier than 1.8, since the error message
>> you report is different in the 1.8 release.  (There was a bug fix in 1.8 to
>> allow user-specified column names in PCA.)
>>
>> Frank
>>
>> On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <evasi...@pivotal.io>
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying to use pca_train but I am running into this error:
>>>
>>> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
>>> Function "madlib.__matrix_densify_sfunc(double
>>> precision[],integer,integer,double precision)": invalid argument - col
>>> should be in the range of [0, col_dim)  (seg35 awsaiuirl1178:40003
>>> pid=104068) (plpython.c:4648)
>>> SQL state: XX000
>>> Context: Traceback (most recent call last):
>>>   PL/Python function "pca_train", line 23, in <module>
>>>     return pca.pca(**globals())
>>>   PL/Python function "pca_train", line 404, in pca
>>> PL/Python function "pca_train"
>>>
>>> My input table has 15472 rows and two columns; a row_id and an array
>>> with 853 features. I am calling pca_train like this:
>>>
>>> DROP TABLE if exists ev.hci_subset_pca_output;
>>> SELECT madlib.pca_train( 'ev.hci_subset_pca_input',
>>>                                            'ev.hci_subset_pca_output',
>>>                                            'row_id',
>>>                                             3);
>>>
>>> I unfortunately cannot share the data, but this is how it looks in
>>> pgAdmin3. Note that pgAdmin3 won't display a feature_vector that is too
>>> large, which is why the column appears to be empty; it isn't, as you can
>>> see in the second screenshot.
>>>
>>> [image: Inline image 1]
>>>
>>> [image: Inline image 3]
>>>
>>> I am not sure why I am running into this error. Please advise.
>>>
>>> Update: I have renamed feature_vector to "row_vec", and "row_id" now
>>> starts at 1. I am still getting the same error.
>>>
>>> Thanks,
>>>
>>> --
>>> *Esther Vasiete *
>>> *Data Scientist | Pivotal*
>>> evasi...@pivotal.io
>>>
>>>
>>>
>>
>
>
> --
> *Esther Vasiete *
> *Data Scientist | Pivotal*
> evasi...@pivotal.io
>
