[ 
https://issues.apache.org/jira/browse/MADLIB-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581752#comment-16581752
 ] 

Frank McQuillan edited comment on MADLIB-1060 at 8/16/18 12:11 AM:
-------------------------------------------------------------------

I think so.  The expressions must evaluate to the type these parameters 
expect, which is a numeric array:

{code}
point_source
TEXT. Name of the table containing the training data points. Training data 
points are expected to be stored row-wise in a column of type DOUBLE 
PRECISION[].

point_column_name
TEXT. Name of the column with training data points.
{code}

and

{code}
test_source
TEXT. Name of the table containing the test data points. Testing data points 
are expected to be stored row-wise in a column of type DOUBLE PRECISION[].

test_column_name
TEXT. Name of the column with testing data points.
{code}

If the user supplies an expression that does not evaluate to a numeric array, 
the call fails with the error below, which is fine:

{code}
InternalError: (psycopg2.InternalError) plpy.Error: kNN Error: Feature column 
'data' in test table is not an array.
CONTEXT:  Traceback (most recent call last):
  PL/Python function "knn", line 33, in <module>
    weighted_avg
  PL/Python function "knn", line 160, in knn
  PL/Python function "knn", line 63, in knn_validate_src
PL/Python function "knn"
PL/pgSQL function madlib.knn(character varying,character varying,character 
varying,character varying,character varying,character varying,character 
varying,character varying,integer,boolean,text) line 5 at assignment
 [SQL: "SELECT * FROM madlib.knn(\n                'knn_train_data',      -- 
Table of training data\n                'data',                -- Col name of 
training data\n                'id',                  -- Col name of id in 
train data\n                'label',               -- Training labels\n         
       'knn_test_data',       -- Table of test data\n                'data',    
            -- Col name of test data\n                'id',                  -- 
Col name of id in test data\n                'knn_result_classification',  -- 
Output table\n                 3,                    -- Number of nearest 
neighbors\n                 True,                 -- True to list 
nearest-neighbors by id\n                 'madlib.squared_dist_norm2' -- 
Distance function\n                );"]
{code}
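
To make the intended check concrete, here is a minimal sketch in Python of how 
an expression's result type could be validated before running k-NN. It is not 
MADlib's actual validation code: the helper names, the set of accepted element 
types, and the idea of passing in a type name (for example one obtained from 
PostgreSQL's pg_typeof() on the expression) are all assumptions made for 
illustration. Only the wording of the error message is taken from the output 
above.

```python
# Hypothetical helpers; this is NOT the code inside madlib.knn.

# Element types treated as numeric in this sketch (an assumption).
NUMERIC_ELEMENT_TYPES = {
    "smallint", "integer", "bigint", "real", "double precision", "numeric",
}


def is_numeric_array_type(pg_type_name):
    """Return True if a type name such as 'double precision[]' denotes
    an array whose element type is numeric."""
    name = pg_type_name.strip().lower()
    if not name.endswith("[]"):
        return False
    return name[:-2].strip() in NUMERIC_ELEMENT_TYPES


def validate_feature_expression(expression, pg_type_name, table_role):
    """Raise ValueError when the expression's type is not a numeric array.

    table_role is 'train' or 'test' and is only used in the message.
    """
    if not is_numeric_array_type(pg_type_name):
        raise ValueError(
            "kNN Error: Feature column '{0}' in {1} table is not an array."
            .format(expression, table_role))


# An array-valued expression passes; a plain text column is rejected.
validate_feature_expression("ARRAY[x1, x2]", "double precision[]", "test")
try:
    validate_feature_expression("data", "text", "test")
except ValueError as err:
    print(err)
```

The column names x1 and x2 are placeholders. The point is only that an 
arbitrary expression can be accepted as long as its result type is checked up 
front, so a bad expression still fails early with a clear message.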


was (Author: fmcquillan):
I think so.  The expressions must return a valid type for the parameters, which 
is a numeric array:

{code}
point_source
TEXT. Name of the table containing the training data points. Training data 
points are expected to be stored row-wise in a column of type DOUBLE 
PRECISION[].

point_column_name
TEXT. Name of the column with training data points.
{code}

and

{code}
test_source
TEXT. Name of the table containing the test data points. Testing data points 
are expected to be stored row-wise in a column of type DOUBLE PRECISION[].

test_column_name
TEXT. Name of the column with testing data points.
{code}

> Support expressions for column names in k-NN
> --------------------------------------------
>
>                 Key: MADLIB-1060
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1060
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: k-NN
>            Reporter: Frank McQuillan
>            Assignee: Himanshu Pandey
>            Priority: Minor
>              Labels: starter
>             Fix For: v2.0
>
>
> Follow on to 
> https://issues.apache.org/jira/browse/MADLIB-927
> {code}
> knn( point_source,
>      point_column_name,
>      label_column_name,
>      test_source,
>      test_column_name,
>      id_column_name,
>      output_table,
>      operation,
>      k
>    )
> {code}
> Possible improvements:
> 1) The parameters 'point_column_name' and 'test_column_name' should support 
> regular PostgreSQL expressions.
> 2) Should we infer 'c' or 'r' from the data types, rather than have to say 
> explicitly?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
