[ 
https://issues.apache.org/jira/browse/MADLIB-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1239:
------------------------------------
    Description: 
Columns to Vector

Converts features from multiple columns of an input table into a feature array 
in a single column.
This process can be reversed using the function vec2cols.

{code}
cols2vec(
    source_table,
    out_table,
    list_of_features,
    list_of_features_to_exclude,
    cols_to_output
    )

source_table
TEXT. Name of the table containing the source data.

out_table
TEXT. Name of the generated table containing the output. If a table with the 
same name already exists, an error will be returned. 

list_of_features
TEXT. Comma-separated string of column names or expressions to put into the 
feature array. Use '*' to put all columns into the feature array, except 
those listed in the next argument ('list_of_features_to_exclude'). 
Array columns in the source table are not supported in 'list_of_features'.

PostgreSQL arrays only allow elements of the same type.  If multiple numeric 
types are present in 'list_of_features', they will be cast to the widest of 
those types.  For example, if there are INTEGER and DOUBLE PRECISION columns 
in the feature list, the feature array will be of type DOUBLE PRECISION[].  
Invalid combinations such as TEXT and INTEGER will result in an error.

list_of_features_to_exclude (optional)
TEXT, default NULL. Comma-separated string of column names to exclude from the 
feature array.  Use only when 'list_of_features' is '*'.

cols_to_output (optional)
TEXT, default NULL. Comma-separated string of column names from the source 
table to keep in the output table, in addition to the feature array.  To keep 
all columns from the source table, use '*'.


Output

The output table produced by the cols2vec function contains the following 
columns:

<...>
Columns from source table, depending on which ones are kept (if any).

feature_vector
Array of features.  The array type depends on the feature types in the source 
table.


A summary table named <out_table>_summary is also created at the same time, 
which lists the names of the features.

feature_names
TEXT[]. Array of feature names.
{code}
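
To make the argument semantics above concrete, here is a minimal Python sketch 
that models them on in-memory rows.  It is not the MADlib implementation.  The 
names cols2vec_model and split_names are hypothetical, only plain column names 
are handled (no expressions), and raising an error when exclusions are given 
without '*' is an assumption, since the description only says "use only when 
'list_of_features' is '*'".

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def split_names(csv: Optional[str]) -> List[str]:
    """Split a comma-separated string of column names; None (NULL) gives []."""
    if not csv:
        return []
    return [name.strip() for name in csv.split(",") if name.strip()]


def cols2vec_model(
    rows: List[Row],
    columns: List[str],
    list_of_features: str,
    list_of_features_to_exclude: Optional[str] = None,
    cols_to_output: Optional[str] = None,
) -> Tuple[List[Row], List[str]]:
    """Hypothetical model: returns (output rows, feature_names)."""
    exclude = split_names(list_of_features_to_exclude)
    if list_of_features.strip() == "*":
        # '*' means every source column except the excluded ones.
        features = [c for c in columns if c not in exclude]
    else:
        if exclude:
            # Assumption: the description does not say what happens here.
            raise ValueError("exclusions apply only when list_of_features is '*'")
        features = split_names(list_of_features)

    if cols_to_output is not None and cols_to_output.strip() == "*":
        kept = list(columns)
    else:
        kept = split_names(cols_to_output)

    out_rows: List[Row] = []
    for row in rows:
        values = [row[f] for f in features]
        is_text = [isinstance(v, str) for v in values]
        if any(is_text) and not all(is_text):
            # Mirrors the "TEXT and INTEGER" error case.
            raise TypeError("cannot mix TEXT and numeric features in one array")
        if any(isinstance(v, float) for v in values):
            # Mirrors INTEGER + DOUBLE PRECISION -> DOUBLE PRECISION[].
            values = [float(v) for v in values]
        out = {c: row[c] for c in kept}
        out["feature_vector"] = values  # inserted last, so it is the last column
        out_rows.append(out)
    return out_rows, features


if __name__ == "__main__":
    rows = [{"id": 1, "a": 2, "b": 0.5, "label": "x"}]
    cols = ["id", "a", "b", "label"]
    out, names = cols2vec_model(rows, cols, "*", "id, label", "id")
    print(out)    # [{'id': 1, 'feature_vector': [2.0, 0.5]}]
    print(names)  # ['a', 'b']
```

The second return value plays the role of the feature_names array in the 
<out_table>_summary table.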


Notes

(1)
The PDL Tools function documented at
http://pivotalsoftware.github.io/PDLTools/group__grp__array__utilities.html#cols2vec_example
is similar, but the proposed MADlib function has more options.  The MADlib 
equivalent of the PDL Tools call would be:

{code}
cols2vec(
    table_name,
    output_table,
    '*',
    exclude_columns,
    '*'
    )
{code}

(2)
Please put the feature vector in the last (rightmost) column of the output 
table.


  was:

Columns to Vector

Converts features from multiple columns of an input table into a feature array 
in a single column.
This process can be reversed using the function vec2cols.

{code}
cols2vec(
    source_table,
    out_table,
    list_of_features,
    list_of_features_to_exclude,
    cols_to_output
    )

source_table
TEXT. Name of the table containing the source data.

out_table
TEXT. Name of the generated table containing the output. If a table with the 
same name already exists, an error will be returned. 

list_of_features
TEXT. Comma-separated string of column names or expressions to put into feature 
array. Can also be a '*' implying all columns are to be put into feature array 
(except for the ones included in the next argument that lists exclusions). 
Array columns in the source table are not supported in the 'list_of_features'.

PostgreSQL arrays only allow elements of the same type.  If multiple numeric 
types are present in the 'list_of_features', they will be cast to the largest 
type.  For example, if there are INTEGER and DOUBLE PRECISION columns in the 
feature list, the feature array will be of type DOUBLE PRECISION[].  Invalid 
combinations like TEXT and INTEGER will result in an error.

list_of_features_to_exclude (optional)
TEXT, default NULL. Comma-separated string of column names to exclude from the 
feature array.  Use only when 'list_of_features' is '*'.

cols_to_output (optional)
TEXT, default NULL. Comma-separated string of column names from the source 
table to keep in the output table, in addition to the feature array.  To keep 
all columns from the source table, use '*'.


Output

The output table produced by the cols2vec function contains the following 
columns:

<...>
Columns from source table, depending on which ones are kept (if any).

feature_vector
Array of features.  Array type will depend on feature type in the source table.


A summary table named <out_table>_summary is also created at the same time, 
which lists the names of the features.

feature_names
TEXT[] Array of names of features.
{code}

Note

The function
http://pivotalsoftware.github.io/PDLTools/group__grp__array__utilities.html#cols2vec_example
is similar but the proposed MADlib one has more options.  To do the equivalent 
of the PDL Tools one in MADlib, you would do:

{code}
cols2vec(
    table_name,
    output_table,
    '*',
    exclude_columns,
    '*',
    )
{code}


> Columns to Vector
> -----------------
>
>                 Key: MADLIB-1239
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1239
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.15
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
