[ 
https://issues.apache.org/jira/browse/MADLIB-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1240:
------------------------------------
    Description: 
related to https://issues.apache.org/jira/browse/MADLIB-1239

Vector to Columns

Converts a feature array in a single column of the source table into multiple 
columns.  This process can be used to reverse the function cols2vec.

{code}
vec2cols(
    source_table,
    out_table,
    vector_col,
    dictionary,
    cols_to_output
    )

source_table
TEXT. Name of the table containing the source data.

out_table
TEXT. Name of the generated table containing the output. If a table with the 
same name already exists, an error will be returned. 

vector_col
TEXT.  Name of the column containing the feature array.  Must be a 
one-dimensional array.

dictionary (optional)
TEXT[]. Array of names associated with the feature array.  Note that this array 
exists in the summary table created by the function 'cols2vec'.  If the 
dictionary array is not specified, column names will be automatically generated 
in the form 'f1, f2, ..., fn'.

cols_to_output (optional)
TEXT, default NULL. Comma-separated string of column names from the source 
table to keep in the output table, in addition to the feature columns.  To keep 
all columns from the source table, use '*'.


Output

The output table produced by the vec2cols function contains the following 
columns:

<...>
Columns from source table, depending on which ones are kept (if any).

feature columns
Columns for each of the features in 'vector_col'.  Column type will depend on 
the feature array type in the source table.  Column naming will depend on 
whether the parameter 'dictionary' is used.
{code}
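
The naming rule for the feature columns can be sketched in Python as follows. 
This is only an illustration, not the proposed implementation: the helper name 
feature_column_names is hypothetical, and rejecting a dictionary whose length 
differs from the array length is an assumption, since the description above 
does not cover that case.

```python
def feature_column_names(num_features, dictionary=None):
    """Return the output column names for a feature array.

    num_features: length of the one-dimensional feature array.
    dictionary:   optional list of names, one per array element.
    """
    if dictionary is None:
        # No dictionary given: auto-generate f1, f2, ..., fn (1-based).
        return ["f{0}".format(i) for i in range(1, num_features + 1)]
    if len(dictionary) != num_features:
        # Assumption: a length mismatch is an error; the issue text is silent.
        raise ValueError(
            "dictionary has {0} names but the feature array has {1} elements"
            .format(len(dictionary), num_features))
    # Dictionary given: use its names in the order supplied.
    return list(dictionary)


if __name__ == "__main__":
    print(feature_column_names(3))
    print(feature_column_names(2, ["height", "weight"]))
```

Passing None for the dictionary corresponds to passing NULL in the SQL call.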


Notes

(1)
The function
http://pivotalsoftware.github.io/PDLTools/group__ArrayUtilities.html
is similar but the proposed MADlib one has more options.  To do the equivalent 
of the PDL Tools one in MADlib, you would do:

{code}
vec2cols(
    table_name,
    output_table,
    vector_column,
    NULL,
    '*'
    )
{code}

(2)
Please put the generated feature columns on the right side of the output table, 
i.e., they will be the last columns on the right.  Maintain the order of the 
array.
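
One way to picture the requested column order is the following Python sketch, 
which builds the list of SELECT expressions for the output table.  The helper 
name build_select_list is hypothetical, and the sketch assumes that column 
names need no quoting and that PostgreSQL-style 1-based array subscripts are 
used.  It is an illustration of the ordering only, not the proposed 
implementation.

```python
def build_select_list(vector_col, feature_names, cols_to_output=None):
    """Return SELECT-list expressions: kept source columns, then features.

    vector_col:     name of the column holding the feature array.
    feature_names:  output names for the array elements, in array order.
    cols_to_output: None, '*', or a comma-separated string of column names.
    """
    kept = []
    if cols_to_output is not None:
        # '*' passes through unchanged and keeps every source column;
        # otherwise split the comma-separated string and trim whitespace.
        kept = [c.strip() for c in cols_to_output.split(",") if c.strip()]
    # Feature columns go last, preserving array order (subscripts start at 1).
    features = [
        "{0}[{1}] AS {2}".format(vector_col, i, name)
        for i, name in enumerate(feature_names, start=1)
    ]
    return kept + features


if __name__ == "__main__":
    exprs = build_select_list("feature_vector", ["f1", "f2"], "id, label")
    print(", ".join(exprs))
```

A real implementation would also need to quote identifiers and validate the 
names in cols_to_output against the source table.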


Examples of dictionary usage

select vec2cols(
source_table,
out_table,
vector_col,
(SELECT col_names FROM a_table),  -- dictionary array exists in table 'a_table'
cols_to_output
)

OR 

select vec2cols(
source_table,
out_table,
vector_col,
dictionary,
cols_to_output
) from (select col_names as dictionary from a_table) q -- dictionary array exists in table 'a_table'

OR

select vec2cols(
source_table,
out_table,
vector_col,
ARRAY['n1', 'n2', ..., 'nn'], -- user explicitly enters dictionary 
cols_to_output
)

OR

select vec2cols(
source_table,
out_table,
vector_col,
NULL, -- no dictionary exists, will auto-generate column names as f1, f2, ...
cols_to_output
)




  was:
related to https://issues.apache.org/jira/browse/MADLIB-1239

Vector to Columns

Converts a feature array in a single column of an output table into multiple 
columns.  This process can be used to reverse the function cols2vec.

{code}
vec2cols(
    source_table,
    out_table,
    vector_col,
    dictionary,
    cols_to_output
    )

source_table
TEXT. Name of the table containing the source data.

out_table
TEXT. Name of the generated table containing the output. If a table with the 
same name already exists, an error will be returned. 

vector_col
TEXT.  Name of the column containing the feature array.  Must be a 
one-dimensional array.

dictionary (optional)
TEXT. Name of the table containing the array of names associated with the 
feature array.  This table is created by the function 'cols2vec'.  If the 
dictionary table is not specified, column names will be automatically generated 
of the form 'f1, f2, ...fn'

cols_to_output (optional)
TEXT, default NULL. Comma-separated string of column names from the source 
table to keep in the output table, in addition to the feature columns.  To keep 
all columns from the source table, use '*'.


Output

The output table produced by the vec2cols function contains the following 
columns:

<...>
Columns from source table, depending on which ones are kept (if any).

feature columns
Columns for each of the features in 'vector_col'.  Column type will depend on 
the feature array type in the source table.  Column naming will depend on 
whether the parameter 'dictionary' is used.
{code}

Notes

(1)
The function
http://pivotalsoftware.github.io/PDLTools/group__ArrayUtilities.html
is similar but the proposed MADlib one has more options.  To do the equivalent 
of the PDL Tools one in MADlib, you would do:

{code}
vec2cols(
    table_name,
    output_table,
    vector_column,
    NULL,
    '*'
    )
{code}

(2)
Please put the generated feature columns on the right side of the output table, 
i.e., they will be the last column on the right.  Maintain the order of the 
array.


> Vector to Columns
> -----------------
>
>                 Key: MADLIB-1240
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1240
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.15
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
