[ https://issues.apache.org/jira/browse/MADLIB-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan closed MADLIB-1240.
-----------------------------------
    Resolution: Fixed

> Vector to Columns
> -----------------
>
>                 Key: MADLIB-1240
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1240
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.15
>
>
> related to https://issues.apache.org/jira/browse/MADLIB-1239
> Vector to Columns
> Converts a feature array in a single column of an output table into multiple
> columns. This process can be used to reverse the function cols2vec.
> {code}
> vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     feature_names,
>     cols_to_output
> )
> source_table
> TEXT. Name of the table containing the source data.
> out_table
> TEXT. Name of the generated table containing the output. If a table with the
> same name already exists, an error will be returned.
> vector_col
> TEXT. Name of the column containing the feature array. Must be a
> one-dimensional array.
> feature_names (optional)
> TEXT[]. Array of names associated with the feature array. Note that this
> array exists in the summary table created by the function 'cols2vec'. If the
> feature_names array is not specified, column names will be automatically
> generated of the form 'f1, f2, ...fn'
> cols_to_output (optional)
> TEXT, default NULL. Comma-separated string of column names from the source
> table to keep in the output table, in addition to the feature columns. To
> keep all columns from the source table, use '*'.
> Output
> The output table produced by the vec2cols function contains the following
> columns:
> <...>
> Columns from source table, depending on which ones are kept (if any).
> feature columns
> Columns for each of the features in 'vector_col'. Column type will depend on
> the feature array type in the source table. Column naming will depend on
> whether the parameter 'feature_names' is used.
> {code}
> Notes
> (1)
> The function
> http://pivotalsoftware.github.io/PDLTools/group__ArrayUtilities.html
> is similar but the proposed MADlib one has more options. To do the
> equivalent of the PDL Tools one in MADlib, you would do:
> {code}
> vec2cols(
>     table_name,
>     output_table,
>     vector_column,
>     NULL,
>     '*'
> )
> {code}
> (2)
> Please put the generated feature columns on the right side of the output
> table, i.e., they will be the last columns on the right. Maintain the order
> of the array.
> Examples of feature name usage
> {code}
> select vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     (SELECT col_names FROM a_table), -- feature name array exists in table 'a_table'
>     cols_to_output
> )
> {code}
> OR
> {code}
> select vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     feature_names,
>     cols_to_output
> ) from (select col_names as feature_names from a_table) q -- feature names array exists in table 'a_table'
> {code}
> OR
> {code}
> select vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     ARRAY['n1', 'n2'... 'nn'], -- user explicitly enters feature names
>     cols_to_output
> )
> {code}
> OR
> {code}
> select vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     NULL, -- no feature names given, will auto-generate column names as f1, f2, ...
>     cols_to_output
> )
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
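
For readers who want to check their understanding of the behaviour specified in the issue, here is a minimal pure-Python sketch of the described semantics. It is not the MADlib implementation: the function name vec2cols_sketch and the rows-as-dicts representation are hypothetical, and the choices that '*' also keeps the original vector column and that a length mismatch raises an error are assumptions the issue does not settle.

```python
from typing import Any, Dict, List, Optional


def vec2cols_sketch(
    rows: List[Dict[str, Any]],
    vector_col: str,
    feature_names: Optional[List[str]] = None,
    cols_to_output: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Illustrative stand-in for the vec2cols behaviour described in the issue."""
    out = []
    for row in rows:
        vec = row[vector_col]
        # No names given: auto-generate f1, f2, ..., fn as the issue describes.
        names = feature_names or [f"f{i}" for i in range(1, len(vec) + 1)]
        if len(names) != len(vec):
            # Assumption: a mismatch is an error; the issue does not say.
            raise ValueError("feature_names length must match the vector length")
        if cols_to_output is None:
            kept = []                      # default NULL: feature columns only
        elif cols_to_output.strip() == "*":
            kept = list(row.keys())        # assumption: '*' keeps every source column
        else:
            kept = [c.strip() for c in cols_to_output.split(",")]
        new_row = {c: row[c] for c in kept}
        # Feature columns go last (right-hand side), in array order, per note (2).
        new_row.update(zip(names, vec))
        out.append(new_row)
    return out


rows = [{"id": 1, "label": "a", "vec": [0.5, 1.5]}]
auto_named = vec2cols_sketch(rows, "vec")
named_with_id = vec2cols_sketch(rows, "vec", ["x", "y"], "id")
keep_all = vec2cols_sketch(rows, "vec", None, "*")
```

Python dicts preserve insertion order, which is what lets the sketch model the requirement that kept source columns come first and the feature columns last.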