[ https://issues.apache.org/jira/browse/MADLIB-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540550#comment-16540550 ]
ASF GitHub Bot commented on MADLIB-1240:
----------------------------------------

GitHub user ArvindSridhar opened a pull request:

    https://github.com/apache/madlib/pull/291

    Feature: Vector to Columns

    JIRA: MADLIB-1240

    The vec2cols function enables users to split up a single column into
    multiple columns, given that the input column contains array entries.
    For example, if the input column contained ARRAY[1, 2, 3] in one of
    its rows, the output table will contain 3 different columns, one for
    each element of the array.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/vector-to-columns

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #291

----
commit 3237acb45c553fc4fc2c20b6e7c9a0b6bec2ffe8
Author: Arvind Sridhar <asridhar@...>
Date:   2018-07-06T00:18:55Z

    Utilities: Create new function to convert vector to columns

    JIRA: MADLIB-1240

    The vec2cols function enables users to split up a single column into
    multiple columns, given that the input column contains array entries.
    For example, if the input column contained ARRAY[1, 2, 3] in one of
    its rows, the output table will contain 3 different columns, one for
    each element of the array.
    Co-authored-by: Nikhil Kak <n...@pivotal.io>
    Co-authored-by: Nandish Jayaram <njaya...@apache.org>

commit 0100da245555333fda01fe2b0428d80da2ba5ab9
Author: Arvind Sridhar <arvindsridhar@...>
Date:   2018-07-09T23:12:28Z

    Internal: Add function to check column type for 1D array

    Co-authored-by: Nikhil Kak <n...@pivotal.io>

commit 1e8bc328824ea57a0d253834f36e7ca4b0eff26a
Author: Arvind Sridhar <arvindsridhar@...>
Date:   2018-07-09T23:14:48Z

    Utilities: Add check for whether type is of any array variant

    Co-authored-by: Nikhil Kak <n...@pivotal.io>

----

> Vector to Columns
> -----------------
>
>                 Key: MADLIB-1240
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1240
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.15
>
>
> related to https://issues.apache.org/jira/browse/MADLIB-1239
>
> Vector to Columns
>
> Converts a feature array in a single column of the source table into
> multiple columns. This process can be used to reverse the function cols2vec.
> {code}
> vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     feature_names,
>     cols_to_output
> )
>
> source_table
> TEXT. Name of the table containing the source data.
>
> out_table
> TEXT. Name of the generated table containing the output. If a table with
> the same name already exists, an error will be returned.
>
> vector_col
> TEXT. Name of the column containing the feature array. Must be a
> one-dimensional array.
>
> feature_names (optional)
> TEXT[]. Array of names associated with the feature array. Note that this
> array exists in the summary table created by the function 'cols2vec'. If
> the feature_names array is not specified, column names of the form
> 'f1, f2, ... fn' will be generated automatically.
>
> cols_to_output (optional)
> TEXT, default NULL. Comma-separated string of column names from the source
> table to keep in the output table, in addition to the feature columns. To
> keep all columns from the source table, use '*'.
>
> Output
> The output table produced by the vec2cols function contains the following
> columns:
>
> <...>
> Columns from source table, depending on which ones are kept (if any).
>
> feature columns
> Columns for each of the features in 'vector_col'. Column type will depend
> on the feature array type in the source table. Column naming will depend
> on whether the parameter 'feature_names' is used.
> {code}
>
> Notes
>
> (1)
> The function
> http://pivotalsoftware.github.io/PDLTools/group__ArrayUtilities.html
> is similar, but the proposed MADlib one has more options. To do the
> equivalent of the PDL Tools one in MADlib, you would do:
> {code}
> vec2cols(
>     table_name,
>     output_table,
>     vector_column,
>     NULL,
>     '*'
> )
> {code}
>
> (2)
> Please put the generated feature columns on the right side of the output
> table, i.e., they will be the last columns on the right. Maintain the order
> of the array.
>
> Examples of feature name usage
> {code}
> select vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     (SELECT col_names FROM a_table), -- feature name array exists in table 'a_table'
>     cols_to_output
> )
> {code}
> OR
> {code}
> select vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     feature_names,
>     cols_to_output
> ) from (select col_names as feature_names from a_table) q -- feature names array exists in table 'a_table'
> {code}
> OR
> {code}
> select vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     ARRAY['n1', 'n2'... 'nn'], -- user explicitly enters feature names
>     cols_to_output
> )
> {code}
> OR
> {code}
> select vec2cols(
>     source_table,
>     out_table,
>     vector_col,
>     NULL, -- no dictionary exists, will auto-generate column names as f1, f2, ...
>     cols_to_output
> )
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
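
The column naming and ordering rules described in the issue (kept source columns first, feature columns last and in array order, names taken from feature_names or generated as f1, f2, ...) can be modeled with a small Python sketch. This is not MADlib code: the function name vec2cols_model, the list-of-dicts table representation, and the sample column names are all hypothetical, and the sketch assumes that '*' keeps every source column, including the vector column itself.

```python
def vec2cols_model(rows, vector_col, feature_names=None, cols_to_output=None):
    """Toy model of the proposed vec2cols semantics on a list of dict rows.

    Hypothetical helper for illustration only; it is not part of MADlib.
    Returns (header, output_rows).
    """
    if not rows:
        return [], []

    n = len(rows[0][vector_col])
    # Assumption: every row holds a one-dimensional array of the same length.
    if any(len(r[vector_col]) != n for r in rows):
        raise ValueError("all feature arrays must have the same length")

    # Auto-generate f1..fn when no feature names are supplied.
    if feature_names is None:
        feature_names = ["f%d" % (i + 1) for i in range(n)]
    elif len(feature_names) != n:
        raise ValueError("feature_names must have one name per array element")

    # cols_to_output: NULL keeps nothing, '*' keeps everything (assumed to
    # include the vector column), otherwise a comma-separated list of names.
    if cols_to_output is None:
        kept = []
    elif cols_to_output.strip() == "*":
        kept = list(rows[0].keys())
    else:
        kept = [c.strip() for c in cols_to_output.split(",")]

    # Feature columns go on the right, in array order.
    header = kept + list(feature_names)
    out = [[r[c] for c in kept] + list(r[vector_col]) for r in rows]
    return header, out


source = [
    {"id": 1, "label": "a", "feature_vector": [1, 2, 3]},
    {"id": 2, "label": "b", "feature_vector": [4, 5, 6]},
]
header, out = vec2cols_model(source, "feature_vector", cols_to_output="id")
print(header)  # ['id', 'f1', 'f2', 'f3']
print(out)     # [[1, 1, 2, 3], [2, 4, 5, 6]]
```

Passing NULL for feature_names and '*' for cols_to_output corresponds to the PDL Tools equivalent mentioned in note (1), under the same assumptions.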