[jira] [Created] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
Frank McQuillan created MADLIB-1086:
---

Summary: Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
Key: MADLIB-1086
URL: https://issues.apache.org/jira/browse/MADLIB-1086
Project: Apache MADlib
Issue Type: New Feature
Components: Module: Utilities
Reporter: Frank McQuillan
Fix For: v1.11

Context
Currently k-means returns the following:
{code}
centroids        | {{13.75333,1.905,2.425,16.06667,90.3,2.805,2.98,0.29,2.005,5.406633,1.041667,3.318333,1020.833},{14.255,1.9325,2.5025,16.05,110.5,3.055,2.9775,0.2975,1.845,6.2125,0.9975,3.365,1378.75}}
cluster_variance | {122999.110416013,30561.74805}
objective_fn     | 153560.858466013
frac_reassigned  | 0
num_iterations   | 3
{code}

Story
As a data scientist, I want to unnest a 2-D array by one level (i.e. into rows of 1-D arrays) in k-means, so that I can get one centroid per row for follow-on operations.

Acceptance
1) Add the function to array operations: http://madlib.incubator.apache.org/docs/latest/group__grp__array.html
2) Add an example to k-means to demonstrate usage: http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
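A minimal Python sketch of the requested behaviour, applied to the centroids above. `unnest_2d_to_1d` is a hypothetical helper for illustration, not an existing MADlib function: it emits one (row id, 1-D array) pair per outer element of the 2-D array, with 1-based row ids to mirror PostgreSQL array subscripts.

```python
from typing import Iterator, List, Sequence, Tuple


def unnest_2d_to_1d(arr: Sequence[Sequence[float]]) -> Iterator[Tuple[int, List[float]]]:
    """Yield (unnest_row_id, row) for each 1-D sub-array of a 2-D array.

    Hypothetical helper for illustration only; row ids are 1-based to
    mirror PostgreSQL array subscripts.
    """
    if not arr:
        return  # an empty input produces no rows
    width = len(arr[0])
    for row_id, row in enumerate(arr, start=1):
        # PostgreSQL multidimensional arrays must be rectangular.
        if len(row) != width:
            raise ValueError("2-D array must be rectangular")
        yield row_id, list(row)


# Centroids as returned by k-means in the issue description.
centroids = [
    [13.75333, 1.905, 2.425, 16.06667, 90.3, 2.805, 2.98, 0.29, 2.005,
     5.406633, 1.041667, 3.318333, 1020.833],
    [14.255, 1.9325, 2.5025, 16.05, 110.5, 3.055, 2.9775, 0.2975, 1.845,
     6.2125, 0.9975, 3.365, 1378.75],
]

# One centroid per output row, ready for follow-on operations.
for cluster_id, centroid in unnest_2d_to_1d(centroids):
    print(cluster_id, centroid)
```

Rejecting ragged input mirrors PostgreSQL, where every sub-array of a multidimensional array must have the same length.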
[jira] [Updated] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
[ https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1086:
Priority: Minor (was: Major)
[jira] [Assigned] (MADLIB-1082) Graph - add grouping to page rank
[ https://issues.apache.org/jira/browse/MADLIB-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nandish Jayaram reassigned MADLIB-1082:
---
Assignee: Nandish Jayaram

> Graph - add grouping to page rank
>
> Key: MADLIB-1082
> URL: https://issues.apache.org/jira/browse/MADLIB-1082
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Graph
> Reporter: Frank McQuillan
> Assignee: Nandish Jayaram
> Priority: Minor
> Fix For: v1.11
>
> Add grouping column to edge table to support separate page rank calculations by group
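A minimal Python sketch of what "separate page rank calculations by group" means, assuming each edge carries its grouping value as a third field. `pagerank` and `grouped_pagerank` are hypothetical names, not the MADlib API. The sketch uses standard power iteration with a damping factor of 0.85 and spreads the rank of dangling vertices (no out-edges) uniformly within the group; these are common choices and not necessarily what MADlib does.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def pagerank(edges: Iterable[Tuple[Hashable, Hashable]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[Hashable, float]:
    """Power-iteration PageRank over one edge list; ranks sum to 1."""
    out: Dict[Hashable, List[Hashable]] = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        out[src].append(dst)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling vertices (no out-edges) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if v not in out)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def grouped_pagerank(edges: Iterable[Tuple[Hashable, Hashable, Hashable]],
                     **kwargs) -> Dict[Hashable, Dict[Hashable, float]]:
    """Run an independent PageRank for each value of the grouping column."""
    by_group: Dict[Hashable, List[Tuple[Hashable, Hashable]]] = defaultdict(list)
    for src, dst, group in edges:
        by_group[group].append((src, dst))
    return {g: pagerank(e, **kwargs) for g, e in by_group.items()}


edges = [
    (1, 2, "a"), (2, 1, "a"),               # group "a": two vertices linking to each other
    (1, 2, "b"), (2, 3, "b"), (3, 1, "b"),  # group "b": a 3-cycle
]
ranks = grouped_pagerank(edges)
```

Each group's graph is built only from its own edges, so a vertex that appears only in group "b" (vertex 3 here) gets no rank in group "a".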
[jira] [Commented] (MADLIB-1066) Pivoting - support array and svec output
[ https://issues.apache.org/jira/browse/MADLIB-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951426#comment-15951426 ]

ASF GitHub Bot commented on MADLIB-1066:

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-madlib/pull/108

> Pivoting - support array and svec output
>
> Key: MADLIB-1066
> URL: https://issues.apache.org/jira/browse/MADLIB-1066
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Priority: Minor
> Fix For: v1.11
>
> Background
> Follow-on to these JIRAs:
> https://issues.apache.org/jira/browse/MADLIB-908
> https://issues.apache.org/jira/browse/MADLIB-1004
> This capability carries over some good ideas from
> https://issues.apache.org/jira/browse/MADLIB-1038
>
> Story
> Support an array output format to allow more than 1600 output columns (the PostgreSQL per-table column limit). Many MADlib algorithms take array input, so pivot should support array output. Base this on how it is done in encoding categorical variables:
> http://madlib.incubator.apache.org/docs/latest/group__grp__encode__categorical.html
>
> Add 'output_type' to the interface:
> {code}
> pivot(
>     source_table,
>     output_table,
>     index,
>     pivot_cols,
>     pivot_values,
>     aggregate_func,
>     fill_value,
>     keep_null,
>     output_col_dictionary,
>     output_type  -- New
> )
> {code}
> where
> {code}
> output_type (optional)
> VARCHAR. default: 'column'. This parameter controls the output format. If 'column', a column is created for each output variable. PostgreSQL limits the number of columns in a table. If the total number of columns exceeds the limit, set this parameter to either 'array' to combine the indicator columns into an array or 'svec' to cast the array output to the 'madlib.svec' type.
> Since the array output for any single tuple would be sparse, the 'svec' output would be the most efficient for storage. The 'array' output is useful if the array is used for post-processing, including concatenating with other non-categorical features.
> A dictionary will be created when 'output_type' is 'array' or 'svec' to define an index into the array. The dictionary table will be given the name of the 'output_table' appended with '_dictionary'.
> {code}
> See code in
> http://madlib.incubator.apache.org/docs/latest/group__grp__encode__categorical.html
> Need to support NULL (defaults to 'column'). Also, 'a', 'Array' and 'arr' should be interpreted as 'array'. The same applies to 'column' and 'svec'.
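A minimal Python sketch of the requested input handling for 'output_type' and of the dictionary table naming rule. The case-insensitive prefix match is an assumption chosen because it covers the listed spellings ('a', 'arr', 'Array'); `normalize_output_type` and `dictionary_table_name` are hypothetical helpers, not MADlib functions.

```python
from typing import Optional

_OUTPUT_TYPES = ("column", "array", "svec")


def normalize_output_type(value: Optional[str]) -> str:
    """Map user input to one of 'column', 'array' or 'svec'.

    Assumed rule: NULL (None) or a blank string means the default 'column';
    otherwise the value is matched case-insensitively as a prefix, so
    'a', 'arr' and 'Array' all resolve to 'array'.
    """
    if value is None or not value.strip():
        return "column"
    v = value.strip().lower()
    matches = [t for t in _OUTPUT_TYPES if t.startswith(v)]
    if len(matches) != 1:
        raise ValueError(f"invalid output_type: {value!r}")
    return matches[0]


def dictionary_table_name(output_table: str, output_type: Optional[str]) -> Optional[str]:
    """Name of the dictionary table, which is only created for array/svec output."""
    if normalize_output_type(output_type) in ("array", "svec"):
        return output_table + "_dictionary"
    return None
```

Because 'column', 'array' and 'svec' start with different letters, any non-empty prefix resolves to at most one of them, so the match can never be ambiguous.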
[jira] [Resolved] (MADLIB-1066) Pivoting - support array and svec output
[ https://issues.apache.org/jira/browse/MADLIB-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan resolved MADLIB-1066.
Resolution: Fixed