[jira] [Created] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)

2017-03-31 Thread Frank McQuillan (JIRA)
Frank McQuillan created MADLIB-1086:
---

 Summary: Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
 Key: MADLIB-1086
 URL: https://issues.apache.org/jira/browse/MADLIB-1086
 Project: Apache MADlib
  Issue Type: New Feature
  Components: Module: Utilities
Reporter: Frank McQuillan
 Fix For: v1.11


Context

Currently, k-means returns the following:
{code}
centroids        | {{13.75333,1.905,2.425,16.06667,90.3,2.805,2.98,0.29,2.005,5.406633,1.041667,3.318333,1020.833},
                    {14.255,1.9325,2.5025,16.05,110.5,3.055,2.9775,0.2975,1.845,6.2125,0.9975,3.365,1378.75}}
cluster_variance | {122999.110416013,30561.74805}
objective_fn | 153560.858466013
frac_reassigned  | 0
num_iterations   | 3
{code}

Story

As a data scientist, I want to unnest a 2-D array by one level (i.e., into rows 
of 1-D arrays) in k-means, so that I can get one centroid per row for follow-on 
operations.

Acceptance

1) Add the function to the array operations module:
   http://madlib.incubator.apache.org/docs/latest/group__grp__array.html
2) Add an example to the k-means documentation to demonstrate usage:
   http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html
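
The requested behavior can be illustrated outside the database. The following is a minimal Python sketch, not the MADlib implementation: the function name `unnest_2d_one_level` and the 1-based row id are assumptions made for illustration, and the sample data is the centroids array from the k-means output above.

```python
from typing import Iterator, List, Sequence, Tuple


def unnest_2d_one_level(
    array_2d: Sequence[Sequence[float]],
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (row_id, row) pairs, one per 1-D slice of a 2-D array.

    Hypothetical helper: row ids start at 1 to mirror PostgreSQL's
    default array subscripts.
    """
    if not array_2d:
        return
    width = len(array_2d[0])
    for row_id, row in enumerate(array_2d, start=1):
        # A 2-D array must be rectangular; reject ragged input.
        if len(row) != width:
            raise ValueError("input is not a rectangular 2-D array")
        yield row_id, list(row)


# Centroids taken from the k-means output shown in the Context section.
centroids = [
    [13.75333, 1.905, 2.425, 16.06667, 90.3, 2.805, 2.98, 0.29,
     2.005, 5.406633, 1.041667, 3.318333, 1020.833],
    [14.255, 1.9325, 2.5025, 16.05, 110.5, 3.055, 2.9775, 0.2975,
     1.845, 6.2125, 0.9975, 3.365, 1378.75],
]

rows = list(unnest_2d_one_level(centroids))
for row_id, centroid in rows:
    print(row_id, centroid)
```

Each yielded pair corresponds to one output row, so a follow-on operation can treat every centroid as its own 1-D array.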





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)

2017-03-31 Thread Frank McQuillan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1086:

Priority: Minor  (was: Major)






[jira] [Assigned] (MADLIB-1082) Graph - add grouping to page rank

2017-03-31 Thread Nandish Jayaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandish Jayaram reassigned MADLIB-1082:
---

Assignee: Nandish Jayaram

> Graph - add grouping to page rank
> -
>
> Key: MADLIB-1082
> URL: https://issues.apache.org/jira/browse/MADLIB-1082
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Module: Graph
>Reporter: Frank McQuillan
>Assignee: Nandish Jayaram
>Priority: Minor
> Fix For: v1.11
>
>
> Add grouping column to edge table to support separate page rank calculations 
> by group





[jira] [Commented] (MADLIB-1066) Pivoting - support array and svec output

2017-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951426#comment-15951426
 ] 

ASF GitHub Bot commented on MADLIB-1066:


Github user asfgit closed the pull request at:

https://github.com/apache/incubator-madlib/pull/108


> Pivoting - support array and svec output
> 
>
> Key: MADLIB-1066
> URL: https://issues.apache.org/jira/browse/MADLIB-1066
> Project: Apache MADlib
>  Issue Type: Improvement
>  Components: Module: Utilities
>Reporter: Frank McQuillan
>Priority: Minor
> Fix For: v1.11
>
>
> Background
> Follow-on to these JIRAs:
> https://issues.apache.org/jira/browse/MADLIB-908
> https://issues.apache.org/jira/browse/MADLIB-1004
> This capability carries over some good ideas from
> https://issues.apache.org/jira/browse/MADLIB-1038
> Story
> Support an array output format to allow more than 1600 output columns (or the 
> applicable PostgreSQL limit). Many MADlib algorithms take array input, so 
> pivot should support array output. Base this on how it is done in encoding 
> categorical variables:
> http://madlib.incubator.apache.org/docs/latest/group__grp__encode__categorical.html
> Add 'output_type' to the interface:
> {code}
> pivot(
> source_table,
> output_table,
> index,
> pivot_cols,
> pivot_values,
> aggregate_func,
> fill_value,
> keep_null,
> output_col_dictionary,
> output_type  -- New
> )
> {code}
> where
> {code}
> output_type (optional)
> VARCHAR. default: 'column'. This parameter controls the output format.  If 
> 'column', a column is created for each output variable. PostgreSQL limits the 
> number of columns in a table. If the total number of columns exceeds the 
> limit, then make this parameter either 'array' to combine the indicator 
> columns into an array or 'svec' to cast the array output to 'madlib.svec' 
> type.
> Since the array output for any single tuple would be sparse, the 'svec' 
> output would be most efficient for storage. The 'array' output is useful if 
> the array is used for post-processing, including concatenating with other 
> non-categorical features.
> A dictionary will be created when 'output_type' is 'array' or 'svec' to 
> define an index into the array. The dictionary table will be named after the 
> 'output_table', with '_dictionary' appended.
> {code}
> See the code in
> http://madlib.incubator.apache.org/docs/latest/group__grp__encode__categorical.html
> The parameter needs to support NULL (treated as the default, 'column'). Also, 
> 'a', 'Array', and 'arr' should be interpreted as 'array'. The same idea 
> applies to 'column' and 'svec'.
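
The matching rule described above (NULL means 'column', and any case-insensitive prefix such as 'a', 'Array', or 'arr' means 'array') could be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code from the pull request: the name `normalize_output_type` is hypothetical, and rejecting unrecognized values with an error is an assumption the issue does not specify.

```python
from typing import Optional

# The three output formats named in the issue description.
VALID_OUTPUT_TYPES = ("column", "array", "svec")


def normalize_output_type(value: Optional[str]) -> str:
    """Map a user-supplied output_type to its canonical name.

    Hypothetical helper. None (SQL NULL) falls back to the default,
    'column'. Otherwise the value is matched as a case-insensitive
    prefix of exactly one valid name.
    """
    if value is None:
        return "column"
    key = value.strip().lower()
    matches = [name for name in VALID_OUTPUT_TYPES if name.startswith(key)]
    if len(matches) != 1:
        # Assumption: unknown or ambiguous input is an error.
        raise ValueError(f"invalid output_type: {value!r}")
    return matches[0]


for raw in (None, "a", "Array", "arr", "col", "S"):
    print(repr(raw), "->", normalize_output_type(raw))
```

Because 'column', 'array', and 'svec' start with different letters, a single-character prefix is already unambiguous. An empty string matches all three names and is therefore rejected in this sketch.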





[jira] [Resolved] (MADLIB-1066) Pivoting - support array and svec output

2017-03-31 Thread Frank McQuillan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MADLIB-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan resolved MADLIB-1066.
-
Resolution: Fixed



