[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)

2017-04-27 Thread Frank McQuillan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987818#comment-15987818
 ] 

Frank McQuillan commented on MADLIB-1086:
-----------------------------------------

{code:sql}
DROP TABLE IF EXISTS arraytest1;

CREATE TABLE arraytest1(
id INTEGER,
arrays1 INTEGER[][]  
);

INSERT INTO arraytest1 VALUES
(1, '{{1,2},{3,4}}'),
(2, '{{5,6},{7,8}}'),
(3, '{{9,10},{11,12}}');

DROP TABLE IF EXISTS array_unnest_output;
CREATE TABLE array_unnest_output AS
SELECT *, (madlib.array_unnest_2d_to_1d(arrays1)).*
FROM arraytest1;
SELECT * FROM array_unnest_output ORDER BY id, unnest_row_id;
{code}
produces
{code:sql}
 id |     arrays1      | unnest_row_id | unnest_result
----+------------------+---------------+---------------
  1 | {{1,2},{3,4}}    |             1 | {1,2}
  1 | {{1,2},{3,4}}    |             2 | {3,4}
  2 | {{5,6},{7,8}}    |             1 | {5,6}
  2 | {{5,6},{7,8}}    |             2 | {7,8}
  3 | {{9,10},{11,12}} |             1 | {9,10}
  3 | {{9,10},{11,12}} |             2 | {11,12}
(6 rows)
{code}
which seems fine.  FLOAT8 and TEXT work fine too.

I also updated the K-means workbook to use this new unnest function and posted 
it to 
https://github.com/apache/incubator-madlib-site/blob/asf-site/community-artifacts/Kmeans-v2.ipynb
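
For reference, a minimal sketch of how the function might be applied to k-means output to get one centroid per row. It assumes a hypothetical table km_result with a 2-D centroids column; the table name is an assumption for illustration and is not taken from the workbook:
{code:sql}
-- km_result is a hypothetical table holding k-means output,
-- with a 2-D array column named centroids.
SELECT (madlib.array_unnest_2d_to_1d(centroids)).*
FROM km_result
ORDER BY unnest_row_id;
{code}
Each output row should then carry unnest_row_id (the centroid's position in the 2-D array) and unnest_result (the centroid as a 1-D array), matching the column names in the test output above.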




> Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
> 
>
> Key: MADLIB-1086
> URL: https://issues.apache.org/jira/browse/MADLIB-1086
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Module: Utilities
>Reporter: Frank McQuillan
>Assignee: Rashmi Raghu
>Priority: Minor
> Fix For: v1.11
>
>
> Context
> Currently k-means returns the following
> {code}
> centroids        | {{13.75333,1.905,2.425,16.06667,90.3,2.805,2.98,0.29,2.005,5.406633,1.041667,3.318333,1020.833},
>                    {14.255,1.9325,2.5025,16.05,110.5,3.055,2.9775,0.2975,1.845,6.2125,0.9975,3.365,1378.75}}
> cluster_variance | {122999.110416013,30561.74805}
> objective_fn | 153560.858466013
> frac_reassigned  | 0
> num_iterations   | 3
> {code}
> Story
> As a data scientist, I want to unnest 2-D array by one level (i.e. into rows 
> of 1-D arrays) in K-means, so that I can get one centroid per row for follow 
> on operations.
> Acceptance
> 1) Add function to array operations
> http://madlib.incubator.apache.org/docs/latest/group__grp__array.html
> 2) Add an example in k-means
>  http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html
> to demonstrate usage



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)

2017-04-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985339#comment-15985339
 ] 

ASF GitHub Bot commented on MADLIB-1086:


Github user rashmi815 closed the pull request at:

https://github.com/apache/incubator-madlib/pull/116




[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)

2017-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971914#comment-15971914
 ] 

ASF GitHub Bot commented on MADLIB-1086:


GitHub user rashmi815 opened a pull request:

https://github.com/apache/incubator-madlib/pull/116

Unnest 2d array

Array Operations: Add function to unnest 2-D arrays into rows of 1-D arrays

JIRA:  MADLIB-1086

Function to unnest 2-D array by one level (i.e. into rows of 1-D arrays).
This is needed, for instance, in K-means, so that we can get one centroid 
per row for follow on operations.
- Added function to array operations
- Added an example in k-means to demonstrate usage

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rashmi815/incubator-madlib unnest_2d_array

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/116.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #116


commit 18e562813702d12d620594598f471161a990fbbd
Author: Rashmi Raghu 
Date:   2017-04-15T00:08:17Z

Unnest function, install-check tests completed. Initial docs included

commit 2a4baffa29c8f976d3260931c1790cfc125e91f4
Author: Rashmi Raghu 
Date:   2017-04-15T06:20:01Z

Refactored names of function output columns

commit a3eae964adc84382fa674e4d95c486f472b14099
Author: Rashmi Raghu 
Date:   2017-04-17T23:45:32Z

Updated docs (array_ops and k-means) and minor update to install-check tests






[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)

2017-04-06 Thread Rashmi Raghu (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960062#comment-15960062
 ] 

Rashmi Raghu commented on MADLIB-1086:
--------------------------------------

The proposed approach for unnesting a 2D array into a set of 1D arrays is a SQL 
function similar to one of the approaches here 
[http://stackoverflow.com/questions/8137112/unnest-array-by-one-level/8142998#8142998]: 
{code:sql}
CREATE OR REPLACE FUNCTION unnest_multidim(anyarray)
RETURNS SETOF anyarray AS
$BODY$
  SELECT array_agg($1[series2.i][series2.x]) FROM
(SELECT generate_series(array_lower($1,2),array_upper($1,2)) as x, series1.i
 FROM 
 (SELECT generate_series(array_lower($1,1),array_upper($1,1)) as i) series1 
) series2
GROUP BY series2.i
$BODY$
LANGUAGE sql IMMUTABLE;
{code}

However, this does not preserve element ordering within the resulting 1D arrays 
in Greenplum. E.g.:
{code:sql}
SELECT unnest_multidim(val) FROM (SELECT ARRAY[[1,2],[3,4],[5,6]] AS val) t;
{code}
{code:sql}
 unnest_multidim 
-----------------
 {2,1}
 {4,3}
 {5,6}
(3 rows)
{code}

Including 'order by' within the array_agg function call fixes this issue:
{code:sql}
array_agg($1[series2.i][series2.x] ORDER BY series2.x)
{code}

One other aspect that needs to be added is an ID / bookkeeping column to note 
which position in the 2D array each 1D array corresponds to.
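
A rough sketch of how the two points above (ordered aggregation plus a bookkeeping column) could be combined. The function name unnest_2d_sketch and the output column names are assumptions for illustration only, not the final implementation:
{code:sql}
-- Sketch only: hypothetical function and column names.
CREATE OR REPLACE FUNCTION unnest_2d_sketch(
    IN  anyarray,
    OUT unnest_row_id INTEGER,
    OUT unnest_result anyarray
)
RETURNS SETOF record AS
$BODY$
  -- t.i is the index along the first dimension (the row position);
  -- ORDER BY inside array_agg keeps elements in column order.
  SELECT t.i AS unnest_row_id,
         array_agg($1[t.i][t.x] ORDER BY t.x) AS unnest_result
  FROM (
    SELECT s1.i,
           generate_series(array_lower($1,2), array_upper($1,2)) AS x
    FROM (SELECT generate_series(array_lower($1,1), array_upper($1,1)) AS i) s1
  ) t
  GROUP BY t.i
  ORDER BY t.i
$BODY$
LANGUAGE sql IMMUTABLE;

-- Example call, expanding the two output columns:
SELECT (unnest_2d_sketch(ARRAY[[1,2],[3,4],[5,6]])).*;
{code}
Note that the row ID here is the raw array index, so it starts at array_lower($1,1) rather than always at 1; arrays with non-default lower bounds would need an offset.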

-- Rashmi
