[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
[ https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987818#comment-15987818 ]

Frank McQuillan commented on MADLIB-1086:

{code:sql}
DROP TABLE IF EXISTS arraytest1;
CREATE TABLE arraytest1(
    id INTEGER,
    arrays1 INTEGER[][]
);
INSERT INTO arraytest1 VALUES
    (1, '{{1,2},{3,4}}'),
    (2, '{{5,6},{7,8}}'),
    (3, '{{9,10},{11,12}}');

DROP TABLE IF EXISTS array_unnest_output;
-- Each input row expands into one output row per 1-D sub-array;
-- .* expands the composite result into unnest_row_id and unnest_result
CREATE TABLE array_unnest_output AS
    SELECT *, (madlib.array_unnest_2d_to_1d(arrays1)).*
    FROM arraytest1;

SELECT * FROM array_unnest_output ORDER BY id, unnest_row_id;
{code}

produces

{code}
 id |     arrays1      | unnest_row_id | unnest_result
----+------------------+---------------+---------------
  1 | {{1,2},{3,4}}    |             1 | {1,2}
  1 | {{1,2},{3,4}}    |             2 | {3,4}
  2 | {{5,6},{7,8}}    |             1 | {5,6}
  2 | {{5,6},{7,8}}    |             2 | {7,8}
  3 | {{9,10},{11,12}} |             1 | {9,10}
  3 | {{9,10},{11,12}} |             2 | {11,12}
(6 rows)
{code}

which seems fine. FLOAT8 and TEXT arrays work fine too.

I also updated the K-means workbook to use this new unnest function and posted it to https://github.com/apache/incubator-madlib-site/blob/asf-site/community-artifacts/Kmeans-v2.ipynb

> Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
>
>
>                 Key: MADLIB-1086
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1086
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Assignee: Rashmi Raghu
>            Priority: Minor
>             Fix For: v1.11
>
>
> Context
> Currently k-means returns the following:
> {code}
> centroids        | {{13.75333,1.905,2.425,16.06667,90.3,2.805,2.98,0.29,2.005,5.406633,1.041667,3.318333,1020.833},
>                  |  {14.255,1.9325,2.5025,16.05,110.5,3.055,2.9775,0.2975,1.845,6.2125,0.9975,3.365,1378.75}}
> cluster_variance | {122999.110416013,30561.74805}
> objective_fn     | 153560.858466013
> frac_reassigned  | 0
> num_iterations   | 3
> {code}
> Story
> As a data scientist, I want to unnest a 2-D array by one level (i.e. into rows
> of 1-D arrays) in K-means, so that I can get one centroid per row for follow-on
> operations.
> Acceptance
> 1) Add function to array operations
> http://madlib.incubator.apache.org/docs/latest/group__grp__array.html
> 2) Add an example in k-means
> http://madlib.incubator.apache.org/docs/latest/group__grp__kmeans.html
> to demonstrate usage

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
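For context, a minimal illustration of why unnesting "by one level" needs a dedicated function, assuming stock PostgreSQL array semantics (Greenplum was not checked here). The built-in unnest() flattens a 2-D array all the way down to scalars, a single subscript on a 2-D array returns NULL, and a slice keeps both dimensions, so none of them yields one 1-D row per centroid.

{code:sql}
-- unnest() expands every element regardless of dimensions:
-- returns 4 rows (1, 2, 3, 4), not the two rows {1,2} and {3,4}
SELECT unnest(ARRAY[[1,2],[3,4]]);

-- One subscript on a 2-D array is the wrong number of subscripts,
-- so PostgreSQL returns NULL rather than the row {1,2}
SELECT (ARRAY[[1,2],[3,4]])[1];

-- A slice keeps both dimensions: returns {{3,4}} and 2 (still a 2-D array)
SELECT (ARRAY[[1,2],[3,4]])[2:2], array_ndims((ARRAY[[1,2],[3,4]])[2:2]);
{code}

Getting {1,2} and {3,4} back as separate 1-D rows therefore needs either a function such as madlib.array_unnest_2d_to_1d or an array_agg-based helper like the one discussed in Rashmi Raghu's comment below.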
[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
[ https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985339#comment-15985339 ]

ASF GitHub Bot commented on MADLIB-1086:

Github user rashmi815 closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/116
[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
[ https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971914#comment-15971914 ]

ASF GitHub Bot commented on MADLIB-1086:

GitHub user rashmi815 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/116

    Unnest 2d array

    Array Operations: Add function to unnest 2-D arrays into rows of 1-D arrays

    JIRA: MADLIB-1086

    Function to unnest a 2-D array by one level (i.e. into rows of 1-D arrays).
    This is needed, for instance, in K-means, so that we can get one centroid
    per row for follow-on operations.

    - Added function to array operations
    - Added an example in k-means to demonstrate usage

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rashmi815/incubator-madlib unnest_2d_array

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/116.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #116

commit 18e562813702d12d620594598f471161a990fbbd
Author: Rashmi Raghu
Date:   2017-04-15T00:08:17Z

    Unnest function, install-check tests completed. Initial docs included

commit 2a4baffa29c8f976d3260931c1790cfc125e91f4
Author: Rashmi Raghu
Date:   2017-04-15T06:20:01Z

    Refactored names of function output columns

commit a3eae964adc84382fa674e4d95c486f472b14099
Author: Rashmi Raghu
Date:   2017-04-17T23:45:32Z

    Updated docs (array_ops and k-means) and minor update to install-check tests
[jira] [Commented] (MADLIB-1086) Unnest 2-D array by one level (i.e. into rows of 1-D arrays)
[ https://issues.apache.org/jira/browse/MADLIB-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960062#comment-15960062 ]

Rashmi Raghu commented on MADLIB-1086:

The proposed approach for unnesting 2D arrays into a set of 1D arrays is a SQL function similar to one of the approaches here [http://stackoverflow.com/questions/8137112/unnest-array-by-one-level/8142998#8142998]:

{code:sql}
-- Returns one 1-D array per row (first dimension) of the input 2-D array.
-- Note: array_agg has no ORDER BY here, so element order within each
-- resulting array is not guaranteed (see below).
CREATE OR REPLACE FUNCTION unnest_multidim(anyarray)
RETURNS SETOF anyarray AS
$BODY$
    SELECT array_agg($1[series2.i][series2.x])
    FROM (
        -- every (row i, column x) subscript pair of the input array
        SELECT generate_series(array_lower($1,2), array_upper($1,2)) AS x, series1.i
        FROM (
            SELECT generate_series(array_lower($1,1), array_upper($1,1)) AS i
        ) series1
    ) series2
    GROUP BY series2.i
$BODY$
LANGUAGE sql IMMUTABLE;
{code}

However, this does not preserve element ordering within the resulting 1D arrays in Greenplum. For example:

{code:sql}
SELECT unnest_multidim(val) FROM (SELECT ARRAY[[1,2],[3,4],[5,6]] AS val) t;
{code}

{code}
 unnest_multidim
-----------------
 {2,1}
 {4,3}
 {5,6}
(3 rows)
{code}

Including ORDER BY within the array_agg function call fixes this issue:

{code:sql}
array_agg($1[series2.i][series2.x] ORDER BY series2.x)
{code}

One other aspect that needs to be added is an ID / bookkeeping column to note which position in the 2D array each 1D array corresponds to.

-- Rashmi
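A minimal sketch of the helper with both changes from the comment above applied: the ORDER BY inside array_agg, plus a bookkeeping column holding each 1-D array's row position in the 2-D input. It assumes PostgreSQL 9.0 or later (needed for ORDER BY inside an aggregate). The name unnest_2d_sketch and its output columns row_id / row_vals are hypothetical; this is not the implementation or the column naming of madlib.array_unnest_2d_to_1d.

{code:sql}
-- Hypothetical helper for illustration only; not MADlib's implementation.
CREATE OR REPLACE FUNCTION unnest_2d_sketch(anyarray)
RETURNS TABLE (row_id integer, row_vals anyarray) AS
$BODY$
    SELECT series1.i,
           -- ORDER BY inside the aggregate fixes element order within each row
           array_agg($1[series1.i][series2.x] ORDER BY series2.x)
    FROM generate_series(array_lower($1, 1), array_upper($1, 1)) AS series1(i)
    CROSS JOIN generate_series(array_lower($1, 2), array_upper($1, 2)) AS series2(x)
    -- one output row per first-dimension index; i doubles as the bookkeeping ID
    GROUP BY series1.i
    ORDER BY series1.i;
$BODY$
LANGUAGE sql IMMUTABLE;

SELECT * FROM unnest_2d_sketch(ARRAY[[1,2],[3,4],[5,6]]);
-- expected rows (row_id, row_vals): (1, {1,2}), (2, {3,4}), (3, {5,6})
{code}

Cross-joining the two generate_series calls in FROM, instead of nesting set-returning functions in the select list as the original does, keeps the row/column pairing independent of how a given server version evaluates set-returning functions in SELECT. A 1-D input yields no rows, because array_lower($1, 2) is NULL.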