[
https://issues.apache.org/jira/browse/MADLIB-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460030#comment-16460030
]
Nikhil commented on MADLIB-1128:
--------------------------------
To support grouping we have to change the query that deconstructs the matrix to
get the lower triangle.
{code}
SELECT * FROM
{schema_madlib}.__deconstruct_lower_triangle(
(SELECT {cor_mat} FROM {temp_output_table})
) AS deconstructed(column_position integer, {variable_list_str})
{code}
There are 2 possibilities
1. Instead of passing each row to the __deconstruct_lower_triangle function, we
can pass the entire column and use the group by clause. This syntax is not
allowed in postgres/gpdb
{code}
SELECT * FROM
{schema_madlib}.__deconstruct_lower_triangle(
(SELECT {cor_mat}) AS deconstructed(column_position integer,
{variable_list_str})
) FROM {temp_output_table})
{code}
Two possibilities to work around this
a.) postgres 9.3 introduced LATERAL joins which solves the problem for us but
it won't work with gpdb 4.3/5.
{code}
SELECT * FROM
{temp_output_table},
LATERAL (SELECT * FROM
{schema_madlib}.__deconstruct_lower_triangle((cor_mat)) AS
deconstructed(column_position integer, {variable_list_str})) q
{code}
b.) Create our own type and change the __deconstruct_lower_triangle function to
return this custom type instead of the record type. The problem here is that we
don't know the type in advance which means we will need to create the function
in runtime which is not something we want to do.
2. The other solution is to do a union all for all the groups, so the query
will look like
{code}
SELECT *, {select_grouping_cols}
FROM {schema_madlib}.__deconstruct_lower_triangle(
(SELECT {cor_mat} FROM {temp_output_table} where gr = 1)
) AS deconstructed(column_position integer, {variable_list_str})
UNION ALL
SELECT *, {select_grouping_cols}
FROM {schema_madlib}.__deconstruct_lower_triangle(
(SELECT {cor_mat} FROM {temp_output_table} where gr = 2)
) AS deconstructed(column_position integer, {variable_list_str})
{code}
We have decided to go with the second solution
> Add GROUP BY to correlation functions
> -------------------------------------
>
> Key: MADLIB-1128
> URL: https://issues.apache.org/jira/browse/MADLIB-1128
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Descriptive Statistics
> Reporter: Frank McQuillan
> Assignee: Nandish Jayaram
> Priority: Minor
> Fix For: v1.15
>
>
> http://madlib.incubator.apache.org/docs/latest/group__grp__correlation.html
> add:
> grouping_cols (optional)
> TEXT, default: null. A comma-separated list of columns on which to group
> results. If NULL, correlations/covariance is produced on the complete table.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)