[
https://issues.apache.org/jira/browse/MADLIB-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nandish Jayaram closed MADLIB-1367.
-----------------------------------
> WCC: Improve performance with grouping
> --------------------------------------
>
> Key: MADLIB-1367
> URL: https://issues.apache.org/jira/browse/MADLIB-1367
> Project: Apache MADlib
> Issue Type: Bug
> Reporter: Nikhil
> Assignee: Nandish Jayaram
> Priority: Major
> Fix For: v1.17
>
>
> As seen in this [JIRA|https://issues.apache.org/jira/browse/MADLIB-1320],
> {{distributed by}} on multiple columns slowed the query down because GPDB
> redistributes the data. The previous story did not address the same issue
> for the grouping case.
> The {{newupdate}} and {{message}} tables are distributed on {{grouping_cols}}
> and {{vertex_id}}. This has to change: our original assumption was that data
> would be distributed by the grouping columns first and then by {{vertex_id}}.
> Instead, the distribution in this case is computed over the array of the
> key values.
> Acceptance:
> 1. Run a perf test with grouping to reproduce the performance issue.
> 2. Fix any perf issue found with grouping.
> 3. HITS and PageRank may have similar issues; create follow-on JIRAs for
> them.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)