[ https://issues.apache.org/jira/browse/MADLIB-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547145#comment-16547145 ]
Frank McQuillan commented on MADLIB-1229:
-----------------------------------------

Output looks OK, dup is gone:
{code}
 user_id | id | pagerank
---------+----+----------
       1 |  0 |     0.05
       1 |  1 |    0.475
       1 |  2 |    0.475
       2 |  0 |    0.075
       2 |  1 |  0.13875
(5 rows)
{code}

LGTM

> Duplicated result in PageRank output table with grouping
> --------------------------------------------------------
>
>                 Key: MADLIB-1229
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1229
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Graph
>            Reporter: Jingyi Mei
>            Assignee: Himanshu Pandey
>            Priority: Minor
>             Fix For: v1.15
>
>
> In MADlib 1.13, if I run the following query:
> {code:sql}
> DROP TABLE IF EXISTS vertex, "EDGE";
> CREATE TABLE vertex(
>     id INTEGER
> );
> CREATE TABLE "EDGE"(
>     src INTEGER,
>     dest INTEGER,
>     user_id INTEGER
> );
> INSERT INTO vertex VALUES
>     (0),
>     (1),
>     (2);
> INSERT INTO "EDGE" VALUES
>     (0, 1, 1),
>     (0, 2, 1),
>     (1, 2, 1),
>     (2, 1, 1),
>     (0, 1, 2);
> DROP TABLE IF EXISTS pagerank_ppr_grp_out;
> DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
> SELECT pagerank(
>     'vertex',                -- Vertex table
>     'id',                    -- Vertex id column
>     '"EDGE"',                -- "EDGE" table
>     'src=src, dest=dest',    -- "EDGE" args
>     'pagerank_ppr_grp_out',  -- Output table of PageRank
>     NULL,                    -- Default damping factor (0.85)
>     NULL,                    -- Default max iters (100)
>     NULL,                    -- Default threshold
>     'user_id');              -- Grouping column
> {code}
> I get the following result:
> {code:sql}
> madlib=# select * from pagerank_ppr_grp_out order by user_id, id;
>  user_id | id |     pagerank
> ---------+----+-------------------
>        1 |  0 |              0.05
>        1 |  0 |              0.05
>        1 |  1 | 0.614906399170753
>        1 |  2 | 0.614906399170753
>        2 |  0 |             0.075
>        2 |  1 |           0.13875
> (6 rows)
> {code}
> where the row user_id=1, id=0, pagerank=0.05 appears twice.
> We should correct it to show only distinct results.
>
> In addition, for user_id=1 all PageRank scores should sum to 1. The score for
> user_id=1, id=1 should be 0.475, and the score for user_id=1, id=2 should be
> 0.475.
> We should correct this calculation too.
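The expected user_id=1 scores can be checked by hand, assuming the standard PageRank update PR(v) = (1 - d)/N + d * sum over edges u->v of PR(u)/outdeg(u), with d = 0.85 and N = 3. Vertex 0 has no incoming edges, so its score is 0.15/3 = 0.05. Vertices 1 and 2 are symmetric and share a score x with x = 0.05 + 0.85 * (0.05/2 + x), so 0.15x = 0.07125 and x = 0.475; the three scores then sum to 1. The Python sketch below is not MADlib's implementation. It is a hypothetical per-group power iteration (`grouped_pagerank` and `EDGES` are made-up names) that assumes each group's vertex set is the set of vertices appearing in that group's edges and that dangling-node mass is not redistributed. Both assumptions are inferred from the corrected output in the comment, not from MADlib's source.

```python
from collections import defaultdict

# Edge rows (src, dest, user_id) copied from the "EDGE" table in the report.
EDGES = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 1, 1), (0, 1, 2)]


def grouped_pagerank(edges, damping=0.85, max_iter=1000, tol=1e-12):
    """Run PageRank separately for each user_id (hypothetical sketch).

    Scores are keyed by vertex within each group, so the result holds
    exactly one row per (user_id, vertex) pair and cannot contain duplicates.
    Assumption: a group's vertices are those appearing in its edges, and
    dangling-node mass is not redistributed.
    """
    by_group = defaultdict(list)
    for src, dest, user_id in edges:
        by_group[user_id].append((src, dest))

    rows = []
    for user_id in sorted(by_group):
        group_edges = by_group[user_id]
        vertices = sorted({v for edge in group_edges for v in edge})
        n = len(vertices)

        # Out-degree of each source vertex within this group only.
        out_degree = defaultdict(int)
        for src, _ in group_edges:
            out_degree[src] += 1

        # Power iteration starting from the uniform distribution.
        scores = {v: 1.0 / n for v in vertices}
        for _ in range(max_iter):
            new = {v: (1.0 - damping) / n for v in vertices}
            for src, dest in group_edges:
                new[dest] += damping * scores[src] / out_degree[src]
            delta = max(abs(new[v] - scores[v]) for v in vertices)
            scores = new
            if delta < tol:
                break

        rows.extend((user_id, v, scores[v]) for v in vertices)
    return rows


if __name__ == "__main__":
    for user_id, vid, score in grouped_pagerank(EDGES):
        print(f"{user_id} | {vid} | {score:.6g}")
```

Under these assumptions the sketch should print the same five rows as the corrected table in the comment. Keying the scores by vertex inside each group is one simple way to rule out duplicated rows by construction.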