Jingyi Mei created MADLIB-1229:
----------------------------------

             Summary: Duplicated result in PageRank output table with grouping
                 Key: MADLIB-1229
                 URL: https://issues.apache.org/jira/browse/MADLIB-1229
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Graph
            Reporter: Jingyi Mei
             Fix For: v1.15


In MADlib 1.13, if I run the following query:
{code:sql}
DROP TABLE IF EXISTS vertex, "EDGE";
CREATE TABLE vertex(
id INTEGER
);
CREATE TABLE "EDGE"(
src INTEGER,
dest INTEGER,
user_id INTEGER
);
INSERT INTO vertex VALUES
(0),
(1),
(2);
INSERT INTO "EDGE" VALUES
(0, 1, 1),
(0, 2, 1),
(1, 2, 1),
(2, 1, 1),
(0, 1, 2);


DROP TABLE IF EXISTS pagerank_ppr_grp_out;
DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
SELECT pagerank(
'vertex', -- Vertex table
'id', -- Vertex id column
'"EDGE"', -- "EDGE" table
'src=src, dest=dest', -- "EDGE" args
'pagerank_ppr_grp_out', -- Output table of PageRank
NULL, -- Default damping factor (0.85)
NULL, -- Default max iters (100)
NULL, -- Default Threshold 
'user_id');{code}
I get the following result:
{code:java}
madlib=# select * from pagerank_ppr_grp_out order by user_id, id;
 user_id | id |     pagerank
---------+----+-------------------
       1 |  0 |              0.05
       1 |  0 |              0.05
       1 |  1 | 0.614906399170753
       1 |  2 | 0.614906399170753
       2 |  0 |             0.075
       2 |  1 |           0.13875
(6 rows){code}
where the row user_id=1, id=0, pagerank=0.05 appears twice.
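
A quick way to confirm the duplication is to count rows per (user_id, id) pair in the output table. This is only a minimal sketch, assuming the output table has the user_id, id and pagerank columns shown above; the n_rows alias is just illustrative and not part of MADlib:
{code:sql}
-- Sanity check (not part of MADlib): list (user_id, id) pairs that occur
-- more than once in the PageRank output table created above.
SELECT user_id, id, count(*) AS n_rows
FROM pagerank_ppr_grp_out
GROUP BY user_id, id
HAVING count(*) > 1
ORDER BY user_id, id;
{code}
Given the output above, this should return one row, for user_id=1 and id=0, with a count of 2. Once the bug is fixed it should return no rows.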

We should fix this so that the output table contains only distinct rows for each group.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
