[
https://issues.apache.org/jira/browse/MADLIB-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545433#comment-16545433
]
ASF GitHub Bot commented on MADLIB-1229:
----------------------------------------
GitHub user njayaram2 opened a pull request:
https://github.com/apache/madlib/pull/294
Pagerank: Remove duplicate entries from grouping output
JIRA: MADLIB-1229
JIRA: MADLIB-1253
Fixes the missing output for complete graphs bug as well.
Co-authored-by: Nandish Jayaram <[email protected]>
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib bug/pagerank-dup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/294.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #294
----
commit 1b55acac9d5550e0a74fa46ec0ab4842d089ac1c
Author: Orhan Kislal <okislal@...>
Date: 2018-07-14T00:09:11Z
Pagerank: Remove duplicate entries from grouping output
JIRA: MADLIB-1229
JIRA: MADLIB-1253
Fixes the missing output for complete graphs bug as well.
Co-authored-by: Nandish Jayaram <[email protected]>
----
> Duplicated result in PageRank output table with grouping
> --------------------------------------------------------
>
> Key: MADLIB-1229
> URL: https://issues.apache.org/jira/browse/MADLIB-1229
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Graph
> Reporter: Jingyi Mei
> Assignee: Himanshu Pandey
> Priority: Minor
> Fix For: v1.15
>
>
> In MADlib 1.13, if I run the following query:
> {code:java}
> DROP TABLE IF EXISTS vertex, "EDGE";
> CREATE TABLE vertex(
> id INTEGER
> );
> CREATE TABLE "EDGE"(
> src INTEGER,
> dest INTEGER,
> user_id INTEGER
> );
> INSERT INTO vertex VALUES
> (0),
> (1),
> (2);
> INSERT INTO "EDGE" VALUES
> (0, 1, 1),
> (0, 2, 1),
> (1, 2, 1),
> (2, 1, 1),
> (0, 1, 2);
> DROP TABLE IF EXISTS pagerank_ppr_grp_out;
> DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
> SELECT pagerank(
> 'vertex', -- Vertex table
> 'id', -- Vertex id column
> '"EDGE"', -- "EDGE" table
> 'src=src, dest=dest', -- "EDGE" args
> 'pagerank_ppr_grp_out', -- Output table of PageRank
> NULL, -- Default damping factor (0.85)
> NULL, -- Default max iters (100)
> NULL, -- Default Threshold
> 'user_id');{code}
> I get the following result:
> {code:java}
> madlib=# select * from pagerank_ppr_grp_out order by user_id, id;
>  user_id | id |     pagerank
> ---------+----+-------------------
>        1 |  0 |              0.05
>        1 |  0 |              0.05
>        1 |  1 | 0.614906399170753
>        1 |  2 | 0.614906399170753
>        2 |  0 |             0.075
>        2 |  1 |           0.13875
> (6 rows){code}
> where the row user_id=1, id=0, pagerank=0.05 appears twice.
> We should correct it so that only distinct results are shown.
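>
> One way to confirm the duplication, assuming the output table has the
> user_id, id and pagerank columns shown in the result above, is to group on
> (user_id, id) and count the rows per group. This is only a sketch of a
> check, not part of the fix itself:
> {code:sql}
> -- List every (user_id, id) pair that occurs more than once
> -- in the grouped PageRank output table.
> SELECT user_id, id, count(*) AS n_rows
> FROM pagerank_ppr_grp_out
> GROUP BY user_id, id
> HAVING count(*) > 1
> ORDER BY user_id, id;{code}
> On the result listed above this returns a single row, (user_id=1, id=0)
> with n_rows=2. Once the output contains only distinct rows, the query
> should return no rows.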
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)