[ https://issues.apache.org/jira/browse/MADLIB-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545433#comment-16545433 ]

ASF GitHub Bot commented on MADLIB-1229:
----------------------------------------

GitHub user njayaram2 opened a pull request:

    https://github.com/apache/madlib/pull/294

    Pagerank: Remove duplicate entries from grouping output

    JIRA: MADLIB-1229
    JIRA: MADLIB-1253
    
    Fixes the missing output for complete graphs bug as well.
    
    Co-authored-by: Nandish Jayaram <[email protected]>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib bug/pagerank-dup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/294.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #294
    
----
commit 1b55acac9d5550e0a74fa46ec0ab4842d089ac1c
Author: Orhan Kislal <okislal@...>
Date:   2018-07-14T00:09:11Z

    Pagerank: Remove duplicate entries from grouping output
    
    JIRA: MADLIB-1229
    JIRA: MADLIB-1253
    
    Fixes the missing output for complete graphs bug as well.
    
    Co-authored-by: Nandish Jayaram <[email protected]>

----
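For context, PageRank with a grouping column is expected to behave like an independent PageRank run per group value. Its output should therefore hold exactly one row per (group, vertex) pair. The Python sketch below illustrates that contract on the edge data from the issue quoted below. It is a hypothetical reference and not MADlib's implementation. It makes three assumptions: it uses the textbook update PR(v) = (1-d)/N + d * sum(PR(u)/outdeg(u)) over in-edges, it takes N per group as the number of vertices appearing in that group's edges, and it does not redistribute dangling-node mass. Its scores therefore need not match MADlib's.

```python
from collections import defaultdict

# (src, dest, user_id) rows from the issue's "EDGE" table.
EDGES = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 1, 1), (0, 1, 2)]


def grouped_pagerank(edges, damping=0.85, max_iter=100, threshold=1e-10):
    """Hypothetical per-group PageRank returning {(group, vertex): score}.

    Assumptions (not MADlib's code): each group's vertex set is the set of
    vertices appearing in that group's edges, and dangling-node mass is
    not redistributed.
    """
    by_group = defaultdict(list)
    for src, dest, grp in edges:
        by_group[grp].append((src, dest))

    result = {}
    for grp, grp_edges in by_group.items():
        vertices = sorted({v for edge in grp_edges for v in edge})
        n = len(vertices)
        out_deg = defaultdict(int)
        for src, _ in grp_edges:
            out_deg[src] += 1

        pr = {v: 1.0 / n for v in vertices}
        for _ in range(max_iter):
            # Teleport term, then each edge passes a share of its source's score.
            new = {v: (1 - damping) / n for v in vertices}
            for src, dest in grp_edges:
                new[dest] += damping * pr[src] / out_deg[src]
            delta = max(abs(new[v] - pr[v]) for v in vertices)
            pr = new
            if delta < threshold:
                break

        # Keying on (group, vertex) makes a duplicate row impossible.
        for v in vertices:
            result[(grp, v)] = pr[v]
    return result


if __name__ == "__main__":
    for (grp, vertex), score in sorted(grouped_pagerank(EDGES).items()):
        print(grp, vertex, round(score, 6))
```

On this data the sketch gives 0.075 and 0.13875 for group 2, the same values as the output quoted in the issue.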


> Duplicated result in PageRank output table with grouping
> --------------------------------------------------------
>
>                 Key: MADLIB-1229
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1229
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Graph
>            Reporter: Jingyi Mei
>            Assignee: Himanshu Pandey
>            Priority: Minor
>             Fix For: v1.15
>
>
> In MADlib 1.13, if I run the following query:
> {code:sql}
> DROP TABLE IF EXISTS vertex, "EDGE";
> CREATE TABLE vertex(
> id INTEGER
> );
> CREATE TABLE "EDGE"(
> src INTEGER,
> dest INTEGER,
> user_id INTEGER
> );
> INSERT INTO vertex VALUES
> (0),
> (1),
> (2);
> INSERT INTO "EDGE" VALUES
> (0, 1, 1),
> (0, 2, 1),
> (1, 2, 1),
> (2, 1, 1),
> (0, 1, 2);
> DROP TABLE IF EXISTS pagerank_ppr_grp_out;
> DROP TABLE IF EXISTS pagerank_ppr_grp_out_summary;
> SELECT pagerank(
> 'vertex', -- Vertex table
> 'id', -- Vertex id column
> '"EDGE"', -- Edge table
> 'src=src, dest=dest', -- Edge args
> 'pagerank_ppr_grp_out', -- Output table of PageRank
> NULL, -- Default damping factor (0.85)
> NULL, -- Default max iters (100)
> NULL, -- Default threshold
> 'user_id' -- Grouping column
> );{code}
> I get the following result:
> {noformat}
> madlib=# select * from pagerank_ppr_grp_out order by user_id, id;
>  user_id | id |     pagerank
> ---------+----+-------------------
>        1 |  0 |              0.05
>        1 |  0 |              0.05
>        1 |  1 | 0.614906399170753
>        1 |  2 | 0.614906399170753
>        2 |  0 |             0.075
>        2 |  1 |           0.13875
> (6 rows)
> {noformat}
> Here the row user_id=1, id=0, pagerank=0.05 appears twice.
> We should correct this so that only distinct results are shown.
>  
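The symptom can be checked mechanically. The small Python sketch below uses two hypothetical helpers that are not part of MADlib. It counts rows per (user_id, id) key in the output quoted above and flags any key that appears more than once. A correct grouped PageRank output should yield no such keys. Collapsing the repeats afterwards only hides the symptom and is no substitute for the fix in the pull request.

```python
from collections import Counter

# (user_id, id, pagerank) rows copied from the pagerank_ppr_grp_out output above.
REPORTED_ROWS = [
    (1, 0, 0.05),
    (1, 0, 0.05),
    (1, 1, 0.614906399170753),
    (1, 2, 0.614906399170753),
    (2, 0, 0.075),
    (2, 1, 0.13875),
]


def duplicate_keys(rows):
    """Return the (user_id, id) keys that occur in more than one row."""
    counts = Counter((user_id, vertex_id) for user_id, vertex_id, _ in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def distinct_rows(rows):
    """Drop exact repeats, keeping the first occurrence of each row in order."""
    return list(dict.fromkeys(rows))


if __name__ == "__main__":
    print(duplicate_keys(REPORTED_ROWS))      # [(1, 0)]
    print(len(distinct_rows(REPORTED_ROWS)))  # 5
```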



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
