[https://issues.apache.org/jira/browse/MADLIB-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165416#comment-16165416]

Frank McQuillan commented on MADLIB-1124:
-----------------------------------------

I had a look at the PR and checked the following:

1) The user doc examples work as shown.

2) I tried the toy example on slide 8 of
http://www.cis.hut.fi/Opinnot/T-61.6020/2008/pagerank_hits.pdf:

{code}
DROP TABLE IF EXISTS vertex, edge;
CREATE TABLE vertex(
        id INTEGER
        );
CREATE TABLE edge(
        src INTEGER,
        dest INTEGER,
        user_id INTEGER
        );
INSERT INTO vertex VALUES
(0),
(1),
(2),
(3);
INSERT INTO edge VALUES
(0, 1, 1),
(0, 2, 1),
(0, 3, 1),
(1, 2, 1),
(1, 3, 1),
(2, 1, 1);
SELECT * from edge ORDER BY src, dest;
{code}
produces
{code}
 src | dest | user_id 
-----+------+---------
   0 |    1 |       1
   0 |    2 |       1
   0 |    3 |       1
   1 |    2 |       1
   1 |    3 |       1
   2 |    1 |       1
(6 rows)
{code}

Run HITS
{code}
DROP TABLE IF EXISTS hits_out, hits_out_summary;

SELECT madlib.hits(
             'vertex',             -- Vertex table
             'id',                 -- Vertex id column
             'edge',               -- Edge table
             'src=src, dest=dest', -- Comma delimited string of edge arguments
             'hits_out',           -- Output table of HITS
             100);                 -- Max iterations
SELECT * FROM hits_out ORDER BY id;
{code}
produces
{code}
 id |     authority     |        hub        
----+-------------------+-------------------
  0 |                 0 | 0.788680749581252
  1 | 0.459746429928187 | 0.577334927798041
  2 | 0.627946343316548 | 0.211345821783211
  3 | 0.627946343316548 |                 0
(4 rows)
{code}
which matches the reference results.
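As an independent cross-check that does not go through MADlib, here is a minimal Python sketch of the textbook HITS power iteration (Kleinberg) on the same slide-8 graph. It is not MADlib's implementation. The all-ones starting vectors, the unit-L2 normalization (consistent with the squared scores above summing to about 1), and the `max_iter`/`threshold` stopping rule are my assumptions. It should agree with the MADlib output to roughly three decimal places. Small differences are expected if the two implementations stop at different convergence thresholds.

```python
import math


def hits(num_vertices, edges, max_iter=100, threshold=1e-10):
    """Sketch of HITS via power iteration with L2 normalization (assumed)."""
    auth = [1.0] * num_vertices
    hub = [1.0] * num_vertices
    for _ in range(max_iter):
        # Authority of v = sum of hub scores of vertices with an edge into v.
        new_auth = [0.0] * num_vertices
        for src, dest in edges:
            new_auth[dest] += hub[src]
        norm = math.sqrt(sum(x * x for x in new_auth)) or 1.0
        new_auth = [x / norm for x in new_auth]

        # Hub of v = sum of the new authority scores of vertices v points to.
        new_hub = [0.0] * num_vertices
        for src, dest in edges:
            new_hub[src] += new_auth[dest]
        norm = math.sqrt(sum(x * x for x in new_hub)) or 1.0
        new_hub = [x / norm for x in new_hub]

        # Stop once no score moves by more than `threshold` in one iteration.
        delta = max(abs(a - b) for a, b in zip(new_auth + new_hub, auth + hub))
        auth, hub = new_auth, new_hub
        if delta < threshold:
            break
    return auth, hub


# Same edges as the 'edge' table above (user_id dropped).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 1)]
auth, hub = hits(4, edges)
for v in range(4):
    print(v, round(auth[v], 6), round(hub[v], 6))
```

Vertex 0 has no incoming edges and vertex 3 has no outgoing edges, so their authority and hub scores, respectively, stay exactly 0, as in the MADlib output.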


——————

Here are my comments on the user docs:


1) Please reference the original paper by Kleinberg in addition to Wikipedia.

2) Please fix the note format under grouping_cols (the yellow bar is missing). See
the PageRank docs for the intended format.

3) There is a formatting issue with __iterations__ below example 2; it occurs
3 times.

4) The out_table description currently reads:

out_table
TEXT. Name of the table to store the result of HITS. It will contain a row for 
every vertex from 'vertex_table' with the following columns:

vertex_id : The id of a vertex. Will use the input parameter 'vertex_id' for 
column naming.
auth : The vertex's Authority score.
hub : The vertex's Hub score.

but the column is actually called “authority”, not “auth”, so please change the
docs to match the actual output:

{code}
id      authority       hub
0       8.43871829095e-07       0.338306115082
1       0.158459587238  0.527865350448
2       0.40562796969   0.675800764727
3       0.721775835523  3.95111934817e-07
4       0.158459587238  3.95111934817e-07
5       0.316385413094  0.189719957843
6       0.405199928762  0.337944978189
{code}

5) Please indicate that these params are optional:

max_iter (optional)

threshold (optional)





> Graph - HITS algorithm
> ----------------------
>
>                 Key: MADLIB-1124
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1124
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Graph
>            Reporter: Frank McQuillan
>            Assignee: Jingyi Mei
>             Fix For: v2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
