Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-29 Thread Avery Ching
We did have a related issue 
(https://issues.apache.org/jira/browse/GIRAPH-155).


On 5/29/12 6:54 AM, Claudio Martella wrote:

I'm not sure they will be needed to send them on the first superstep.
They'll be created and used in the second superstep if necessary. If
they need it in the first superstep, then i guess they'll put them as
a line in the inputfile.
I agree with you that this is kind of messed up :)


On Tue, May 29, 2012 at 3:23 PM, Sebastian Schelter  wrote:

Oh sorry, I didn't know that discussion. The problem I see is that in
every implementation, a user might run into this issue, and I don't
think its ideal to force users to always run a round of sending empty
messages at the beginning.

Maybe the system should (somehow) automagically do that for the users?
Really seems to be an awkward situation though...

--sebastian



On 29.05.2012 15:03, Claudio Martella wrote:

About the mapreduce job to prepare the inputset, I did advocate for
this solution instead of supporting automatic creation of non-existent
vertices implicitly (which I believe adds a logical path in vertex
resolution which has some drawbacks e.g you have to check in the
hashmap for the existence of the destination vertex for each message,
which is "fine" now that it's a hashmap, but it's going to be less
fine when/if we turn to TreeMap for out-of-core).

Unfortunately the other committers preferred going for the path that
helps userland's life, so I guess this solution is not to be
considered here either.

On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter  wrote:

On 29.05.2012 13:13, Paolo Castagna wrote:

Hi Sebastian

Sebastian Schelter wrote:

Why do you only recompute the pageRank in each second superstep? Can we
not use the aggregated value of the dangling nodes from the last superstep?

I removed the computing of PageRank values every each second superstep.
However, I needed to use a couple of aggregators for the dangling nodes
contribution instead of just one: "dangling-current" and "dangling-previous".

Each superstep, I need to reset the dangling-current aggregator, at the
same time, I need to know the value of the aggregator at a previous
superstep.

You can save the value from the previous step in a static variable in
the WorkerContext before resetting the aggregator.


I hope it makes sense, let me know if you have a better idea.


Overall I think we're on a good way to a robust, real-world PageRank
implementation, I managed to implement the convergence check with an
aggregator, will post an updated patch soon.

I think I've just done it, have a look [1] and let me know if you would have
done it differently.

Paolo

  [1]
https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90












Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-29 Thread Claudio Martella
I'm not sure they will be needed to send them on the first superstep.
They'll be created and used in the second superstep if necessary. If
they need it in the first superstep, then i guess they'll put them as
a line in the inputfile.
I agree with you that this is kind of messed up :)


On Tue, May 29, 2012 at 3:23 PM, Sebastian Schelter  wrote:
> Oh sorry, I didn't know that discussion. The problem I see is that in
> every implementation, a user might run into this issue, and I don't
> think its ideal to force users to always run a round of sending empty
> messages at the beginning.
>
> Maybe the system should (somehow) automagically do that for the users?
> Really seems to be an awkward situation though...
>
> --sebastian
>
>
>
> On 29.05.2012 15:03, Claudio Martella wrote:
>> About the mapreduce job to prepare the inputset, I did advocate for
>> this solution instead of supporting automatic creation of non-existent
>> vertices implicitly (which I believe adds a logical path in vertex
>> resolution which has some drawbacks e.g you have to check in the
>> hashmap for the existence of the destination vertex for each message,
>> which is "fine" now that it's a hashmap, but it's going to be less
>> fine when/if we turn to TreeMap for out-of-core).
>>
>> Unfortunately the other committers preferred going for the path that
>> helps userland's life, so I guess this solution is not to be
>> considered here either.
>>
>> On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter  wrote:
>>> On 29.05.2012 13:13, Paolo Castagna wrote:
 Hi Sebastian

 Sebastian Schelter wrote:
> Why do you only recompute the pageRank in each second superstep? Can we
> not use the aggregated value of the dangling nodes from the last 
> superstep?

 I removed the computing of PageRank values every each second superstep.
 However, I needed to use a couple of aggregators for the dangling nodes
 contribution instead of just one: "dangling-current" and 
 "dangling-previous".

 Each superstep, I need to reset the dangling-current aggregator, at the
 same time, I need to know the value of the aggregator at a previous
 superstep.
>>>
>>> You can save the value from the previous step in a static variable in
>>> the WorkerContext before resetting the aggregator.
>>>

 I hope it makes sense, let me know if you have a better idea.

> Overall I think we're on a good way to a robust, real-world PageRank
> implementation, I managed to implement the convergence check with an
> aggregator, will post an updated patch soon.

 I think I've just done it, have a look [1] and let me know if you would 
 have
 done it differently.

 Paolo

  [1]
 https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90


>>>
>>
>>
>>
>



-- 
   Claudio Martella
   claudio.marte...@gmail.com


Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-29 Thread Sebastian Schelter
Oh sorry, I didn't know that discussion. The problem I see is that in
every implementation, a user might run into this issue, and I don't
think its ideal to force users to always run a round of sending empty
messages at the beginning.

Maybe the system should (somehow) automagically do that for the users?
Really seems to be an awkward situation though...

--sebastian



On 29.05.2012 15:03, Claudio Martella wrote:
> About the mapreduce job to prepare the inputset, I did advocate for
> this solution instead of supporting automatic creation of non-existent
> vertices implicitly (which I believe adds a logical path in vertex
> resolution which has some drawbacks e.g you have to check in the
> hashmap for the existence of the destination vertex for each message,
> which is "fine" now that it's a hashmap, but it's going to be less
> fine when/if we turn to TreeMap for out-of-core).
> 
> Unfortunately the other committers preferred going for the path that
> helps userland's life, so I guess this solution is not to be
> considered here either.
> 
> On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter  wrote:
>> On 29.05.2012 13:13, Paolo Castagna wrote:
>>> Hi Sebastian
>>>
>>> Sebastian Schelter wrote:
 Why do you only recompute the pageRank in each second superstep? Can we
 not use the aggregated value of the dangling nodes from the last superstep?
>>>
>>> I removed the computing of PageRank values every each second superstep.
>>> However, I needed to use a couple of aggregators for the dangling nodes
>>> contribution instead of just one: "dangling-current" and 
>>> "dangling-previous".
>>>
>>> Each superstep, I need to reset the dangling-current aggregator, at the
>>> same time, I need to know the value of the aggregator at a previous
>>> superstep.
>>
>> You can save the value from the previous step in a static variable in
>> the WorkerContext before resetting the aggregator.
>>
>>>
>>> I hope it makes sense, let me know if you have a better idea.
>>>
 Overall I think we're on a good way to a robust, real-world PageRank
 implementation, I managed to implement the convergence check with an
 aggregator, will post an updated patch soon.
>>>
>>> I think I've just done it, have a look [1] and let me know if you would have
>>> done it differently.
>>>
>>> Paolo
>>>
>>>  [1]
>>> https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
>>>
>>>
>>
> 
> 
> 



Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-29 Thread Claudio Martella
About the mapreduce job to prepare the inputset, I did advocate for
this solution instead of supporting automatic creation of non-existent
vertices implicitly (which I believe adds a logical path in vertex
resolution which has some drawbacks e.g you have to check in the
hashmap for the existence of the destination vertex for each message,
which is "fine" now that it's a hashmap, but it's going to be less
fine when/if we turn to TreeMap for out-of-core).

Unfortunately the other committers preferred going for the path that
helps userland's life, so I guess this solution is not to be
considered here either.

On Tue, May 29, 2012 at 1:48 PM, Sebastian Schelter  wrote:
> On 29.05.2012 13:13, Paolo Castagna wrote:
>> Hi Sebastian
>>
>> Sebastian Schelter wrote:
>>> Why do you only recompute the pageRank in each second superstep? Can we
>>> not use the aggregated value of the dangling nodes from the last superstep?
>>
>> I removed the computing of PageRank values every each second superstep.
>> However, I needed to use a couple of aggregators for the dangling nodes
>> contribution instead of just one: "dangling-current" and "dangling-previous".
>>
>> Each superstep, I need to reset the dangling-current aggregator, at the
>> same time, I need to know the value of the aggregator at a previous
>> superstep.
>
> You can save the value from the previous step in a static variable in
> the WorkerContext before resetting the aggregator.
>
>>
>> I hope it makes sense, let me know if you have a better idea.
>>
>>> Overall I think we're on a good way to a robust, real-world PageRank
>>> implementation, I managed to implement the convergence check with an
>>> aggregator, will post an updated patch soon.
>>
>> I think I've just done it, have a look [1] and let me know if you would have
>> done it differently.
>>
>> Paolo
>>
>>  [1]
>> https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
>>
>>
>



-- 
   Claudio Martella
   claudio.marte...@gmail.com


Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-29 Thread Sebastian Schelter
On 29.05.2012 13:13, Paolo Castagna wrote:
> Hi Sebastian
> 
> Sebastian Schelter wrote:
>> Why do you only recompute the pageRank in each second superstep? Can we
>> not use the aggregated value of the dangling nodes from the last superstep?
> 
> I removed the computing of PageRank values every each second superstep.
> However, I needed to use a couple of aggregators for the dangling nodes
> contribution instead of just one: "dangling-current" and "dangling-previous".
> 
> Each superstep, I need to reset the dangling-current aggregator, at the
> same time, I need to know the value of the aggregator at a previous
> superstep.

You can save the value from the previous step in a static variable in
the WorkerContext before resetting the aggregator.

> 
> I hope it makes sense, let me know if you have a better idea.
> 
>> Overall I think we're on a good way to a robust, real-world PageRank
>> implementation, I managed to implement the convergence check with an
>> aggregator, will post an updated patch soon.
> 
> I think I've just done it, have a look [1] and let me know if you would have
> done it differently.
> 
> Paolo
> 
>  [1]
> https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90
> 
> 



Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-29 Thread Paolo Castagna
Hi Sebastian

Sebastian Schelter wrote:
> Why do you only recompute the pageRank in each second superstep? Can we
> not use the aggregated value of the dangling nodes from the last superstep?

I removed the computing of PageRank values every each second superstep.
However, I needed to use a couple of aggregators for the dangling nodes
contribution instead of just one: "dangling-current" and "dangling-previous".

Each superstep, I need to reset the dangling-current aggregator, at the
same time, I need to know the value of the aggregator at a previous
superstep.

I hope it makes sense, let me know if you have a better idea.

> Overall I think we're on a good way to a robust, real-world PageRank
> implementation, I managed to implement the convergence check with an
> aggregator, will post an updated patch soon.

I think I've just done it, have a look [1] and let me know if you would have
done it differently.

Paolo

 [1]
https://github.com/castagna/jena-grande/blob/11f07dd897562f7a4bf8d6e4845128d7f2cdd2ff/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java#L90




Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Paolo Castagna
Sebastian Schelter wrote:
> However, the problem with this input is that the dangling vertices that
> don't have a line of their own (such as 11) cannot contribute their
> accumulated rank, as no vertex for them will be instantiated. So
> counting them doesn't help either.

No, the 'implicit' dangling nodes (such as 6, 7, 9 and 11 below) are
instantiated when you send a message to them. If you run the example,
you'll see that after the first superstep there are 11 vertices which
are sending and receiving messages (as it should be with correct input).

> I think that we should rely on users supplying valid input (a line for
> each vertex) and not try to correct for that in the vertex class.

Well, I don't disagree in principle.

But in practice this won't stop users making mistakes and provide your
software with bad data as input. :-) One superstep for cleaning/validating
the input data isn't that bad after all.

> Creating a line for each vertex from such a file is an easy task that is
> doable with a single MapReduce pass over the data beforehand.

Sure. (Why is this better than a superstep with Giraph?) ;-)

Paolo




Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Sebastian Schelter
Hi Paolo,

I see what you mean.

However, the problem with this input is that the dangling vertices that
don't have a line of their own (such as 11) cannot contribute their
accumulated rank, as no vertex for them will be instantiated. So
counting them doesn't help either.

I think that we should rely on users supplying valid input (a line for
each vertex) and not try to correct for that in the vertex class.

Creating a line for each vertex from such a file is an easy task that is
doable with a single MapReduce pass over the data beforehand.

--sebastian


On 28.05.2012 22:13, Paolo Castagna wrote:
> Hi Sebastian,
> you can try yourself with some simple input (which contains dangling nodes).
> 
> For example, say I have this adjacency list (with mistakes/repetitions and
> self-links, ignore those):
> 
> 1 1 2 2
> 2 3 5 7 9 11
> 3 3 3 6
> 4
> 5 1 2 3 11
> 8 10
> 10 5
> 
> How many vertices? 7 or 11?
> 
> I think this graph has 11 and it's 11 you need to use as number of vertices
> when you compute PageRank.
> 
> Paolo
> 
> Sebastian Schelter wrote:
>> Hi Paolo,
>>
>> Why would getNumVertices() not give back the correct number of vertices?
>> This call should always give back the overall number of vertices (if it
>> doesn't we have to fix it) and you shouldn't have to rely on tricks to
>> count stuff via aggregators.
>>
>> --sebastian
> 
> 



Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Paolo Castagna
Hi Sebastian,
you can try yourself with some simple input (which contains dangling nodes).

For example, say I have this adjacency list (with mistakes/repetitions and
self-links, ignore those):

1 1 2 2
2 3 5 7 9 11
3 3 3 6
4
5 1 2 3 11
8 10
10 5

How many vertices? 7 or 11?

I think this graph has 11 and it's 11 you need to use as number of vertices
when you compute PageRank.

Paolo

Sebastian Schelter wrote:
> Hi Paolo,
> 
> Why would getNumVertices() not give back the correct number of vertices?
> This call should always give back the overall number of vertices (if it
> doesn't we have to fix it) and you shouldn't have to rely on tricks to
> count stuff via aggregators.
> 
> --sebastian




Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Sebastian Schelter
Hi Paolo,

Why would getNumVertices() not give back the correct number of vertices?
This call should always give back the overall number of vertices (if it
doesn't we have to fix it) and you shouldn't have to rely on tricks to
count stuff via aggregators.

--sebastian



On 28.05.2012 21:20, Paolo Castagna wrote:
> Hi Sebastian
> 
> Sebastian Schelter wrote:
>> I think the code can be improved partially. You don't have to count the
>> vertices via an aggregator, you can simply use getNumVertices().
> 
> No, you cannot use getNumVertices() since it won't count dangling nodes.
> If you want to count for dangling nodes you need to send a message to all
> nodes.
> 
>> Why do you only recompute the pageRank in each second superstep? Can we
>> not use the aggregated value of the dangling nodes from the last superstep?
> 
> Maybe, but I did not managed to do it.
> 
> The dangling node contribution needs to be recomputed each time, so the
> aggregator needs to be reset to 0 each time and kept unchanged during each
> superstep. The value is the sum of the PageRank values of all dangling nodes
> in the previous superstep.
> 
> The good thing is that now there is a 'correct' implementation and a test
> we can use to verify that PageRank values are correct (against a third
> party implementation: JUNG).
> 
>> Overall I think we're on a good way to a robust, real-world PageRank
>> implementation, I managed to implement the convergence check with an
>> aggregator, will post an updated patch soon.
> 
> Thanks, that was my next step. :-)
> + dumping factor configurable and general clean-up.
> 
> Paolo
> 
>>
>> Best,
>> Sebastian
>>
>>
>> On 28.05.2012 18:39, Paolo Castagna wrote:
>>> Paolo Castagna wrote:
 Sebastian Schelter wrote:
> I guess that summing up and redistributing the pagerank of dangling
> vertices can also be done without an extra superstep in an aggregator.
 Yeah! Why didn't I think about that?
 Thanks, great suggestion.

 I am going to give this a go, at first without extending RandomWalkVertex 
 since I want to see how it might work.
 This would also inform design and improvements of the current 
 RandomWalkVertex.
>>> You can find a 'proper' ;-) implementation of PageRank with dangling nodes
>>> support, sum at the end of all PageRank values equals to 1.00 (since it's a
>>> probability distribution) and PageRank values validated against a third 
>>> party
>>> implementation (i.e. JUNG), here [1].
>>>
>>> I have not managed to do it as Sebastian suggested yet.
>>>
>>> Paolo
>>>
>>>  [1]
>>> https://github.com/castagna/jena-grande/blob/3c2d9f85bacb737acd575e3e287dc0fcc6bd96b9/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java
>>>
>>



Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Paolo Castagna
Hi Sebastian

Sebastian Schelter wrote:
> Could we try to merge this with the patch from
> https://issues.apache.org/jira/browse/GIRAPH-191 ?

I'll look at doing that, but so far I did not managed to have
a correct implementation. Now, that I have an implementation
which I know to be correct, I'll try using RandomWalkVertex.

Paolo


Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Paolo Castagna
Hi Sebastian

Sebastian Schelter wrote:
> I think the code can be improved partially. You don't have to count the
> vertices via an aggregator, you can simply use getNumVertices().

No, you cannot use getNumVertices() since it won't count dangling nodes.
If you want to count for dangling nodes you need to send a message to all
nodes.

> Why do you only recompute the pageRank in each second superstep? Can we
> not use the aggregated value of the dangling nodes from the last superstep?

Maybe, but I did not managed to do it.

The dangling node contribution needs to be recomputed each time, so the
aggregator needs to be reset to 0 each time and kept unchanged during each
superstep. The value is the sum of the PageRank values of all dangling nodes
in the previous superstep.

The good thing is that now there is a 'correct' implementation and a test
we can use to verify that PageRank values are correct (against a third
party implementation: JUNG).

> Overall I think we're on a good way to a robust, real-world PageRank
> implementation, I managed to implement the convergence check with an
> aggregator, will post an updated patch soon.

Thanks, that was my next step. :-)
+ dumping factor configurable and general clean-up.

Paolo

> 
> Best,
> Sebastian
> 
> 
> On 28.05.2012 18:39, Paolo Castagna wrote:
>> Paolo Castagna wrote:
>>> Sebastian Schelter wrote:
 I guess that summing up and redistributing the pagerank of dangling
 vertices can also be done without an extra superstep in an aggregator.
>>> Yeah! Why didn't I think about that?
>>> Thanks, great suggestion.
>>>
>>> I am going to give this a go, at first without extending RandomWalkVertex 
>>> since I want to see how it might work.
>>> This would also inform design and improvements of the current 
>>> RandomWalkVertex.
>> You can find a 'proper' ;-) implementation of PageRank with dangling nodes
>> support, sum at the end of all PageRank values equals to 1.00 (since it's a
>> probability distribution) and PageRank values validated against a third party
>> implementation (i.e. JUNG), here [1].
>>
>> I have not managed to do it as Sebastian suggested yet.
>>
>> Paolo
>>
>>  [1]
>> https://github.com/castagna/jena-grande/blob/3c2d9f85bacb737acd575e3e287dc0fcc6bd96b9/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java
>>
> 


Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Sebastian Schelter
I think the code can be improved partially. You don't have to count the
vertices via an aggregator, you can simply use getNumVertices().

Why do you only recompute the pageRank in each second superstep? Can we
not use the aggregated value of the dangling nodes from the last superstep?

Overall I think we're on a good way to a robust, real-world PageRank
implementation, I managed to implement the convergence check with an
aggregator, will post an updated patch soon.

Best,
Sebastian


On 28.05.2012 18:39, Paolo Castagna wrote:
> Paolo Castagna wrote:
>> Sebastian Schelter wrote:
>>> I guess that summing up and redistributing the pagerank of dangling
>>> vertices can also be done without an extra superstep in an aggregator.
>>
>> Yeah! Why didn't I think about that?
>> Thanks, great suggestion.
>>
>> I am going to give this a go, at first without extending RandomWalkVertex 
>> since I want to see how it might work.
>> This would also inform design and improvements of the current 
>> RandomWalkVertex.
> 
> You can find a 'proper' ;-) implementation of PageRank with dangling nodes
> support, sum at the end of all PageRank values equals to 1.00 (since it's a
> probability distribution) and PageRank values validated against a third party
> implementation (i.e. JUNG), here [1].
> 
> I have not managed to do it as Sebastian suggested yet.
> 
> Paolo
> 
>  [1]
> https://github.com/castagna/jena-grande/blob/3c2d9f85bacb737acd575e3e287dc0fcc6bd96b9/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java
> 



Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Sebastian Schelter
Hi Paolo,

Could we try to merge this with the patch from
https://issues.apache.org/jira/browse/GIRAPH-191 ?

Best,
Sebastian

On 28.05.2012 18:39, Paolo Castagna wrote:
> Paolo Castagna wrote:
>> Sebastian Schelter wrote:
>>> I guess that summing up and redistributing the pagerank of dangling
>>> vertices can also be done without an extra superstep in an aggregator.
>>
>> Yeah! Why didn't I think about that?
>> Thanks, great suggestion.
>>
>> I am going to give this a go, at first without extending RandomWalkVertex 
>> since I want to see how it might work.
>> This would also inform design and improvements of the current 
>> RandomWalkVertex.
> 
> You can find a 'proper' ;-) implementation of PageRank with dangling nodes
> support, sum at the end of all PageRank values equals to 1.00 (since it's a
> probability distribution) and PageRank values validated against a third party
> implementation (i.e. JUNG), here [1].
> 
> I have not managed to do it as Sebastian suggested yet.
> 
> Paolo
> 
>  [1]
> https://github.com/castagna/jena-grande/blob/3c2d9f85bacb737acd575e3e287dc0fcc6bd96b9/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java
> 



Re: SimplePageRankVertex implementation, dangling nodes and sending messages to all nodes...

2012-05-28 Thread Paolo Castagna
Paolo Castagna wrote:
> Sebastian Schelter wrote:
>> I guess that summing up and redistributing the pagerank of dangling
>> vertices can also be done without an extra superstep in an aggregator.
> 
> Yeah! Why didn't I think about that?
> Thanks, great suggestion.
> 
> I am going to give this a go, at first without extending RandomWalkVertex 
> since I want to see how it might work.
> This would also inform design and improvements of the current 
> RandomWalkVertex.

You can find a 'proper' ;-) implementation of PageRank with dangling nodes
support, sum at the end of all PageRank values equals to 1.00 (since it's a
probability distribution) and PageRank values validated against a third party
implementation (i.e. JUNG), here [1].

I have not managed to do it as Sebastian suggested yet.

Paolo

 [1]
https://github.com/castagna/jena-grande/blob/3c2d9f85bacb737acd575e3e287dc0fcc6bd96b9/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankVertex.java