Re: [igraph] Performance issue regarding when calculating induced_subgra

Tamas Nepusz Wed, 04 May 2016 06:58:32 -0700

Hi Adam,

> It seems like that I can intensify the tension. :) [...] I
> examined the cluster nr 293812 and it looks like the same as the nr 293815
> but here I can calculate the authority score.


Well, welcome to the world of eigenvector calculations using ARPACK!
:) This is how it usually works. As far as I know, ARPACK basically
implements a fancier and more effective version of the so-called
"power iteration" method when it comes to calculating the eigenvector
corresponding to the dominant eigenvalue. The power iteration method
works like this:

1. You choose an arbitrary starting vector and normalize it.
2. You multiply it with the matrix for which you need the dominant
eigenvector and normalize the product again.
3. Repeat step 2 until convergence (i.e. until the vector does not
change "too much" between iterations) or until you reach the maximum
number of allowed iterations.

(ARPACK does something more sophisticated that is called the
"implicitly restarted Arnoldi method", but the general idea is the
same as far as I know).

Now, if you are lucky, the process converges quickly for a unique
eigenvector for most of the chosen starting vectors (except if the
chosen starting vector happens to be parallel to some
_other_eigenvector, in which case it will converge to that one - but
it is very unlikely to happen). However, if the dominant eigenvalue
has multiplicity > 1, then basically any linear combination of the
dominant eigenvectors is also a dominant eigenvector, so it might be
the case that the vector obtained from consecutive iterations will not
converge to a unique vector but keep on "rotating around" in the
subspace spanned by the dominant eigenvectors. So, this is not an
error in igraph, it just happens to be the case that there is no
_unique_ solution of the authority score formula for star-shaped
graphs.

For instance, let us consider a star graph consisting of four vertices
A, B, C and D such that A is the central vertex. For sake of
simplicity, let us also assume that all the weights are equal. The
authority and hub scores are defined in a recursive way, and the
definitions roughly boil down to this:

"The authority score of vertex is i equal to the sum of the hub scores
of all its neighbors, divided by a constant C1 chosen in a way that
the sum of all the authority scores is 1".

Similarly,

"The hub score of vertex is i equal to the sum of the authority scores
of all its neighbors, divided by a constant C2 chosen in a way that
the sum of all the hub scores is 1".

Note that you cannot really calculate the authority scores alone - you
need to calculate both the authority and the hub scores because they
mutually refer to each other.

Now, consider the 4-vertex undirected star graph and the following solutions:

authority scores = [1, 0, 0, 0]
hub scores = [0, 1/3, 1/3, 1/3]

Is it a good solution? Yes, with C1 = 1 and C2 = 1/3. Now, consider
this solution:

authority scores = [0, 1/3, 1/3, 1/3]
hub scores = [1, 0, 0, 0]

Is it a good solution? Yes, it is _also_ a good solution, with C1 =
1/3 and C2 = 1. Which one yields the "real" authority score vector?
The answer is that _both_ of these are valid authority scores, and any
linear combination of these two vectors are also valid authority
scores. The problem here is that you cannot simply answer questions
like "who are the most authoritative people in this star-shaped
network", because one of the solutions tell you that the central
vertex is a good hub and the other three vertices are good
authorities, and the other solution tells you otherwise.

> Another thing: can be installing arpack-ng a solution for the arpack fails?
> https://github.com/pv/arpack-ng
No, because the authority score itself is ill-defined for your input.
Replacing the eigensolver with a different eigensolver would not
magically reformulate the original definition. This is an inherent
limitation of the authority score measure; a more detailed analysis of
this is to be found in the following paper:

https://www.math.hmc.edu/~ward/paperpdfs/hitsheaderbw6Jan05.pdf

(Naive implementations of the authority and hub score calculations
often hide this problem because the most naive implementations are
based on the power iteration and simply tend to stop the calculation
after a given number of iterations even if convergence was not
reached, giving you the false impression that there is a unique
solution where in fact there isn't).

I would probably try what I mentioned in my previous email: calculate
the communities on the undirected network but calculate the centrality
measures (eigenvector centrality, hub and authority scores) in the
directed variant, which is less likely to suffer from these problems
(but the chance is still non-negligible, see Fig 4.1 in the paper I
linked above for an example).

Best,
T.

_______________________________________________
igraph-help mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/igraph-help

Re: [igraph] Performance issue regarding when calculating induced_subgra

Reply via email to