Hi Davide,

On 03.07.20 at 11:31, Davide Cittaro wrote:
> Hello, 
> I'm testing the new Planted Partition model in graph-tool on my data, indeed 
> I'm finding interesting results. I have some questions/observations, though.
> - PPBlockState returns a relatively large number of partitions on large 
> networks, which is fine and expected. When I use NSBM, instead, I make use of 
> the hierarchy not only because I can "abstract" partitions up to a certain 
> level, but also because the hierarchy has a meaning in my case. Is there (or 
> will it be there) a hierarchical formulation of the PPBlockState?

A hierarchical prior for the PP model is certainly feasible, and it is
something that could come up in the future, but I can't promise when.

> - I tried multiple initialisations of PPBlockState over my graph, I also 
> tried to increase the iterations of the initial MCMC sweep and I'd say I get 
> very consistent results. Is this expected? I mean, is it known if the 
> PPBlockState converges to a stable solution faster and in a consistent way?

This depends a lot on the underlying data. If the model is a good fit,
then this consistency is expected; otherwise it is not. It will not
necessarily behave like this for every dataset.

> - Does the time required to converge scale with the number of edges as it 
> does for SBM?

Like in the SBM, the MCMC sweeps take time proportional to the number of
edges, but the multiplicative factor is smaller, since the model is simpler.

> - As far as I understand, if the assortativity is the dominant pattern the 
> difference between PP and NSBM is negligible. I don't know how to quantify 
> "negligible" as the differences in entropies are at least in the order of 1e2 
> in the cases I tested (seems pretty large to me); I would be happy to switch 
> to PP, also given the shorter runtime so far, but I'm a bit concerned about 
> these differences.

I do not recommend simply switching to PP for every analysis. As
described in the paper, the SBM is still the more powerful model,
capable of better capturing the network structure in a wider variety of
cases.

To answer your question, you can test whether the two models give
similar answers by comparing their partitions. You can use the
partition_overlap() function for that.
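
As a rough illustration of what such an overlap measures, here is a
minimal pure-Python sketch. It is not graph-tool's implementation:
max_overlap() is a hypothetical helper that tries every relabeling by
brute force, so it only works for a handful of groups. It assumes the
two partitions are given as equal-length sequences of group labels, one
per node. On real data you would call partition_overlap() on the two
block-label arrays instead.

```python
from itertools import permutations


def max_overlap(x, y):
    """Fraction of nodes on which partitions x and y agree, maximised
    over one-to-one relabelings of y's groups (brute force)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("partitions must be non-empty and of equal length")
    labels_x = sorted(set(x))
    labels_y = sorted(set(y))
    # Pad with None so that surplus groups of y can stay unmatched.
    k = max(len(labels_x), len(labels_y))
    targets = labels_x + [None] * (k - len(labels_x))
    best = 0
    for perm in permutations(targets, len(labels_y)):
        mapping = dict(zip(labels_y, perm))
        agree = sum(1 for a, b in zip(x, y) if mapping[b] == a)
        best = max(best, agree)
    return best / len(x)


# Made-up toy partitions of six nodes, for illustration only.
pp_labels = [0, 0, 0, 1, 1, 1]
sbm_labels = [1, 1, 1, 0, 0, 0]      # same split, labels swapped
other_labels = [0, 0, 1, 1, 1, 1]    # one node assigned differently

print(max_overlap(pp_labels, sbm_labels))
print(max_overlap(pp_labels, other_labels))
```

A value close to 1 means the two models split the nodes in essentially
the same way, regardless of how the groups happen to be numbered.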

Comparing the description length is useful to select the best-fitting
model, but not to tell whether the two models give similar answers.
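
To put a description-length difference of order 1e2 into perspective,
here is a small sketch of the usual conversion to posterior odds. It
assumes both description lengths are in nats and that the two models
are equally likely a priori; the function names and the numbers are
made up for illustration and are not taken from your data.

```python
import math


def posterior_odds(dl_a, dl_b):
    """Odds of model A relative to model B, given their description
    lengths in nats and assuming equal prior probability."""
    return math.exp(-(dl_a - dl_b))


def log10_posterior_odds(dl_a, dl_b):
    """Same quantity on a log10 scale; preferable for large
    differences, where exp() can underflow or overflow."""
    return -(dl_a - dl_b) / math.log(10)


# Hypothetical description lengths differing by 100 nats.
dl_pp, dl_sbm = 10100.0, 10000.0

print(posterior_odds(dl_pp, dl_sbm))
print(log10_posterior_odds(dl_pp, dl_sbm))
```

Under these assumptions, a gap of 100 nats puts the model with the
larger description length at odds of roughly 1e-43 against the other
one. That is decisive for model selection, but it still says nothing
about how similar the two partitions are.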

Best,
Tiago


-- 
Tiago de Paula Peixoto <ti...@skewed.de>

