This is all very well said! I would recommend using the percentile-based approach that Roger implemented. The Pysal folks are in the process of adopting it (with a slight adjustment). I think it is the most "accurate" p-value you will get from the functions today.
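In case it helps, here is a minimal sketch of pulling that percentile-based column out of localG_perm(). It assumes the current spdep behaviour where the simulated p-values are returned in the "internals" attribute, and it uses the Columbus data shipped with spdep purely as a stand-in for your own rates and weights:

library(spdep)

## Columbus crime data shipped with spdep, used here only as a stand-in
## for your own rate variable and neighbour structure.
data(oldcol)
lw <- nb2listw(COL.nb, style = "B")

set.seed(42)  # fixes the conditional permutations, so repeated runs agree
gi <- localG_perm(COL.OLD$CRIME, lw, nsim = 999)

## The three Pr columns are stored in the "internals" attribute of the result
internals <- attr(gi, "internals")
colnames(internals)

## Rank/percentile-based p-values
p_sim <- internals[, "Pr(z != E(Gi)) Sim"]

## Any multiple-testing adjustment is a separate, second step
p_fdr <- p.adjust(p_sim, method = "fdr")
head(cbind(p_sim, p_fdr))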
I don't have a recommendation for the upper bound, but you do bring up a good point about how to classify them. I don't think I'm as qualified to answer that!

On Wed, Jul 3, 2024 at 06:25 Cunningham, Angela via R-sig-Geo <[email protected]> wrote:

> Hello all,
>
> I am using spdep (via sfdep) for a cluster analysis of the rate of rare
> events. I am hoping you can provide some advice on how to apply these
> functions most appropriately. Specifically, I am interested in any guidance
> about which significance calculation might be best in these circumstances,
> and which (if any) adjustment for multiple testing and spatial dependence
> (Bonferroni, FDR, etc.) should be paired with the different p-value
> calculations.
>
> When running localG_perm(), three Pr values are returned: Pr(z != E(Gi)),
> Pr(z != E(Gi)) Sim, and Pr(folded) Sim. My understanding is that the first
> value is based on the mean and should only be used for normally distributed
> data, that the second uses a rank-percentile approach and is more robust,
> and that the last uses a Pysal-based calculation and may be quite
> sensitive. Is this correct? The second, Pr(z != E(Gi)) Sim, appears to be
> the most appropriate for my data situation; would you suggest otherwise?
>
> The documentation for localG_perm states that "for inference, a
> Bonferroni-type test is suggested"; thus any adjustments for e.g. multiple
> testing must be made in a second step, such as with the p.adjust arguments
> in the hotspot() function, correct? Further, while fdr is the default for
> hotspot(), are there situations, like having small numbers, a large number
> of simulations, or employing a particular Prname, which would recommend a
> different p.adjust method?
>
> Also, if I can bother you all with a very basic question: given that
> significance is determined through conditional permutation simulation,
> increasing the number of simulations should refine the results and make
> them more reliable, but unless a seed is set, I assume it is still always
> possible that results will change slightly across separate runs of a model,
> perhaps shifting an observation to either side of a threshold. Aside from
> computation time, are there other reasons to avoid increasing the number of
> simulations beyond a certain point? (It feels a bit like "p-hacking" to
> increase nsim ad infinitum.) Are slight discrepancies in hot spot
> assignment between runs, even with a large number of permutations, to be
> expected? Is this particularly the case when working with small numbers?
>
> Thank you for your time and consideration.
>
> Angela R Cunningham, PhD
> Spatial Demographer (R&D Associate)
> Human Geography Group | Human Dynamics Section
>
> Oak Ridge National Laboratory
> Computational Sciences Building (5600), O401-29
> 1 Bethel Valley Road, Oak Ridge, TN 37830
> [email protected]
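On the hotspot()/p.adjust part of the question, a hedged sketch of how one might compare adjustments, continuing from the gi object in the earlier sketch; I'm assuming the Prname, cutoff and p.adjust arguments as documented for spdep's hotspot() method for localG objects, with the cutoff raised from the 0.005 default purely for illustration:

## Continuing from the gi object above: classify hot/cold spots from the
## simulation-based p-values under two different adjustments.
hs_fdr  <- hotspot(gi, Prname = "Pr(z != E(Gi)) Sim", cutoff = 0.05,
                   p.adjust = "fdr")
hs_bonf <- hotspot(gi, Prname = "Pr(z != E(Gi)) Sim", cutoff = 0.05,
                   p.adjust = "bonferroni")

## Cross-tabulate to see which observations change category;
## NA means "not significant" under that adjustment.
table(fdr = addNA(hs_fdr), bonferroni = addNA(hs_bonf))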
