One thing I find irritating about cosine, or any similarity confined to the [0, 1] range, is that two distinct items sometimes end up separated by a very small distance when you inspect the scores. I always worry that float precision isn't enough to capture the tiny detail that makes the difference between accept and reject. Log-likelihood similarity, on the other hand, seems to produce values in the 100+ range, sometimes even 1000+ for strong likelihoods, while very unlikely events get small values below 1.0.
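To make the precision worry concrete, here's a quick stdlib-only sketch (the two scores are made up for illustration): two cosine scores that are distinct in float64 but collapse to the same value once squeezed into IEEE-754 single precision, as can happen in float32-based ANN indexes or GPU pipelines.

```python
import struct

def to_float32(x):
    """Round-trip a Python float through IEEE-754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Two hypothetical cosine scores that differ only in the 9th decimal place.
sim_a = 0.999999991
sim_b = 0.999999999

# In float64 (Python's native float) they are distinct, so a threshold
# placed between them can still accept one item and reject the other.
print(sim_a == sim_b)                           # False

# In float32 the spacing between representable values just below 1.0
# is about 6e-8, so both scores round to 1.0 and the distinction vanishes.
print(to_float32(sim_a) == to_float32(sim_b))   # True
```

The gap here is smaller than half the float32 spacing near 1.0 (about 3e-8), which is exactly the regime where an accept/reject threshold stops being meaningful.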
In practice this concern mostly holds up: as the number of documents increases, I usually have to scale cosine to a larger range or switch to some hybrid similarity metric to get good clustering. What about you guys? Both of you have worked on huge data sets, so what insights can you share about what works and what doesn't?