Here's a sketch of a different algorithm that I believe converges to the same value:
1. Initialize thresh to the mean value of the image pixels 2. Compute the mean of the pixels that are larger than thresh, mplus, and smaller than thresh, mminus. 3. Set thresh to the mean of mminus and mplus, and then loop back to 1. Iterate to a fixed point. You can implement this version without allocating any extra data structures, so it may well be much faster. Sorry I don't have any references here. Writing from a phone.