On Tue, Jun 18, 2013 at 11:01 PM, Sebastian Schelter <s...@apache.org> wrote:
> We could also move the sampling directly to RowSimilarityJob if people
> consider this more useful.

It will have a large effect on the time for the RowSimilarityJob for some data. Does anybody have an idea about how much of the total time is in RowSimilarityJob?