As you'd expect, this is one of those cases where getting the setup right, meaning the right data representation and the right algorithm, can make orders of magnitude of difference.
Before you conclude "Mahout is slow", let's gather more information so we can start to suggest where performance could improve. It looks like you're serving about 3.5 user requests per core per second, which doesn't sound normal. For example, running the slope-one algorithm on a data set of this size usually returns results in under 10 ms on my laptop, and you're seeing 300 ms. What algorithm are you using? How many users and items do you have? Which DataModel are you using (I'm assuming it's in memory, so GenericDataModel or something similar)?
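To pin down the numbers, it may help to time the recommend call in isolation from the rest of your request handling. Below is a rough sketch of a timing harness in plain Java. The class name LatencyProbe, its methods, the sample user IDs and the stand-in workload are all hypothetical; the idea is that you replace the body of the lambda with your own recommender call. For reference, 300 ms per request on one thread works out to roughly 3.3 requests per second, which is in line with the figure you reported.

```java
import java.util.function.LongConsumer;

// Hypothetical helper, not part of Mahout: times one call per user ID.
public class LatencyProbe {

    /** Average wall-clock milliseconds per call, measured after warm-up rounds. */
    public static double averageMillis(LongConsumer request, long[] userIds, int warmupRounds) {
        if (userIds.length == 0) {
            throw new IllegalArgumentException("need at least one user ID");
        }
        // Warm-up passes so JIT compilation and cache loading are not counted.
        for (int r = 0; r < warmupRounds; r++) {
            for (long id : userIds) {
                request.accept(id);
            }
        }
        long start = System.nanoTime();
        for (long id : userIds) {
            request.accept(id);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / userIds.length;
    }

    /** Single-threaded throughput implied by a given per-request latency. */
    public static double requestsPerSecond(double millisPerRequest) {
        return 1000.0 / millisPerRequest;
    }

    public static void main(String[] args) {
        long[] userIds = {1L, 2L, 3L, 4L, 5L}; // assumed sample IDs

        // Stand-in workload (assumption); swap in your own recommender call here.
        LongConsumer request = id -> {
            double sink = 0.0;
            for (int i = 1; i <= 100_000; i++) {
                sink += Math.sqrt(i + id);
            }
            if (sink < 0) {
                System.out.println(sink); // keeps the loop from being optimized away
            }
        };

        double ms = averageMillis(request, userIds, 3);
        System.out.println("avg ms per request: " + ms);
        System.out.println("implied requests/sec per thread: " + requestsPerSecond(ms));
    }
}
```

If the isolated number comes out far below 300 ms, the time is probably going somewhere other than the recommender itself, such as loading the data or the surrounding request handling.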
