Hi Kannan,
I have a branch here:
https://github.com/ehiggs/spark/tree/terasort
The code is in the examples. I don't do any fancy partitioning, so I'm
sure it could be made quicker, but it should be a good baseline.
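For readers unfamiliar with the term, "fancy partitioning" in a TeraSort-style job usually means range partitioning on sampled keys: pick split points from a sample, route each record to the partition whose key range contains it, then sort each partition independently so that the concatenated output is globally sorted. The Python sketch below illustrates only that general idea. It is not the code in the branch above, and the names `choose_split_points` and `range_partition_sort` are hypothetical.

```python
import bisect
import random


def choose_split_points(sample_keys, num_partitions):
    """Pick up to num_partitions - 1 boundaries from a sample of keys."""
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def range_partition_sort(records, num_partitions, sample_size=100, seed=0):
    """Range-partition records by sampled split points, then sort each part.

    Hypothetical helper for illustration; records are compared directly,
    so they stand in for the sort keys.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    splits = choose_split_points(sample, num_partitions)

    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        # bisect_right returns an index in 0..len(splits), one per key range.
        partitions[bisect.bisect_right(splits, rec)].append(rec)

    # Each partition covers a disjoint key range, so local sorts suffice.
    for part in partitions:
        part.sort()
    return partitions
```

Because the key ranges are disjoint and ordered, reading the partitions back in index order yields a fully sorted result. How evenly the work is balanced depends on how representative the sample is.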
I also have a WIP PR for spark-perf, but I'm having trouble building it
there [1]. I've put it on the back burner until someone can get back to
me about it.
Yours,
Ewan Higgs
[1] http://apache-spark-developers-list.1001551.n3.nabble.com/SparkSpark-perf-terasort-WIP-branch-tt10105.html
On 02/02/15 23:26, Kannan Rajah wrote:
> Is there a recommended performance test for sort-based shuffle? Something
> similar to TeraSort on Hadoop. I couldn't find one in the spark-perf
> codebase.
> https://github.com/databricks/spark-perf
> --
> Kannan