Hi Spark experts,
I'm using ehiggs/spark-terasort to exercise my cluster, but I don't
understand how to run TeraSort in a standard way on a cluster.
Currently, all the input and output data is stored in HDFS, and I
can generate, sort, and validate
all the sample data. But I'm not sure
to simulate TeraSort on Spark would be of great
help.
Kindly help with the same.
Regards
Harsha
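For anyone following the thread, here is a rough local sketch of what the three phases mentioned above (generate, sort, validate) do. It is not the ehiggs/spark-terasort code and does not use Spark or HDFS. The function names generate, sort_records and validate are made up for illustration. The only thing taken from the benchmark itself is the record layout (100-byte records with a 10-byte key). The real validate step also compares checksums, which this sketch leaves out.

```python
import random

KEY_LEN = 10     # TeraSort records start with a 10-byte key
VALUE_LEN = 90   # followed by a 90-byte payload (100 bytes per record)


def generate(num_records, seed=0):
    """Hypothetical stand-in for the generate phase: random 100-byte records."""
    rng = random.Random(seed)
    return [
        bytes(rng.getrandbits(8) for _ in range(KEY_LEN + VALUE_LEN))
        for _ in range(num_records)
    ]


def sort_records(records):
    """Hypothetical stand-in for the sort phase: order records by key bytes."""
    return sorted(records, key=lambda rec: rec[:KEY_LEN])


def validate(records):
    """Hypothetical stand-in for the validate phase: keys must never decrease."""
    return all(
        a[:KEY_LEN] <= b[:KEY_LEN] for a, b in zip(records, records[1:])
    )


if __name__ == "__main__":
    data = generate(1000)
    result = sort_records(data)
    print("records:", len(result), "sorted:", validate(result))
```

On a cluster the same three steps run as separate jobs that read from and write to HDFS, so each phase can be timed on its own.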
Hi all, I tried to run a TeraSort benchmark on my Spark cluster, but I
found it hard to find a standard Spark TeraSort program, except for a PR from
rxin and Ewan Higgs:
https://github.com/apache/spark/pull/1242
https://github.com/ehiggs/spark/tree/terasort
The example which rxin provided without