Spark benchmarks

2019-03-27 Thread Michael Mior
I'm looking for recommendations on benchmarks for Spark. I'm familiar
with spark-bench[0], but I haven't found much else that suits my
needs. The main property I'm looking for is that the workload of the
benchmark should benefit significantly from non-trivial use of Spark's
caching mechanism since I'm mainly interested in evaluating cache
performance under different scenarios.

(By "non-trivial", I mean more than simply caching a single input RDD
which is reused a few times.)

Any suggestions appreciated!

[0] https://github.com/CODAIT/spark-bench
--
Michael Mior
mm...@apache.org


Task partition ID in Spark event logs

2017-07-20 Thread Michael Mior
I see there's a comment in the TaskInfo class that the index may not be the
same as the ID of the RDD partition the task is computing. Under what
circumstances *will* the ID be the same? If there are zero guarantees, any
suggestions on how to grab this info from the scheduler to populate a new
field inside TaskInfo?

Cheers,
--
Michael Mior
mm...@apache.org
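
For context on what the event logs currently expose, here is a minimal
sketch in Python that pulls the task index out of SparkListenerTaskEnd
events. It assumes the usual JSON-lines event log layout with "Stage ID",
"Stage Attempt ID", and a nested "Task Info" object carrying "Task ID" and
"Index"; the sample lines and the task_indices helper are made up for
illustration. It only reads the index as logged and does not recover the RDD
partition ID, which is the missing piece asked about above.

```python
import json

# Hypothetical, heavily trimmed event-log lines. Real logs carry many more
# fields per event; only the ones read below are included here.
SAMPLE_LOG = [
    '{"Event": "SparkListenerStageSubmitted"}',
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 3, "Stage Attempt ID": 0,'
    ' "Task Info": {"Task ID": 41, "Index": 0, "Attempt": 0}}',
    '{"Event": "SparkListenerTaskEnd", "Stage ID": 3, "Stage Attempt ID": 0,'
    ' "Task Info": {"Task ID": 42, "Index": 1, "Attempt": 0}}',
]


def task_indices(lines):
    """Map (stage ID, stage attempt ID) -> {task ID: index within the task set}."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue  # only task-end events are of interest here
        info = event.get("Task Info", {})
        key = (event.get("Stage ID"), event.get("Stage Attempt ID"))
        result.setdefault(key, {})[info.get("Task ID")] = info.get("Index")
    return result


if __name__ == "__main__":
    for stage, tasks in task_indices(SAMPLE_LOG).items():
        print(stage, tasks)
```

To run it against a real log, pass an open file handle instead of
SAMPLE_LOG, since the function only needs an iterable of lines.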