Hey all,

So, as I mentioned on Stephen's IO Testing thread a few days ago, I've been doing a bunch of investigation into performance testing frameworks. I've put all my thoughts into a doc here, and I'd love to hear your thoughts on my investigation and on what I'm proposing going forward:
https://docs.google.com/document/d/18ffP1vYurvNe92Efs_6hFFBDYC2dQEdWw135_GWZ2YU/view

Copying from the earlier mail:

The tl;dr version is that there are a number of tools out there, but the best one I was able to find was a tool called PerfKit Benchmarker (PKB)[1]. As it turns out, PKB already had the ability to benchmark Spark (I have a PR out to extend the Spark functionality[2], with a couple more improvements in the works), and I've put together some additional work in a branch on my repository[3] to enable proof-of-concept Dataflow Java benchmarks. I'm pretty excited about it overall.

[1] https://github.com/GoogleCloudPlatform/PerfKitBenchmarker
[2] https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/pull/1214
[3] https://github.com/jasonkuster/PerfKitBenchmarker/tree/beam

Looking forward to moving ahead with this.

Jason

--
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow