@Jesse how about runners "tracing" the DAG constructed by Beam, so that it's clear what the runner actually executed?
Example: For the SparkRunner, a ParDo translates to a mapPartitions transformation. A trace like that could provide transparency when debugging or benchmarking pipelines per runner.

On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <je...@smokinghand.com> wrote:

> @Dan before starting with Beam, I'd want to know how much performance I'm
> giving up by not programming directly to the API.
>
> On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin <dhalp...@google.com.invalid>
> wrote:
>
> > I think there are lots of excellent one-off performance studies, but I'm
> > not sure how useful that is to Beam.
> >
> > From a test infra point of view, I'm wondering more about tracking of
> > performance over time, identifying regressions, etc.
> >
> > Google has some tools like PerfKit
> > <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
> > basically a skin on a database + some scripts to load and query data; but I
> > don't love it. Do other Apache projects do public, long-term benchmarking
> > and performance regression testing?
> >
> > Dan
> >
> > On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <je...@smokinghand.com>
> > wrote:
> >
> > > I found data Artisans' benchmarking post
> > > <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
> > > They also shared the code <https://github.com/dataArtisans/performance>. I
> > > didn't dig in much, but they did a wide range of algorithms. They have the
> > > native code, so you write the Beam code and check against the native
> > > performance.
> > >
> > > On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
> > > <amirto...@yahoo.com.invalid>
> > > wrote:
> > >
> > > > Hi Jason, I have been busy bench-marking Flink Cluster (Spark next) under
> > > > Beam. I can share my experience. Can you list items of interest to know so I
> > > > can answer them to the best of my knowledge? Cheers
> > > >
> > > > From: Jason Kuster <jasonkus...@google.com.INVALID>
> > > > To: dev@beam.incubator.apache.org
> > > > Sent: Monday, October 17, 2016 5:06 PM
> > > > Subject: Exploring Performance Testing
> > > >
> > > > Hey all,
> > > >
> > > > Now that we've covered some of the initial ground with regard to
> > > > correctness testing, I'm going to be starting work on performance testing
> > > > and benchmarking. I wanted to reach out and see what people's experiences
> > > > have been with performance testing and benchmarking
> > > > frameworks, particularly in other Apache projects. Anyone have any
> > > > experience or thoughts?
> > > >
> > > > Best,
> > > >
> > > > Jason
> > > >
> > > > --
> > > > -------
> > > > Jason Kuster
> > > > Apache Beam (Incubating) / Google Cloud Dataflow
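
To make the "tracing" idea at the top of this message concrete, here is a rough Python sketch of the kind of output such a trace might produce. It is only an illustration under assumptions: `TranslationTracer`, `TraceEntry`, the step names, and the three-step pipeline are all hypothetical and not part of Beam or any runner. The only mapping taken from this thread is ParDo to mapPartitions for the SparkRunner; any other transform is reported as having no recorded translation rather than guessed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TraceEntry:
    """One trace line: a Beam-level step and what the runner ran for it."""
    step_name: str
    beam_transform: str
    runner_operation: str


@dataclass
class TranslationTracer:
    """Hypothetical helper for illustration; not a Beam or runner API."""
    runner: str
    # Beam transform name -> native operation name for this runner.
    translations: Dict[str, str]
    entries: List[TraceEntry] = field(default_factory=list)

    def record(self, step_name: str, beam_transform: str) -> TraceEntry:
        # Unknown transforms are flagged explicitly instead of being guessed.
        native = self.translations.get(beam_transform, "<no translation recorded>")
        entry = TraceEntry(step_name, beam_transform, native)
        self.entries.append(entry)
        return entry

    def report(self) -> str:
        lines = [f"[{self.runner}] translation trace:"]
        for e in self.entries:
            lines.append(f"  {e.step_name}: {e.beam_transform} -> {e.runner_operation}")
        return "\n".join(lines)


# The only mapping taken from this thread: ParDo -> mapPartitions (SparkRunner).
spark_tracer = TranslationTracer("SparkRunner", {"ParDo": "mapPartitions"})

# A made-up three-step pipeline, recorded in the order the steps were declared.
for step_name, transform in [
    ("ParseLines", "ParDo"),
    ("GroupByUser", "GroupByKey"),
    ("FormatOutput", "ParDo"),
]:
    spark_tracer.record(step_name, transform)

print(spark_tracer.report())
```

A real runner would presumably emit entries like these from inside its own pipeline translator, so that a per-runner benchmark could be read next to the native operations that were actually executed.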