@Jesse how about having runners "trace" the DAG that Beam constructs, so
that it's clear what the runner actually executed?

Example:
For the SparkRunner, a ParDo translates to a mapPartitions transformation.

That could provide transparency when debugging/benchmarking pipelines
per-runner.
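
To make the idea concrete, here is a minimal sketch of what such a trace
might look like. It is purely illustrative: every name in it is hypothetical
and none of it is a Beam or SparkRunner API. The only mapping taken from the
example above is ParDo to mapPartitions; anything else is reported as
untranslated.

```python
# Hypothetical sketch: these names are illustrative and are NOT part of
# Beam or the SparkRunner.
from dataclasses import dataclass

# Assumed lookup from Beam transform to runner-native operation.
# Only the ParDo -> mapPartitions entry comes from the example above.
SPARK_TRANSLATIONS = {
    "ParDo": "mapPartitions",
}


@dataclass
class TraceEntry:
    step: str            # user-visible step name in the pipeline
    beam_transform: str  # Beam-level transform type
    native_op: str       # operation the runner actually executed


def trace_translation(steps, translations):
    """Record which native operation each Beam step was translated to."""
    trace = []
    for step_name, beam_transform in steps:
        native_op = translations.get(beam_transform, "<untranslated>")
        trace.append(TraceEntry(step_name, beam_transform, native_op))
    return trace


def format_trace(trace):
    """Render the trace as one 'step: transform -> native op' line per step."""
    return "\n".join(
        f"{e.step}: {e.beam_transform} -> {e.native_op}" for e in trace
    )


# Toy pipeline made up for illustration: (step name, Beam transform).
pipeline_steps = [("ExtractWords", "ParDo"), ("FormatResults", "ParDo")]
trace = trace_translation(pipeline_steps, SPARK_TRANSLATIONS)
print(format_trace(trace))
```

A real runner would presumably emit this from its translation layer rather
than from a lookup table, but the output shape (Beam step next to the native
operation) is the part that would help when debugging or benchmarking.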

On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <je...@smokinghand.com>
wrote:

> @Dan before starting with Beam, I'd want to know how much performance I'm
> giving up by not programming directly to the API.
>
> On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin <dhalp...@google.com.invalid>
> wrote:
>
> > I think there are lots of excellent one-off performance studies, but I'm
> > not sure how useful that is to Beam.
> >
> > From a test infra point of view, I'm wondering more about tracking of
> > performance over time, identifying regressions, etc.
> >
> > Google has some tools like PerfKit
> > <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
> > basically a skin on a database + some scripts to load and query data;
> > but I don't love it. Do other Apache projects do public, long-term
> > benchmarking and performance regression testing?
> >
> > Dan
> >
> > On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <je...@smokinghand.com>
> > wrote:
> >
> > > I found data Artisans' benchmarking post
> > > <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
> > > They also shared the code <https://github.com/dataArtisans/performance>.
> > > I didn't dig in much, but they covered a wide range of algorithms. They
> > > have the native code, so you could write the Beam code and check it
> > > against the native performance.
> > >
> > > On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
> > > <amirto...@yahoo.com.invalid>
> > > wrote:
> > >
> > > > Hi Jason, I have been busy benchmarking a Flink cluster (Spark next)
> > > > under Beam. I can share my experience. Can you list the items you are
> > > > interested in, so I can answer them to the best of my knowledge?
> > > > Cheers
> > > >
> > > >       From: Jason Kuster <jasonkus...@google.com.INVALID>
> > > >  To: dev@beam.incubator.apache.org
> > > >  Sent: Monday, October 17, 2016 5:06 PM
> > > >  Subject: Exploring Performance Testing
> > > >
> > > > Hey all,
> > > >
> > > > Now that we've covered some of the initial ground with regard to
> > > > correctness testing, I'm going to be starting work on performance
> > > > testing and benchmarking. I wanted to reach out and see what people's
> > > > experiences have been with performance testing and benchmarking
> > > > frameworks, particularly in other Apache projects. Anyone have any
> > > > experience or thoughts?
> > > >
> > > > Best,
> > > >
> > > > Jason
> > > >
> > > > --
> > > > -------
> > > > Jason Kuster
> > > > Apache Beam (Incubating) / Google Cloud Dataflow
> > > >
> > > >
> > > >
> > >
> >
>
