I hope to make a PR with the DDL by tomorrow or Wednesday night. The DDL, along with a README, will go in a new directory `arrow/dev/benchmarking` unless directed otherwise.
A "C++ Benchmark Collector" script would be super. I expect some back-and-forth on this to identify naïve assumptions in the data model. Attempting to submit actual benchmarks is how to get a handle on that. I recognize I'm blocking downstream work. Better to get an initial PR and some discussion going. Best, Tanya On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <wesmck...@gmail.com> wrote: > hi folks, > > I'm curious where we currently stand on this project. I see the > discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- > would the next step be to have a pull request with .sql files > containing the DDL required to create the schema in PostgreSQL? > > I could volunteer to write the "C++ Benchmark Collector" script that > will run all the benchmarks on Linux and collect their data to be > inserted into the database. > > Thanks > Wes > > On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net> wrote: > > > > I don't want to be the bottleneck and have posted an initial draft data > > model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313 > > > > It should not be a problem to get content into a form that would be > > acceptable for either a static site like ASV (via CORS queries to a > > GraphQL/REST interface) or a codespeed-style site (via a separate schema > > organized for Django) > > > > I don't think I'm experienced enough to actually write any benchmarks > > though, so all I can contribute is backend work for this task. > > > > Best, > > Tanya > > > > On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <wesmck...@gmail.com> > wrote: > > > > > hi folks, > > > > > > I'd like to propose some kind of timeline for getting a first > > > iteration of a benchmark database developed and live, with scripts to > > > enable one or more initial agents to start adding new data on a daily > > > / per-commit basis. I have at least 3 physical machines where I could > > > immediately set up cron jobs to start adding new data, and I could > > > attempt to backfill data as far back as possible. > > > > > > Personally, I would like to see this done by the end of February if > > > not sooner -- if we don't have the volunteers to push the work to > > > completion by then please let me know as I will rearrange my > > > priorities to make sure that it happens. Does that sounds reasonable? > > > > > > Please let me know if this plan sounds reasonable: > > > > > > * Set up a hosted PostgreSQL instance, configure backups > > > * Propose and adopt a database schema for storing benchmark results > > > * For C++, write script (or Dockerfile) to execute all > > > google-benchmarks, output results to JSON, then adapter script > > > (Python) to ingest into database > > > * For Python, similar script that invokes ASV, then inserts ASV > > > results into benchmark database > > > > > > This seems to be a pre-requisite for having a front-end to visualize > > > the results, but the dashboard/front end can hopefully be implemented > > > in such a way that the details of the benchmark database are not too > > > tightly coupled > > > > > > (Do we have any other benchmarks in the project that would need to be > > > inserted initially?) 
> > > > > > Related work to trigger benchmarks on agents when new commits land in > > > master can happen concurrently -- one task need not block the other > > > > > > Thanks > > > Wes > > > > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> > wrote: > > > > > > > > Sorry, copy-paste failure: > > > https://issues.apache.org/jira/browse/ARROW-4313 > > > > > > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> > > > wrote: > > > > > > > > > > I don't think there is one but I just created > > > > > > > > > https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E > > > > > > > > > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net > > > > > wrote: > > > > > > > > > > > > Areg, > > > > > > > > > > > > If you'd like help, I volunteer! No experience benchmarking but > tons > > > > > > experience databasing—I can mock the backend (database + http) > as a > > > > > > starting point for discussion if this is the way people want to > go. > > > > > > > > > > > > Is there a Jira ticket for this that i can jump into? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney < > wesmck...@gmail.com> > > > wrote: > > > > > > > > > > > > > hi Areg, > > > > > > > > > > > > > > This sounds great -- we've discussed building a more > full-featured > > > > > > > benchmark automation system in the past but nothing has been > > > developed > > > > > > > yet. > > > > > > > > > > > > > > Your proposal about the details sounds OK; the single most > > > important > > > > > > > thing to me is that we build and maintain a very general > purpose > > > > > > > database schema for building the historical benchmark database > > > > > > > > > > > > > > The benchmark database should keep track of: > > > > > > > > > > > > > > * Timestamp of benchmark run > > > > > > > * Git commit hash of codebase > > > > > > > * Machine unique name (sort of the "user id") > > > > > > > * CPU identification for machine, and clock frequency (in case > of > > > > > > > overclocking) > > > > > > > * CPU cache sizes (L1/L2/L3) > > > > > > > * Whether or not CPU throttling is enabled (if it can be easily > > > determined) > > > > > > > * RAM size > > > > > > > * GPU identification (if any) > > > > > > > * Benchmark unique name > > > > > > > * Programming language(s) associated with benchmark (e.g. a > > > benchmark > > > > > > > may involve both C++ and Python) > > > > > > > * Benchmark time, plus mean and standard deviation if > available, > > > else NULL > > > > > > > > > > > > > > (maybe some other things) > > > > > > > > > > > > > > I would rather not be locked into the internal database schema > of a > > > > > > > particular benchmarking tool. So people in the community can > just > > > run > > > > > > > SQL queries against the database and use the data however they > > > like. > > > > > > > We'll just have to be careful that people don't DROP TABLE or > > > DELETE > > > > > > > (but we should have daily backups so we can recover from such > > > cases) > > > > > > > > > > > > > > So while we may make use of TeamCity to schedule the runs on > the > > > cloud > > > > > > > and physical hardware, we should also provide a path for other > > > people > > > > > > > in the community to add data to the benchmark database on their > > > > > > > hardware on an ad hoc basis. 
> > > > > > > For example, I have several machines in my home on all operating systems (Windows / macOS / Linux, and soon also ARM64), and I'd like to set up scheduled tasks / cron jobs to report in to the database at least on a daily basis.
> > > > > > >
> > > > > > > Ideally the benchmark database would just be a PostgreSQL server with a schema we write down and keep backed up, etc. Hosted PostgreSQL is inexpensive ($200+ per year depending on the size of the instance; this probably doesn't need to be a crazy big machine).
> > > > > > >
> > > > > > > I suspect there will be a manageable amount of development involved in gluing each of the benchmarking frameworks together with the benchmark database. This glue can also handle querying the operating system for the system information listed above.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Wes
> > > > > > >
> > > > > > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg <areg.melik-adam...@intel.com> wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I want to restart/attach to the discussion about creating an Arrow benchmarking dashboard. I propose a performance benchmark run per commit to track changes.
> > > > > > > >
> > > > > > > > The proposal includes building infrastructure for per-commit tracking comprising the following parts:
> > > > > > > > - Hosted JetBrains TeamCity for OSS (https://teamcity.jetbrains.com/) as the build system
> > > > > > > > - Agents running in the cloud, both VM/container (DigitalOcean, or others) and bare metal (Packet.net/AWS), plus on-premise machines (Nvidia boxes?)
> > > > > > > > - JFrog Artifactory storage and management for OSS projects (https://jfrog.com/open-source/#artifactory2)
> > > > > > > > - Codespeed as the frontend (https://github.com/tobami/codespeed)
> > > > > > > >
> > > > > > > > I am volunteering to build such a system (if needed, more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.
> > > > > > > >
> > > > > > > > Please let me know your thoughts!
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > -Areg.
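The "glue" Wes describes above, where each reporting machine queries its own operating system for system information before submitting results, might look roughly like the following on Linux. The `/proc` and `/sys` probes are Linux-specific, and the `machine` table with its unique `machine_name` is the same hypothetical one from the DDL sketch earlier in this thread.

```python
#!/usr/bin/env python
"""Sketch of the per-machine glue: gather system information and register
(or update) this machine's row in the hypothetical `machine` table before
benchmark results are submitted. The probes below are Linux-specific."""
import platform

import psycopg2


def ram_bytes():
    """Total physical memory from /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) * 1024  # reported in kB
    return None


def l3_cache_kb():
    """L3 cache size from sysfs, if exposed (Linux only)."""
    try:
        with open("/sys/devices/system/cpu/cpu0/cache/index3/size") as f:
            return int(f.read().strip().rstrip("K"))
    except (OSError, ValueError):
        return None


def register_machine(cur):
    """Upsert this machine, keyed on its unique name."""
    cur.execute(
        """INSERT INTO machine (machine_name, cpu_model, ram_bytes, l3_cache_kb)
           VALUES (%s, %s, %s, %s)
           ON CONFLICT (machine_name) DO UPDATE
               SET cpu_model   = EXCLUDED.cpu_model,
                   ram_bytes   = EXCLUDED.ram_bytes,
                   l3_cache_kb = EXCLUDED.l3_cache_kb""",
        (platform.node(), platform.processor() or platform.machine(),
         ram_bytes(), l3_cache_kb()))


if __name__ == "__main__":
    # A daily cron entry on a volunteer machine might look like:
    #   0 3 * * *  python /opt/arrow-bench/register_machine.py
    with psycopg2.connect("dbname=arrow_benchmarks") as conn:
        with conn.cursor() as cur:
            register_machine(cur)
```

A scheduled task or cron entry on each volunteer machine could run this registration step and then the collector once a day.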