Re: Benchmarking dashboard proposal

Wes McKinney Mon, 21 Jan 2019 09:23:59 -0800

I don't think there is one, but I just created one:
https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E


On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net> wrote:
>
> Areg,
>
> If you'd like help, I volunteer! I have no benchmarking experience but a
> lot of database experience. I can mock the backend (database + HTTP) as a
> starting point for discussion if this is the way people want to go.
>
> Is there a Jira ticket for this that I can jump into?
>
>
>
>
> On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Areg,
> >
> > This sounds great -- we've discussed building a more full-featured
> > benchmark automation system in the past but nothing has been developed
> > yet.
> >
> > Your proposal about the details sounds OK; the single most important
> > thing to me is that we build and maintain a very general purpose
> > database schema for building the historical benchmark database.
> >
> > The benchmark database should keep track of:
> >
> > * Timestamp of benchmark run
> > * Git commit hash of codebase
> > * Machine unique name (sort of the "user id")
> > * CPU identification for machine, and clock frequency (in case of
> > overclocking)
> > * CPU cache sizes (L1/L2/L3)
> > * Whether or not CPU throttling is enabled (if it can be easily determined)
> > * RAM size
> > * GPU identification (if any)
> > * Benchmark unique name
> > * Programming language(s) associated with benchmark (e.g. a benchmark
> > may involve both C++ and Python)
> > * Benchmark time, plus mean and standard deviation if available, else NULL
> >
> > (maybe some other things)
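
As a starting point for discussion, the fields listed above could be sketched as two tables. This is only an illustration, not an agreed schema: the table names, column names, and the `record_run` helper are all hypothetical, and it uses Python's built-in `sqlite3` as a stand-in for the PostgreSQL server suggested in the thread.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption
# made for illustration, covering the fields listed in the thread.
SCHEMA = """
CREATE TABLE machine (
    machine_id      INTEGER PRIMARY KEY,
    machine_name    TEXT NOT NULL UNIQUE,  -- the "user id" of the machine
    cpu_model       TEXT,
    cpu_mhz         REAL,                  -- clock frequency
    l1_cache_bytes  INTEGER,
    l2_cache_bytes  INTEGER,
    l3_cache_bytes  INTEGER,
    cpu_throttling  INTEGER,               -- 1 or 0, NULL when unknown
    ram_bytes       INTEGER,
    gpu_model       TEXT                   -- NULL when there is no GPU
);
CREATE TABLE benchmark_run (
    run_id          INTEGER PRIMARY KEY,
    run_timestamp   TEXT NOT NULL,
    git_commit      TEXT NOT NULL,
    machine_id      INTEGER NOT NULL REFERENCES machine(machine_id),
    benchmark_name  TEXT NOT NULL,
    languages       TEXT NOT NULL,         -- e.g. "C++,Python"
    elapsed_seconds REAL NOT NULL,
    mean_seconds    REAL,                  -- NULL when not available
    stddev_seconds  REAL                   -- NULL when not available
);
"""


def create_database(path=":memory:"):
    """Open a database at `path` and create the sketch schema in it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, machine_name, git_commit, benchmark_name, languages,
               elapsed_seconds, run_timestamp,
               mean_seconds=None, stddev_seconds=None):
    """Insert one benchmark result, registering the machine on first use."""
    conn.execute(
        "INSERT OR IGNORE INTO machine (machine_name) VALUES (?)",
        (machine_name,),
    )
    (machine_id,) = conn.execute(
        "SELECT machine_id FROM machine WHERE machine_name = ?",
        (machine_name,),
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO benchmark_run (run_timestamp, git_commit, machine_id,"
        " benchmark_name, languages, elapsed_seconds, mean_seconds,"
        " stddev_seconds) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (run_timestamp, git_commit, machine_id, benchmark_name, languages,
         elapsed_seconds, mean_seconds, stddev_seconds),
    )
    conn.commit()
    return cur.lastrowid
```

Keeping the machine description in its own table means that a machine reporting in daily repeats only its name, and anyone can join the two tables with plain SQL.
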
> >
> > I would rather not be locked into the internal database schema of a
> > particular benchmarking tool, so that people in the community can just
> > run SQL queries against the database and use the data however they like.
> > We'll just have to be careful that people don't DROP TABLE or DELETE
> > (but we should have daily backups so we can recover from such cases).
> >
> > So while we may make use of TeamCity to schedule the runs on the cloud
> > and physical hardware, we should also provide a path for other people
> > in the community to add data to the benchmark database on their
> > hardware on an ad hoc basis. For example, I have several machines in
> > my home on all operating systems (Windows / macOS / Linux, and soon
> > also ARM64) and I'd like to set up scheduled tasks / cron jobs to
> > report in to the database at least on a daily basis.
> >
> > Ideally the benchmark database would just be a PostgreSQL server with
> > a schema we write down and keep backed up etc. Hosted PostgreSQL is
> > inexpensive ($200+ per year depending on size of instance; this
> > probably doesn't need to be a crazy big machine)
> >
> > I suspect there will be a manageable amount of development involved to
> > glue each of the benchmarking frameworks together with the benchmark
> > database. That glue code can also handle querying the operating system
> > for the system information listed above.
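
A minimal sketch of the system-information step, using only the Python standard library. The function names and dictionary keys are hypothetical. Cache sizes, clock frequency, throttling state, and GPU identification are left out because the standard library has no portable call for them; they would need platform-specific tools.

```python
import os
import platform
from datetime import datetime, timezone


def ram_bytes():
    """Return physical RAM in bytes, or None when it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; the names may be unknown elsewhere.
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size


def collect_system_info():
    """Gather the machine fields that are available without extra packages."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "machine_name": platform.node(),
        "os": platform.system(),
        "architecture": platform.machine(),
        "cpu_model": platform.processor() or None,  # empty string -> None
        "logical_cpus": os.cpu_count(),
        "ram_bytes": ram_bytes(),
    }
```

A scheduled task or cron job could call `collect_system_info()` once per run and send the result along with the benchmark timings.
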
> >
> > Thanks
> > Wes
> >
> > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> > <areg.melik-adam...@intel.com> wrote:
> > >
> > > Hello,
> > >
> > > I want to restart/attach to the discussions for creating an Arrow
> > > benchmarking dashboard. I want to propose a performance benchmark run
> > > per commit to track the changes.
> > > The proposal includes building infrastructure for per-commit tracking
> > > comprising the following parts:
> > > - Hosted JetBrains TeamCity for OSS https://teamcity.jetbrains.com/ as
> > > a build system
> > > - Agents running in the cloud, both VM/container (DigitalOcean, or
> > > others) and bare-metal (Packet.net/AWS), and on-premise (Nvidia boxes?)
> > > - JFrog Artifactory storage and management for OSS projects
> > > https://jfrog.com/open-source/#artifactory2
> > > - Codespeed as a frontend https://github.com/tobami/codespeed
> > >
> > > I am volunteering to build such a system (if needed, more Intel folks
> > > will be involved) so we can start tracking performance on various
> > > platforms and understand how changes affect it.
> > >
> > > Please, let me know your thoughts!
> > >
> > > Thanks,
> > > -Areg.
> > >
> > >
> > >
> >
