hi folks,

I'm curious where we currently stand on this project. I see the discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- would the next step be to have a pull request with .sql files containing the DDL required to create the schema in PostgreSQL?
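Just to make that question concrete, here is a rough sketch of what such DDL might look like, based on the fields discussed in the thread below. Everything here -- table names, column names, types -- is a placeholder to react to, not a proposal:

    CREATE TABLE machine (
        machine_id        SERIAL PRIMARY KEY,
        name              TEXT UNIQUE NOT NULL,  -- machine unique name (the "user id")
        cpu_model         TEXT,
        cpu_frequency_hz  BIGINT,                -- actual clock, in case of overclocking
        l1_cache_bytes    INTEGER,
        l2_cache_bytes    INTEGER,
        l3_cache_bytes    INTEGER,
        cpu_throttling    BOOLEAN,               -- NULL if not easily determined
        ram_bytes         BIGINT,
        gpu_model         TEXT                   -- NULL if no GPU
    );

    CREATE TABLE benchmark (
        benchmark_id  SERIAL PRIMARY KEY,
        name          TEXT UNIQUE NOT NULL,      -- benchmark unique name
        languages     TEXT[]                     -- e.g. '{C++,Python}'
    );

    CREATE TABLE benchmark_run (
        run_id         SERIAL PRIMARY KEY,
        benchmark_id   INTEGER NOT NULL REFERENCES benchmark,
        machine_id     INTEGER NOT NULL REFERENCES machine,
        run_timestamp  TIMESTAMPTZ NOT NULL,      -- timestamp of benchmark run
        git_commit     TEXT NOT NULL,             -- commit hash of codebase
        value          DOUBLE PRECISION NOT NULL, -- benchmark time
        mean           DOUBLE PRECISION,          -- NULL if unavailable
        stddev         DOUBLE PRECISION           -- NULL if unavailable
    );

Happy to adjust this to whatever Tanya's draft data model in the JIRA issue settles on.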
I could volunteer to write the "C++ Benchmark Collector" script that will run all the benchmarks on Linux and collect their data to be inserted into the database. Thanks Wes On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net> wrote: > > I don't want to be the bottleneck and have posted an initial draft data > model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313 > > It should not be a problem to get content into a form that would be > acceptable for either a static site like ASV (via CORS queries to a > GraphQL/REST interface) or a codespeed-style site (via a separate schema > organized for Django) > > I don't think I'm experienced enough to actually write any benchmarks > though, so all I can contribute is backend work for this task. > > Best, > Tanya > > On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <wesmck...@gmail.com> wrote: > > > hi folks, > > > > I'd like to propose some kind of timeline for getting a first > > iteration of a benchmark database developed and live, with scripts to > > enable one or more initial agents to start adding new data on a daily > > / per-commit basis. I have at least 3 physical machines where I could > > immediately set up cron jobs to start adding new data, and I could > > attempt to backfill data as far back as possible. > > > > Personally, I would like to see this done by the end of February if > > not sooner -- if we don't have the volunteers to push the work to > > completion by then please let me know as I will rearrange my > > priorities to make sure that it happens. Does that sounds reasonable? > > > > Please let me know if this plan sounds reasonable: > > > > * Set up a hosted PostgreSQL instance, configure backups > > * Propose and adopt a database schema for storing benchmark results > > * For C++, write script (or Dockerfile) to execute all > > google-benchmarks, output results to JSON, then adapter script > > (Python) to ingest into database > > * For Python, similar script that invokes ASV, then inserts ASV > > results into benchmark database > > > > This seems to be a pre-requisite for having a front-end to visualize > > the results, but the dashboard/front end can hopefully be implemented > > in such a way that the details of the benchmark database are not too > > tightly coupled > > > > (Do we have any other benchmarks in the project that would need to be > > inserted initially?) > > > > Related work to trigger benchmarks on agents when new commits land in > > master can happen concurrently -- one task need not block the other > > > > Thanks > > Wes > > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> wrote: > > > > > > Sorry, copy-paste failure: > > https://issues.apache.org/jira/browse/ARROW-4313 > > > > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> > > wrote: > > > > > > > > I don't think there is one but I just created > > > > > > https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E > > > > > > > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net> > > wrote: > > > > > > > > > > Areg, > > > > > > > > > > If you'd like help, I volunteer! No experience benchmarking but tons > > > > > experience databasing—I can mock the backend (database + http) as a > > > > > starting point for discussion if this is the way people want to go. > > > > > > > > > > Is there a Jira ticket for this that i can jump into? 
> > > > >
> > > > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > > >
> > > > > > hi Areg,
> > > > > >
> > > > > > This sounds great -- we've discussed building a more full-featured benchmark automation system in the past, but nothing has been developed yet.
> > > > > >
> > > > > > Your proposal about the details sounds OK; the single most important thing to me is that we build and maintain a very general-purpose database schema for the historical benchmark database.
> > > > > >
> > > > > > The benchmark database should keep track of:
> > > > > >
> > > > > > * Timestamp of benchmark run
> > > > > > * Git commit hash of codebase
> > > > > > * Machine unique name (sort of the "user id")
> > > > > > * CPU identification for machine, and clock frequency (in case of overclocking)
> > > > > > * CPU cache sizes (L1/L2/L3)
> > > > > > * Whether or not CPU throttling is enabled (if it can be easily determined)
> > > > > > * RAM size
> > > > > > * GPU identification (if any)
> > > > > > * Benchmark unique name
> > > > > > * Programming language(s) associated with benchmark (e.g. a benchmark may involve both C++ and Python)
> > > > > > * Benchmark time, plus mean and standard deviation if available, else NULL
> > > > > >
> > > > > > (maybe some other things)
> > > > > >
> > > > > > I would rather not be locked into the internal database schema of a particular benchmarking tool, so that people in the community can just run SQL queries against the database and use the data however they like. We'll just have to be careful that people don't DROP TABLE or DELETE (but we should have daily backups so we can recover from such cases).
> > > > > >
> > > > > > So while we may make use of TeamCity to schedule the runs on cloud and physical hardware, we should also provide a path for other people in the community to add data to the benchmark database from their hardware on an ad hoc basis. For example, I have several machines in my home on all operating systems (Windows / macOS / Linux, and soon also ARM64), and I'd like to set up scheduled tasks / cron jobs to report in to the database at least on a daily basis.
> > > > > >
> > > > > > Ideally the benchmark database would just be a PostgreSQL server with a schema we write down and keep backed up, etc. Hosted PostgreSQL is inexpensive ($200+ per year depending on the size of the instance; this probably doesn't need to be a crazy big machine).
> > > > > >
> > > > > > I suspect there will be a manageable amount of development involved in gluing each of the benchmarking frameworks together with the benchmark database. This glue can also handle querying the operating system for the system information listed above.
> > > > > >
> > > > > > Thanks
> > > > > > Wes
> > > > > >
> > > > > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg <areg.melik-adam...@intel.com> wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I want to restart/attach to the discussion about creating an Arrow benchmarking dashboard. I propose running the performance benchmarks per commit to track changes.
> > > > > > > The proposal includes building infrastructure for per-commit tracking, comprising the following parts:
> > > > > > > - JetBrains' hosted TeamCity for OSS, https://teamcity.jetbrains.com/, as the build system
> > > > > > > - Agents running in the cloud, both VM/container (DigitalOcean or others) and bare-metal (Packet.net/AWS), plus on-premise machines (Nvidia boxes?)
> > > > > > > - JFrog Artifactory artifact storage and management for OSS projects, https://jfrog.com/open-source/#artifactory2
> > > > > > > - Codespeed as a frontend, https://github.com/tobami/codespeed
> > > > > > >
> > > > > > > I am volunteering to build such a system (if needed, more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.
> > > > > > >
> > > > > > > Please let me know your thoughts!
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -Areg.
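P.S. To make the "C++ Benchmark Collector" offer at the top of this mail concrete, here is a rough sketch of the kind of adapter script I have in mind. It assumes google-benchmark's --benchmark_format=json output and the placeholder schema sketched above, and it picks psycopg2 as the driver purely for illustration -- a starting point for discussion, not a design:

    #!/usr/bin/env python
    # Sketch of a "C++ Benchmark Collector": run a google-benchmark binary
    # with machine-readable output and insert the results into the (draft)
    # benchmark database. Table/column names follow the placeholder DDL above.
    import json
    import subprocess

    import psycopg2  # hypothetical driver choice


    def collect(benchmark_binary, machine_id, git_commit, dsn):
        # google-benchmark emits a JSON report when passed
        # --benchmark_format=json
        raw = subprocess.check_output([benchmark_binary,
                                       '--benchmark_format=json'])
        report = json.loads(raw.decode('utf-8'))

        conn = psycopg2.connect(dsn)
        with conn, conn.cursor() as cur:
            for bench in report['benchmarks']:
                # Register the benchmark name if we have not seen it before
                cur.execute(
                    """INSERT INTO benchmark (name, languages)
                       VALUES (%s, %s)
                       ON CONFLICT (name) DO NOTHING""",
                    (bench['name'], ['C++']))
                # Record one run; mean/stddev stay NULL here -- with
                # --benchmark_repetitions google-benchmark reports those as
                # separate aggregate entries, which a fuller script would parse
                cur.execute(
                    """INSERT INTO benchmark_run
                           (benchmark_id, machine_id, run_timestamp,
                            git_commit, value)
                       VALUES ((SELECT benchmark_id FROM benchmark
                                WHERE name = %s),
                               %s, now(), %s, %s)""",
                    (bench['name'], machine_id, git_commit,
                     bench['real_time']))
        conn.close()

A similar adapter invoking ASV would cover the Python benchmarks, and the same script could be extended to capture the machine/CPU details listed above.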