Re: Benchmarking dashboard proposal
> Side question: is it expected to be able to connect to the DB directly from the outside? I don't have any clue about the possible security implications.

This is doable by creating different database accounts. Also, Wes's solution was to back up the database periodically (daily?) to protect against accidents. The current setup has a root user (full permissions), an `arrow_anonymous` user (select + insert only), and an `arrow_admin` user (select, insert, update, delete).

On Wed, Feb 20, 2019 at 12:19 PM Antoine Pitrou wrote:

> Side question: is it expected to be able to connect to the DB directly from the outside? I don't have any clue about the possible security implications.
>
> Regards
>
> Antoine.
>
> On 20/02/2019 at 18:55, Melik-Adamyan, Areg wrote:
>
> > There is a lot of discussion going on in the PR for ARROW-4313 itself; I would like to bring some of the high-level questions here to discuss. First of all, many thanks to Tanya for the work you are doing.
> >
> > Related to the dashboard intrinsics, I would like to set some scope and stick to it, so that we do not waste any work and get maximum efficiency from the effort we are putting into the dashboard.
> >
> > One thing that IMHO we are missing is: against which requirements is the work (DDL) being done, and in what scope? For me there are several things:
> >
> > 1. We want continuous *validated* performance tracking against check-ins to catch performance regressions and progressions. Validated means that the running environment is isolated enough that the stddev (assuming the distribution is normal) is as close to 0 as possible. It means both hardware and software should be fixed and not changeable, so there is only one variable to measure.
> >
> > 2. The unit-test framework (google/benchmark) can effectively report the needed benchmark data in textual format, with a preamble containing information about the machine on which the benchmarks are run.
> >
> > 3. So with environments set and regular runs you have all the artifacts, though not in a very comprehensible format. The reason to set up a dashboard is to let people consume the data and track the performance of various parts from a historical perspective, much more nicely and with visualizations.
> >
> > And here are the scope restrictions I have in mind:
> >
> > - Disallow entering any single benchmark run into the central repo, as single runs do not mean much in the context of continuous and statistically relevant measurements. What information will you get if someone reports a single run? You do not know how cleanly it was done and, more importantly, whether it is possible to reproduce elsewhere. That is why, even if the result is better, worse, or the same, you cannot compare it with the data already in the DB.
> >
> > - Mandate that contributors have a dedicated environment for measurements. Otherwise they can use TeamCity to run and parse the data and publish it on their own site. Data that enters the Arrow performance DB becomes Arrow community-owned data, and it becomes the community's job to answer why certain things are better or worse.
> >
> > - Because the numbers and flavors of CPUs/GPUs/accelerators are huge, we cannot satisfy all needs upfront and create a DB that covers all possible variants. I think we should have simple CPU and GPU configs now, even if they are not perfect. By simple I mean a basic brand string. That should be enough. Having all the detailed info in the DB does not make sense; in my experience you never use it, you use the CPUID/brand name to get the info needed.
> >
> > - Scope and requirements will change over time, and going big now will make things complicated later. So I think it will be beneficial to have something quick up and running, get a better understanding of our needs and gaps, and go from there.
> >
> > The needed infra is already up on AWS, so as soon as we resolve the DNS and key-exchange issues we can launch.
> >
> > -Areg.
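Areg's suggestion above is to record only a basic CPU brand string rather than a full hardware profile. A minimal sketch of what a collector script could do with Python's standard library (the thread does not specify a collector implementation; this is only an illustration):

```python
import platform


def cpu_brand():
    """Return a coarse CPU identifier, in the spirit of the 'basic brand
    string' suggestion above. platform.processor() can be empty on some
    systems, so fall back to the machine type (e.g. 'x86_64')."""
    return platform.processor() or platform.machine()


if __name__ == "__main__":
    print(cpu_brand())
```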
> > -----Original Message-----
> > From: Tanya Schlusser [mailto:ta...@tickel.net]
> > Sent: Thursday, February 7, 2019 4:40 PM
> > To: dev@arrow.apache.org
> > Subject: Re: Benchmarking dashboard proposal
> >
> > Late, but there's a PR now with first-draft DDL (https://github.com/apache/arrow/pull/3586). Happy to receive any feedback!
> >
> > I tried to think about how people would submit benchmarks, and added a Postgraphile container for HTTP-via-GraphQL. If others have strong opinions on the data modeling please speak up, because I'm more a database user than a designer.
> >
> > I can also help with benchmarking work in R/Python given guidance/a roadmap/examples from someone else.
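The three permission tiers described in this thread (root with full permissions, `arrow_anonymous` with select + insert, `arrow_admin` with select/insert/update/delete) could be expressed as PostgreSQL GRANT statements. A sketch only: the table name `benchmark_run` is a hypothetical placeholder, and the actual grants on the hosted instance may differ.

```python
# Permission tiers from the thread (the root role is omitted because it
# owns everything). The table name "benchmark_run" is an assumption.
ROLE_GRANTS = {
    "arrow_anonymous": ["SELECT", "INSERT"],
    "arrow_admin": ["SELECT", "INSERT", "UPDATE", "DELETE"],
}


def grant_statements(table="benchmark_run"):
    """Build one GRANT statement per role, in sorted role order."""
    return [
        "GRANT {} ON {} TO {};".format(", ".join(privs), table, role)
        for role, privs in sorted(ROLE_GRANTS.items())
    ]


for stmt in grant_statements():
    print(stmt)
```

Restricting `arrow_anonymous` to select + insert matches the thread's goal: outside contributors can add results but cannot modify or delete community-owned data, while daily backups guard against accidents.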
Re: Google Summer of Code 2019 for Apache Arrow
Would developing an open standard for in-memory records qualify as GSoC-worthy? In reference to this placeholder in the Confluence wiki: https://cwiki.apache.org/confluence/display/ARROW/Apache+Arrow+Home#ApacheArrowHome-Developinganopenstandardforin-memoryrecords which links to ARROW-1790 https://issues.apache.org/jira/browse/ARROW-1790 and to this thread https://lists.apache.org/thread.html/4818cb3d2ffb4677b24a4279c329fc518a1ac1c9d3017399a4269199@%3Cdev.arrow.apache.org%3E

Developing a standard, or even just starting a standards working group, would be quite a contribution, and would give a grad student the opportunity to contact multiple leaders in the field. (I am thinking of something along the lines of the Data Mining Group http://dmg.org/, which I believe is run by a local professor here in Chicago.)

I don't know many people, but I can ping that professor and maybe some others locally if other people think this seems like a GSoC-worthy project.

Best,
Tanya

On Fri, Feb 1, 2019 at 8:16 AM Wes McKinney wrote:

> hi folks,
>
> We are looking for project ideas and mentors for GSoC 2019. I created a JIRA
>
> https://issues.apache.org/jira/browse/COMDEV-309
>
> about a couple of project ideas for the C++ library.
>
> Since Arrow isn't the _easiest_ project to contribute to, we probably need to calibrate expectations for what a new contributor can accomplish in a 3 month GSoC project. A good chunk of time at the beginning will be spent ramping up. If anyone has project ideas (which need not be in C++) or wants to be a mentor, the deadline is fast approaching.
>
> Thanks,
> Wes
Re: Benchmarking dashboard proposal
Late, but there's a PR now with first-draft DDL (https://github.com/apache/arrow/pull/3586). Happy to receive any feedback!

I tried to think about how people would submit benchmarks, and added a Postgraphile container for HTTP-via-GraphQL. If others have strong opinions on the data modeling please speak up, because I'm more a database user than a designer.

I can also help with benchmarking work in R/Python given guidance/a roadmap/examples from someone else.

Best,
Tanya

On Mon, Feb 4, 2019 at 12:37 PM Tanya Schlusser wrote:

> I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL along with a README in a new directory `arrow/dev/benchmarking` unless directed otherwise.
>
> A "C++ Benchmark Collector" script would be super. I expect some back-and-forth on this to identify naïve assumptions in the data model. Attempting to submit actual benchmarks is how to get a handle on that. I recognize I'm blocking downstream work. Better to get an initial PR and some discussion going.
>
> Best,
> Tanya
>
> On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney wrote:
>
> > hi folks,
> >
> > I'm curious where we currently stand on this project. I see the discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- would the next step be to have a pull request with .sql files containing the DDL required to create the schema in PostgreSQL?
> >
> > I could volunteer to write the "C++ Benchmark Collector" script that will run all the benchmarks on Linux and collect their data to be inserted into the database.
> >
> > Thanks
> > Wes
> >
> > On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser wrote:
> >
> > > I don't want to be the bottleneck and have posted an initial draft data model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313
> > >
> > > It should not be a problem to get content into a form that would be acceptable for either a static site like ASV (via CORS queries to a GraphQL/REST interface) or a codespeed-style site (via a separate schema organized for Django).
> > >
> > > I don't think I'm experienced enough to actually write any benchmarks, though, so all I can contribute is backend work for this task.
> > >
> > > Best,
> > > Tanya
> > >
> > > On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney wrote:
> > >
> > > > hi folks,
> > > >
> > > > I'd like to propose some kind of timeline for getting a first iteration of a benchmark database developed and live, with scripts to enable one or more initial agents to start adding new data on a daily / per-commit basis. I have at least 3 physical machines where I could immediately set up cron jobs to start adding new data, and I could attempt to backfill data as far back as possible.
> > > >
> > > > Personally, I would like to see this done by the end of February if not sooner -- if we don't have the volunteers to push the work to completion by then, please let me know, as I will rearrange my priorities to make sure that it happens. Does that sound reasonable?
> > > >
> > > > Please let me know if this plan sounds reasonable:
> > > >
> > > > * Set up a hosted PostgreSQL instance, configure backups
> > > > * Propose and adopt a database schema for storing benchmark results
> > > > * For C++, write a script (or Dockerfile) to execute all google-benchmarks and output results to JSON, then an adapter script (Python) to ingest them into the database
> > > > * For Python, a similar script that invokes ASV, then inserts the ASV results into the benchmark database
> > > >
> > > > This seems to be a prerequisite for having a front end to visualize the results, but the dashboard/front end can hopefully be implemented in such a way that the details of the benchmark database are not too tightly coupled.
> > > >
> > > > (Do we have any other benchmarks in the project that would need to be inserted initially?)
> > > >
> > > > Related work to trigger benchmarks on agents when new commits land in master can happen concurrently -- one task need not block the other.
> > > >
> > > > Thanks
> > > > Wes
> > > >
> > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney
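The C++ step of the plan quoted above (run the google-benchmarks with JSON output, then a Python adapter script to ingest the results) might look roughly like this. The row layout is an illustrative assumption, not the adopted schema; google/benchmark's `--benchmark_format=json` emits a top-level `"benchmarks"` array whose entries carry `name`, `real_time`, `cpu_time`, and `time_unit` fields.

```python
import json


def rows_from_gbench(report_json, commit_hash, machine):
    """Flatten a google-benchmark JSON report into rows ready for
    insertion into the benchmark database. The row keys here are an
    assumption for discussion, not the agreed-upon schema."""
    report = json.loads(report_json)
    return [
        {
            "machine": machine,
            "commit": commit_hash,
            "benchmark": b["name"],
            "real_time": b["real_time"],
            "cpu_time": b["cpu_time"],
            "time_unit": b.get("time_unit", "ns"),
        }
        for b in report["benchmarks"]
    ]


# A tiny synthetic report in google/benchmark's JSON shape, for illustration.
sample = json.dumps({
    "context": {"date": "2019-02-04", "num_cpus": 8},
    "benchmarks": [
        {"name": "BM_BuildArray/1024", "iterations": 1000,
         "real_time": 123.4, "cpu_time": 120.1, "time_unit": "ns"},
    ],
})

print(rows_from_gbench(sample, "abc123", "example-machine"))
```

The real collector would also capture the machine details from the report's `"context"` preamble before inserting the rows.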
Re: Benchmarking dashboard proposal
I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL along with a README in a new directory `arrow/dev/benchmarking` unless directed otherwise.

A "C++ Benchmark Collector" script would be super. I expect some back-and-forth on this to identify naïve assumptions in the data model. Attempting to submit actual benchmarks is how to get a handle on that. I recognize I'm blocking downstream work. Better to get an initial PR and some discussion going.

Best,
Tanya

On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney wrote:

> hi folks,
>
> I'm curious where we currently stand on this project. I see the discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- would the next step be to have a pull request with .sql files containing the DDL required to create the schema in PostgreSQL?
>
> I could volunteer to write the "C++ Benchmark Collector" script that will run all the benchmarks on Linux and collect their data to be inserted into the database.
>
> Thanks
> Wes
>
> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser wrote:
>
> > I don't want to be the bottleneck and have posted an initial draft data model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313
> >
> > It should not be a problem to get content into a form that would be acceptable for either a static site like ASV (via CORS queries to a GraphQL/REST interface) or a codespeed-style site (via a separate schema organized for Django).
> >
> > I don't think I'm experienced enough to actually write any benchmarks, though, so all I can contribute is backend work for this task.
> >
> > Best,
> > Tanya
> >
> > On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney wrote:
> >
> > > hi folks,
> > >
> > > I'd like to propose some kind of timeline for getting a first iteration of a benchmark database developed and live, with scripts to enable one or more initial agents to start adding new data on a daily / per-commit basis. I have at least 3 physical machines where I could immediately set up cron jobs to start adding new data, and I could attempt to backfill data as far back as possible.
> > >
> > > Personally, I would like to see this done by the end of February if not sooner -- if we don't have the volunteers to push the work to completion by then, please let me know, as I will rearrange my priorities to make sure that it happens. Does that sound reasonable?
> > >
> > > Please let me know if this plan sounds reasonable:
> > >
> > > * Set up a hosted PostgreSQL instance, configure backups
> > > * Propose and adopt a database schema for storing benchmark results
> > > * For C++, write a script (or Dockerfile) to execute all google-benchmarks and output results to JSON, then an adapter script (Python) to ingest them into the database
> > > * For Python, a similar script that invokes ASV, then inserts the ASV results into the benchmark database
> > >
> > > This seems to be a prerequisite for having a front end to visualize the results, but the dashboard/front end can hopefully be implemented in such a way that the details of the benchmark database are not too tightly coupled.
> > >
> > > (Do we have any other benchmarks in the project that would need to be inserted initially?)
> > >
> > > Related work to trigger benchmarks on agents when new commits land in master can happen concurrently -- one task need not block the other.
> > >
> > > Thanks
> > > Wes
> > >
> > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney wrote:
> > >
> > > > Sorry, copy-paste failure: https://issues.apache.org/jira/browse/ARROW-4313
> > > >
> > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney wrote:
> > > >
> > > > > I don't think there is one but I just created https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
> > > > >
> > > > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser wrote:
> > > > >
> > > > > > Areg,
> > > > > >
> > > > > > If you'd like help, I volunteer! No experience benchmarking but tons of experience databasing—I can mock the backend (database + http) as a starting point for discussion if this is the way people want to go.
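Earlier in this archive, a Postgraphile container was mentioned as the path for submitting benchmark results over HTTP via GraphQL. A hedged sketch of what one submission payload might look like; the mutation and field names below are illustrative assumptions, not the actual Postgraphile-generated schema:

```python
import json


def benchmark_mutation(machine, benchmark, value):
    """Build the JSON body for a GraphQL mutation submitting one
    benchmark result over HTTP. The mutation name and input fields
    are placeholders; Postgraphile derives the real names from the
    adopted DDL."""
    query = """
    mutation ($input: CreateBenchmarkRunInput!) {
      createBenchmarkRun(input: $input) { clientMutationId }
    }"""
    variables = {
        "input": {
            "machine": machine,
            "benchmark": benchmark,
            "value": value,
        }
    }
    return json.dumps({"query": query, "variables": variables})


# The resulting string would be POSTed to the GraphQL endpoint with
# Content-Type: application/json.
print(benchmark_mutation("example-machine", "BM_BuildArray/1024", 123.4))
```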
[jira] [Created] (ARROW-4429) Add git rebase tips to the 'Contributing' page in the developer docs
Tanya Schlusser created ARROW-4429:
--------------------------------------

Summary: Add git rebase tips to the 'Contributing' page in the developer docs
Key: ARROW-4429
URL: https://issues.apache.org/jira/browse/ARROW-4429
Project: Apache Arrow
Issue Type: Task
Components: Documentation
Reporter: Tanya Schlusser

A recent discussion on the listserv (link below) asked about how contributors should handle rebasing. It would be helpful if the tips made it into the developer documentation somehow. I suggest the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] page—currently a wiki, but hopefully eventually part of the Sphinx docs (ARROW-4427).

Here is the relevant thread: [https://lists.apache.org/thread.html/c74d8027184550b8d9041e3f2414b517ffb76ccbc1d5aa4563d364b6@%3Cdev.arrow.apache.org%3E]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (ARROW-4427) Move "Contributing to Apache Arrow" page to the static docs
Tanya Schlusser created ARROW-4427:
--------------------------------------

Summary: Move "Contributing to Apache Arrow" page to the static docs
Key: ARROW-4427
URL: https://issues.apache.org/jira/browse/ARROW-4427
Project: Apache Arrow
Issue Type: Task
Components: Documentation
Reporter: Tanya Schlusser

It's hard to find and modify the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] wiki page in Confluence. If it were moved inside the static web page, that would make it easier. There are two steps to this:

# Copy the wiki page contents to a new web page at the top "site" level (under arrow/site/, just like the [committers page|https://github.com/apache/arrow/blob/master/site/committers.html]), maybe named "contributing.html" or something.
# Modify the [navigation section in arrow/site/_includes/header.html|https://github.com/apache/arrow/blob/8e195327149b670de2cd7a8cfe75bbd6f71c6b49/site/_includes/header.html#L33] to point to the newly created page instead of the wiki page.

The affected pages are all part of the Jekyll components, so there isn't a need to build the Sphinx part of the docs to check your work.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (ARROW-4425) Add link to 'Contributing' page in the top-level Arrow README
Tanya Schlusser created ARROW-4425:
--------------------------------------

Summary: Add link to 'Contributing' page in the top-level Arrow README
Key: ARROW-4425
URL: https://issues.apache.org/jira/browse/ARROW-4425
Project: Apache Arrow
Issue Type: Task
Components: Documentation
Reporter: Tanya Schlusser

It would be nice to add a link to the ["Contributing to Apache Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] Confluence page directly from the main project [README|https://github.com/apache/arrow/blob/master/README.md] (in the already existing "Getting involved" section), because it's a bit hard to find right now.

"Contributing" page: [https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow]
Main project README: [https://github.com/apache/arrow/blob/master/README.md]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Re: Git workflow question
This information might be useful to put on the 'contributing' page: https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow

I attempted to add it but don't have permission. It was one of my stumbling points too, and I'm thankful someone else asked about it.

On Wed, Jan 30, 2019 at 12:00 AM Ravindra Pindikura wrote:

> On Jan 30, 2019, at 11:05 AM, Andy Grove wrote:
>
> > Got it. Thanks for the clarification.
> >
> > On Tue, Jan 29, 2019 at 10:30 PM Wes McKinney wrote:
> >
> > > hi Andy,
> > >
> > > yes, in this project I recommend never using "git merge". Merge commits just make branches harder to maintain when master is not using "merge" for merging patches.
> > >
> > > It is semantically simpler in the case of conflicts with master to use "git rebase -i" to combine your changes into a single commit, then "git rebase master" and resolve the conflicts then.
>
> Here's the workflow that I use:
>
> git fetch upstream
> git log  -> count my local commits, and remember the count as 'X'
> git rebase -i HEAD~X
> git rebase upstream/master
> git push -f
>
> I'm not able to avoid the '-f' in the last step, but Wes had recommended that we avoid the force option. Is there a better way to do this?
>
> Thanks & regards,
> Ravindra
>
> > > A linear commit history, with all patches landing in master as single commits, significantly eases things for downstream users who may be cherry-picking fixes into maintenance branches. The alternative -- trying to sift the changes you want out of a tangled web of merge commits -- would be utter madness.
> > >
> > > - Wes
> > >
> > > On Tue, Jan 29, 2019 at 11:20 PM Andy Grove wrote:
> > >
> > > > I've been struggling a bit with the workflow, and I think I see what I'm doing wrong now but wanted to confirm.
> > > >
> > > > I've been running the following to keep my fork up to date:
> > > >
> > > > git checkout master
> > > > git fetch upstream
> > > > git merge upstream/master
> > > > git push origin
> > > >
> > > > And then to update my branch I have been doing:
> > > >
> > > > git checkout ARROW-
> > > > git merge master
> > > > git push origin
> > > >
> > > > This generally has worked, but sometimes I seem to pick up random commits on my branch.
> > > >
> > > > Reading the GitHub fork workflow docs again, it looks like I should have been running "git rebase master" instead of "git merge master"?
> > > >
> > > > Is that the only mistake I'm making?
> > > >
> > > > Thanks,
> > > >
> > > > Andy.
Re: Benchmarking dashboard proposal
I don't want to be the bottleneck and have posted an initial draft data model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313

It should not be a problem to get content into a form that would be acceptable for either a static site like ASV (via CORS queries to a GraphQL/REST interface) or a codespeed-style site (via a separate schema organized for Django).

I don't think I'm experienced enough to actually write any benchmarks, though, so all I can contribute is backend work for this task.

Best,
Tanya

On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney wrote:

> hi folks,
>
> I'd like to propose some kind of timeline for getting a first iteration of a benchmark database developed and live, with scripts to enable one or more initial agents to start adding new data on a daily / per-commit basis. I have at least 3 physical machines where I could immediately set up cron jobs to start adding new data, and I could attempt to backfill data as far back as possible.
>
> Personally, I would like to see this done by the end of February if not sooner -- if we don't have the volunteers to push the work to completion by then, please let me know, as I will rearrange my priorities to make sure that it happens. Does that sound reasonable?
>
> Please let me know if this plan sounds reasonable:
>
> * Set up a hosted PostgreSQL instance, configure backups
> * Propose and adopt a database schema for storing benchmark results
> * For C++, write a script (or Dockerfile) to execute all google-benchmarks and output results to JSON, then an adapter script (Python) to ingest them into the database
> * For Python, a similar script that invokes ASV, then inserts the ASV results into the benchmark database
>
> This seems to be a prerequisite for having a front end to visualize the results, but the dashboard/front end can hopefully be implemented in such a way that the details of the benchmark database are not too tightly coupled.
>
> (Do we have any other benchmarks in the project that would need to be inserted initially?)
>
> Related work to trigger benchmarks on agents when new commits land in master can happen concurrently -- one task need not block the other.
>
> Thanks
> Wes
>
> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney wrote:
>
> > Sorry, copy-paste failure: https://issues.apache.org/jira/browse/ARROW-4313
> >
> > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney wrote:
> >
> > > I don't think there is one but I just created https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
> > >
> > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser wrote:
> > >
> > > > Areg,
> > > >
> > > > If you'd like help, I volunteer! No experience benchmarking but tons of experience databasing—I can mock the backend (database + http) as a starting point for discussion if this is the way people want to go.
> > > >
> > > > Is there a Jira ticket for this that I can jump into?
> > > >
> > > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney wrote:
> > > >
> > > > > hi Areg,
> > > > >
> > > > > This sounds great -- we've discussed building a more full-featured benchmark automation system in the past but nothing has been developed yet.
> > > > >
> > > > > Your proposal about the details sounds OK; the single most important thing to me is that we build and maintain a very general-purpose database schema for building the historical benchmark database.
> > > > >
> > > > > The benchmark database should keep track of:
> > > > >
> > > > > * Timestamp of benchmark run
> > > > > * Git commit hash of codebase
> > > > > * Machine unique name (sort of the "user id")
> > > > > * CPU identification for machine, and clock frequency (in case of overclocking)
> > > > > * CPU cache sizes (L1/L2/L3)
> > > > > * Whether or not CPU throttling is enabled (if it can be easily determined)
> > > > > * RAM size
> > > > > * GPU identification (if any)
> > > > > * Benchmark unique name
> > > > > * Programming language(s) associated with benchmark (e.g. a benchmark may involve both C++ and Python)
> > > > > * Benchmark time, plus mean and standard deviation if available, else NULL
Re: Benchmarking dashboard proposal
Areg,

If you'd like help, I volunteer! No experience benchmarking but tons of experience databasing—I can mock the backend (database + http) as a starting point for discussion if this is the way people want to go.

Is there a Jira ticket for this that I can jump into?

On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney wrote:

> hi Areg,
>
> This sounds great -- we've discussed building a more full-featured benchmark automation system in the past but nothing has been developed yet.
>
> Your proposal about the details sounds OK; the single most important thing to me is that we build and maintain a very general-purpose database schema for building the historical benchmark database.
>
> The benchmark database should keep track of:
>
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
>
> (maybe some other things)
>
> I would rather not be locked into the internal database schema of a particular benchmarking tool, so people in the community can just run SQL queries against the database and use the data however they like. We'll just have to be careful that people don't DROP TABLE or DELETE (but we should have daily backups so we can recover from such cases).
>
> So while we may make use of TeamCity to schedule the runs on the cloud and physical hardware, we should also provide a path for other people in the community to add data to the benchmark database on their hardware on an ad hoc basis. For example, I have several machines in my home on all operating systems (Windows / macOS / Linux, and soon also ARM64) and I'd like to set up scheduled tasks / cron jobs to report in to the database at least on a daily basis.
>
> Ideally the benchmark database would just be a PostgreSQL server with a schema we write down and keep backed up etc. Hosted PostgreSQL is inexpensive ($200+ per year depending on size of instance; this probably doesn't need to be a crazy big machine).
>
> I suspect there will be a manageable amount of development involved to glue each of the benchmarking frameworks together with the benchmark database. This can also handle querying the operating system for the system information listed above.
>
> Thanks
> Wes
>
> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg wrote:
>
> > Hello,
> >
> > I want to restart/attach to the discussions about creating an Arrow benchmarking dashboard. I propose a performance benchmark run per commit to track the changes.
> >
> > The proposal includes building infrastructure for per-commit tracking, comprising the following parts:
> >
> > - Hosted JetBrains TeamCity for OSS https://teamcity.jetbrains.com/ as a build system
> > - Agents running in the cloud, both VM/container (DigitalOcean, or others) and bare-metal (Packet.net/AWS), and on-premise (Nvidia boxes?)
> > - JFrog Artifactory storage and management for OSS projects https://jfrog.com/open-source/#artifactory2
> > - Codespeed as a frontend https://github.com/tobami/codespeed
> >
> > I am volunteering to build such a system (if needed, more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.
> >
> > Please, let me know your thoughts!
> >
> > Thanks,
> > -Areg.
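Wes's field list above maps naturally onto a single table. A sketch of one possible DDL, for discussion purposes only: the table, column names, and types below are assumptions, and the schema actually proposed lives in the ARROW-4313 pull request.

```python
# One possible shape for a benchmark-run table covering the fields in
# Wes's list (timestamp, commit, machine, CPU/GPU identification, cache
# sizes, throttling flag, RAM, benchmark name, languages, timings).
# This is a sketch, not the adopted schema.
BENCHMARK_RUN_DDL = """
CREATE TABLE IF NOT EXISTS benchmark_run (
    run_timestamp    TIMESTAMP NOT NULL,
    git_commit       TEXT      NOT NULL,
    machine_name     TEXT      NOT NULL,
    cpu_model        TEXT,
    cpu_frequency_hz BIGINT,
    l1_cache_bytes   INTEGER,
    l2_cache_bytes   INTEGER,
    l3_cache_bytes   INTEGER,
    cpu_throttling   BOOLEAN,
    ram_bytes        BIGINT,
    gpu_model        TEXT,
    benchmark_name   TEXT      NOT NULL,
    languages        TEXT,
    value            DOUBLE PRECISION NOT NULL,
    mean             DOUBLE PRECISION,
    stddev           DOUBLE PRECISION
);
"""

print(BENCHMARK_RUN_DDL.strip().splitlines()[0])
```

Keeping everything in one wide table matches Wes's preference for a schema that anyone can query with plain SQL, independent of any particular benchmarking tool's internal format; normalizing machines and benchmarks into separate tables is the kind of refinement the PR discussion would settle.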
[jira] [Created] (ARROW-4039) Update link to 'development.rst' page from Python README.md
Tanya Schlusser created ARROW-4039:
--------------------------------------

Summary: Update link to 'development.rst' page from Python README.md
Key: ARROW-4039
URL: https://issues.apache.org/jira/browse/ARROW-4039
Project: Apache Arrow
Issue Type: Task
Components: Documentation, Python
Reporter: Tanya Schlusser

When the Sphinx docs were restructured, the link in the [README|https://github.com/apache/arrow/blob/master/python/README.md] changed from [https://github.com/apache/arrow/blob/master/python/doc/source/development.rst] to [https://github.com/apache/arrow/blob/master/docs/source/python/development.rst]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)