I'm happy to help out in the areas I'm more familiar with, starting next week. I suspect this simply fell by the wayside for some folks during the 2.2.0 release.

On Thu, Jul 13, 2017 at 1:16 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Hi all,
>
> Another gentle ping for help.
>
> Probably, let me open up a JIRA and proceed with this after a couple of
> weeks if no one is going to do this, although I hope someone takes this.
>
> Thanks.
>
> 2017-06-18 2:16 GMT+09:00 Sean Owen <so...@cloudera.com>:
>
>> Looks like a whole lot of the results have been analyzed. I suspect
>> there's more than enough to act on already. I think we should wait until
>> after 2.2 is done.
>> Does anybody have a preference for how to proceed here -- just open a
>> JIRA to take care of a batch of related types of issues and go for it?
>>
>> On Sat, Jun 17, 2017 at 4:45 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Gentle ping to dev for help. I hope this effort is not abandoned.
>>>
>>> On 25 May 2017 9:41 am, "Josh Rosen" <joshro...@databricks.com> wrote:
>>>
>>> I'm interested in using the Scapegoat
>>> <https://github.com/sksamuel/scapegoat> Scala compiler plugin to find
>>> potential bugs and performance problems in Spark. Scapegoat has a useful
>>> built-in set of inspections and is pretty easy to extend with custom ones.
>>> For example, I added an inspection to spot places where we call
>>> *.apply()* on a Seq which is not an IndexedSeq
>>> <https://github.com/sksamuel/scapegoat/pull/159> in order to make it
>>> easier to spot potential O(n^2) performance bugs.
>>>
>>> There are lots of false-positives and benign warnings (as with any
>>> linter / static analyzer) so I don't think it's feasible for us to include
>>> this as a blocking step in our regular build. I am planning to build
>>> tooling to surface only new warnings so going forward this can become a
>>> useful code-review aid.
>>>
>>> The current codebase has roughly 1700 warnings that I would like to
>>> triage and categorize as false-positives or real bugs. I can't do this
>>> alone, so here's how you can help:
>>>
>>> - Visit the Google Docs spreadsheet at
>>>   https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?usp=sharing
>>>   and find an un-triaged warning.
>>> - In the columns at the right of the sheet, enter your name in the
>>>   appropriate column to mark a warning as a false-positive or as a real
>>>   bug and/or performance issue. If you think a warning is a real issue
>>>   then use the "comments" column for providing additional detail.
>>> - Please don't file JIRAs or PRs for individual warnings; I suspect
>>>   that we'll find clusters of issues which are best fixed in a few larger
>>>   PRs vs. lots of smaller ones. Certain warnings are probably simply
>>>   style issues so we should discuss those before trying to fix them.
>>>
>>> The sheet has hidden columns capturing the Spark revision and Scapegoat
>>> revision. I can use this to programmatically update the sheet and remap
>>> lines after updating either Scapegoat (to suppress false-positives) or
>>> Spark (to incorporate fixes and surface new warnings). For those who are
>>> interested, the sheet was produced with this script:
>>> https://gist.github.com/JoshRosen/1ae12a979880d9a98988aa87d70ff2a8
>>>
>>> Depending on the results of this experiment we might want to integrate a
>>> high-signal subset of the Scapegoat warnings into our build. I'm also
>>> hoping that we'll be able to build a useful corpus of triaged warnings in
>>> order to help improve Scapegoat itself and eliminate common
>>> false-positives.
>>>
>>> Thanks and happy bug-hunting,
>>> Josh Rosen

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau