I'm happy to help out in the places I'm more familiar with, starting sometime
next week. I suspect things probably just fell by the wayside for some
folks during the 2.2.0 release.

On Thu, Jul 13, 2017 at 1:16 AM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Hi all,
>
>
> Another gentle ping for help.
>
> If no one picks this up, I'll open a JIRA and proceed with it in a couple
> of weeks, although I hope someone else takes it on.
>
>
> Thanks.
>
> 2017-06-18 2:16 GMT+09:00 Sean Owen <so...@cloudera.com>:
>
>> Looks like a whole lot of the results have been analyzed. I suspect
>> there's more than enough to act on already. I think we should wait until
>> after 2.2 is done.
>> Does anybody have a preference for how to proceed here -- just open a JIRA
>> to take care of a batch of related types of issues and go for it?
>>
>> On Sat, Jun 17, 2017 at 4:45 PM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>
>>> Gentle ping to dev for help. I hope this effort is not abandoned.
>>>
>>>
>>> On 25 May 2017 9:41 am, "Josh Rosen" <joshro...@databricks.com> wrote:
>>>
>>> I'm interested in using the Scapegoat
>>> <https://github.com/sksamuel/scapegoat> Scala compiler plugin to find
>>> potential bugs and performance problems in Spark. Scapegoat has a useful
>>> built-in set of inspections and is pretty easy to extend with custom ones.
>>> For example, I added an inspection to spot places where we call
>>> *.apply()* on a Seq which is not an IndexedSeq
>>> <https://github.com/sksamuel/scapegoat/pull/159> in order to make it
>>> easier to spot potential O(n^2) performance bugs.
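>>>
>>> For anyone unfamiliar with the pattern, here is a minimal, hypothetical
>>> sketch (illustration only, not code from Spark) of what that inspection is
>>> meant to catch:
>>>
>>>   // On a List, apply(i) walks from the head, so this loop is O(n^2);
>>>   // an IndexedSeq (e.g. Vector or Array) makes each access O(1).
>>>   def sumEveryOther(xs: Seq[Int]): Int = {
>>>     var sum = 0
>>>     var i = 0
>>>     while (i < xs.length) {
>>>       sum += xs(i)  // xs.apply(i): O(i) on a List, O(1) on an IndexedSeq
>>>       i += 2
>>>     }
>>>     sum
>>>   }
>>>
>>>   sumEveryOther(List.tabulate(100000)(identity))    // quadratic
>>>   sumEveryOther(Vector.tabulate(100000)(identity))  // roughly linear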
>>>
>>> There are lots of false-positives and benign warnings (as with any
>>> linter / static analyzer), so I don't think it's feasible for us to include
>>> this as a blocking step in our regular build. I am planning to build
>>> tooling to surface only new warnings so that, going forward, this can
>>> become a useful code-review aid.
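>>>
>>> (That tooling doesn't exist yet; the rough idea, with a hypothetical
>>> Warning type and keying scheme, is just a set difference between two runs:
>>>
>>>   // Sketch only: key warnings by something reasonably stable across
>>>   // commits and report the ones not present in the baseline run.
>>>   case class Warning(file: String, inspection: String, snippet: String)
>>>
>>>   def newWarnings(baseline: Seq[Warning], current: Seq[Warning]): Seq[Warning] = {
>>>     val seen = baseline.toSet
>>>     current.filterNot(seen)
>>>   }
>>>
>>> The real version would also need to tolerate line-number drift between
>>> revisions.)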
>>>
>>> The current codebase has roughly 1700 warnings that I would like to
>>> triage and categorize as false-positives or real bugs. I can't do this
>>> alone, so here's how you can help:
>>>
>>>    - Visit the Google Docs spreadsheet at
>>>    https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?usp=sharing
>>>    and find an un-triaged warning.
>>>    - In the columns at the right of the sheet, enter your name in the
>>>    appropriate column to mark a warning as a false-positive or as a real
>>>    bug and/or performance issue. If you think a warning is a real issue,
>>>    use the "comments" column to provide additional detail.
>>>    - Please don't file JIRAs or PRs for individual warnings; I suspect
>>>    that we'll find clusters of issues which are best fixed in a few larger
>>>    PRs vs. lots of smaller ones. Certain warnings are probably simply
>>>    style issues, so we should discuss those before trying to fix them.
>>>
>>> The sheet has hidden columns capturing the Spark revision and Scapegoat
>>> revision. I can use these to programmatically update the sheet and remap
>>> lines after updating either Scapegoat (to suppress false-positives) or
>>> Spark (to incorporate fixes and surface new warnings). For those who are
>>> interested, the sheet was produced with this script:
>>> https://gist.github.com/JoshRosen/1ae12a979880d9a98988aa87d70ff2a8
>>>
>>> Depending on the results of this experiment we might want to integrate a
>>> high-signal subset of the Scapegoat warnings into our build. I'm also
>>> hoping that we'll be able to build a useful corpus of triaged warnings in
>>> order to help improve Scapegoat itself and eliminate common false-positives.
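>>>
>>> If we do go that route, the wiring would presumably be a small sbt change;
>>> a hypothetical sketch (plugin coordinates and setting names are from my
>>> memory of the sbt-scapegoat docs, so please double-check them):
>>>
>>>   // project/plugins.sbt
>>>   addSbtPlugin("com.sksamuel.scapegoat" %% "sbt-scapegoat" % "1.0.4")
>>>
>>>   // build.sbt -- suppress whichever inspections we decide are low-signal
>>>   scapegoatDisabledInspections := Seq(
>>>     "ExpressionAsStatement",  // example name only
>>>     "NullParameter"
>>>   )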
>>>
>>> Thanks and happy bug-hunting,
>>> Josh Rosen
>>>
>>>
>>>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
