Re: [go-nuts] Re: The GitHub Vet Project

2020-12-30 Thread K. Alex Mills
Good point, and thanks for the response.

This example avoids the race condition by breaking out of the loop
immediately afterward. IIRC, loopclosure also reports in case the range
loop variable is passed as an argument using a name that shadows the loop
variable.

I think any loopclosure findings which work properly like this do have to
do so because they mitigate the race condition like these examples do. I
don't think examples like this can be the kind we are (really) looking for:
an instance where the proposed change
 would break code that behaves
correctly today.

On Wed, Dec 30, 2020, 10:03 AM jake...@gmail.com  wrote:

>
> On Tuesday, December 29, 2020 at 1:21:04 PM UTC-5 k.alex...@gmail.com
> wrote:
>
>> I'd like to solicit some input from the community regarding a decision
>> that could reduce the effort required in crowdsourcing.
>>
>> After thinking carefully about the loopclosure vet check
>> ,
>> it seems to me that any time it reports an issue, there actually is an
>> issue. That has also been the case for everything I've seen GitHub Vet
>> report from loopclosure thus far.
>>
>> What loopclosure does is find instances where a range loop variable is
>> used inside a goroutine or defer statement. My understanding is that, in
>> every case, this raises the potential for a race-condition between updating
>> the range loop variable and reading from it within the body of the
>> goroutine. So unless I'm missing something, it seems to me that there is
>> not any need for a human to double-check loopclosure results.
>>
>
> Technically, I believe a range loop variable can be used in a goroutine
> safely. A very contrived example: https://play.golang.org/p/jgZbGg_XP6S .
> In practice I can not see much use for this kind of pattern, but it does
> exist.
>
>
>>
>> So my plan -- unless someone comes up with something I have missed -- is
>> to omit loopclosure findings from the crowdsourcing effort. Once this
>> change is made, loopclosure findings will be kept on disk, but not uploaded
>> to GitHub for crowdsourcing.
>>
>> It's worth mentioning that there are issues which loopclosure misses
>> which VetBot is able to find, and the community's effort will still be
>> needed to help sift through false-positives once the project has stabilized.
>>
>>
>> On Sun, Dec 20, 2020 at 12:08 PM K. Alex Mills 
>> wrote:
>>
>>> Hello Gophers!
>>>
>>> During a Q&A session at this year's GopherCon, some members of the Go
>>> language team indicated a willingness to modify the behavior of range
>>> loops  if we can be
>>> reasonably certain that the change will not cause incorrect behavior in
>>> current programs. To make that determination, a large collection of
>>> real-world go programs would need to be vetted. If we can find that, in
>>> every case, modifying the compiler behavior does not lead to an undesirable
>>> outcome, we will have strong evidence that the change may only have a small
>>> impact.
>>>
>>> I've been working on a project to gather and crowdsource data for that
>>> sort of analysis. It has reached a point where I think that it's time to
>>> share it with the Go community.
>>>
>>> The GitHub Vet Project  performs static
>>> analysis on public Go repositories hosted on GitHub in order to better
>>> understand the impact of this proposed language change
>>> . It consists of two bots.
>>> VetBot crawls GitHub to snippets of code it thinks could be interesting, it
>>> reports it as an issue in a separate repository
>>> , for humans
>>> to analyze via crowdsourcing. TrackBot
>>>  manages
>>> the crowdsourcing workflow.
>>>
>>> At this point, all of the features I think are necessary for the
>>> project's success have been implemented. But I am only one Gopher, and so I
>>> would like to a) make the community aware of the project and b) ask the
>>> community to help review it in three ways:
>>>
>>> 1) Test out TrackBot and the project instructions by actually
>>> crowdsourcing through the issues found so far
>>> .
>>> 2) Review the output
>>>  and
>>> raise concern if anything that I claim ought to be excluded
>>> 
>>> is somehow making it through.
>>> 3) Static analysis experts: review the analyzers
>>>  and
>>> ruthlessly question anything that could lead to a false negative. I don't
>>> think anything exists, but one pair of eyes is often wrong, and there are
>>> many folks on this lis

Re: [go-nuts] Re: The GitHub Vet Project

2020-12-30 Thread 'Tim King' via golang-nuts
This is a very interesting and ambitious experiment! If successful, this is
getting at the type of evidence that would be useful when considering to
change the language.

When evaluating the change, we will need to find cases where the semantics
of the program might differ after the change. Semantic changes include race
conditions being eliminated (like in the Bugs
 section for
the project). It can also include changes to the memory layout after the
loop (one aliased memory allocation vs a memory allocation per iteration
being stored in memory), changes in branching behavior (pointer
comparisons), changes in variable lifetime when an escape does happen
(which can impact when finalizers get run), and order of calls to external
functions. Looking at closures of loop variables and escaping references is
definitely a good start. I am not yet sure exactly what would suffice (but
happy to chat over email).

To make a decision, we probably want to know for each location:
* Is it "semantically equivalent" before/after the change? This can be done
by a combination of automation and manual effort.
* If it differs, was the behavior before clearly "incorrect"/"has "bugs"?
* If it differs, was the behavior after clearly "incorrect"/"has "bugs"?
For the above, a location might not have a clear answer after
expert review. It might be worth documenting this. A few of these map
decently onto the existing categories. Knowing when the number of differing
locations where someone is clearly helped vs differing locations where
someone is clearly hurt is probably the most important in making a decision.

Cases with bugs may want to include a notion of confidence. Examples like
those in the Bugs
 section are
extremely likely to be bugs. Others may not be as clear. ("Is there
actually a race condition here?" can be hard to assess in someone else's
code.) The current phrasing "could lead to undesirable behavior" leaves the
confidence ambiguous. It might be helpful to clarify what this means to
reviewers.

The current wording also currently suggests we can also stop looking once
we have identified "a fix to a bug" without considering downstream
interactions after the loop. For example: `for _, x := range l { defer
func() { x.someMethod() }() }`. That is almost certainly a bug. However, on
the list `{nil, }` the original version avoids a nil
pointer dereference panic by only using the last element. Basically
having two bugs pull in the opposite direction mitigating the other. I have
only really seen this where an iffy unit test was broken by creating a new
copy for each loop. The larger point though is it is not always clear how
much context to look at before labeling an example "clearly buggy before
and would be fixed". Stopping early could be a source of false negatives.
This may be more of a theoretical concern than a real one.

Minor technical points:

For folks developing/commenting on the static analyzers, it would also help
to formalize the notion of "semantic equivalence" that is trying to be
reached. It is likely a refinement relation like one would see in the
program transformation literature. The change will always be safe when the
loop after the change is a refinement of the original loop. (If we also
care about performance, we will also want to know how often
this transformation can be undone to reuse the memory, e.g. the original
refines the modified. But declaring this out of scope is reasonable.)

Technically, I believe a range loop variable can be used in a goroutine
> safely. A very contrived example: https://play.golang.org/p/jgZbGg_XP6S .
> In practice I can not see much use for this kind of pattern, but it does
> exist.
>

It is correct that the range loop variable can be used in a goroutine
safely. However, the existing loopclosure vet check will not report on this
instance. Today it only reports when the go statement is the last statement
in the for loop
.
I have seen non-contrived examples where range loop variables are used by
multiple goroutines. What comes to mind is doing multiple RPCs, with each
RPC in a goroutine on the same range variable and waiting for all of these
to finish before starting the next iteration of the loop. Essentially `for
_, x := range l { someComplexOperationWithGoRoutines(&x) }`.

So my plan -- unless someone comes up with something I have missed -- is to
>> omit loopclosure findings from the crowdsourcing effort. Once this change
>> is made, loopclosure findings will be kept on disk, but not uploaded to
>> GitHub for crowdsourcing.
>>
>
Omitting loopclosure reports from crowdsourcing makes sense to me. It would
also be valuable to track and make available how many cases of these there
are. These would essentially be locations that do differ and automation can
c

Re: [go-nuts] Re: The GitHub Vet Project

2021-01-01 Thread K. Alex Mills
> Technically, I believe a range loop variable can be used in a goroutine
>> safely. A very contrived example: https://play.golang.org/p/jgZbGg_XP6S .
>> In practice I can not see much use for this kind of pattern, but it does
>> exist.
>>
>
> It is correct that the range loop variable can be used in a goroutine
> safely. However, the existing loopclosure vet check will not report on this
> instance.
>

VetBot uses a variant of loopclosure which I modified to be more sensitive
(to avoid false-negatives). The variant in VetBot does report code like
this example

even
though it is a false-positive, due to the break.


> When evaluating the change, we will need to find cases where the semantics
> of the program might differ after the change. Semantic changes include race
> conditions being eliminated (like in the Bugs
>  section
> for the project). It can also include changes to the memory layout after
> the loop (one aliased memory allocation vs a memory allocation per
> iteration being stored in memory),changes in branching behavior (pointer
> comparisons), changes in variable lifetime when an escape does happen
> (which can impact when finalizers get run), and order of calls to
> external functions. Looking at closures of loop variables and escaping
> references is definitely a good start. I am not yet sure exactly what would
> suffice (but happy to chat over email).


I like how this list starts by laying out all the semantic changes in the
proposal rather than starting from asking 'what can go wrong' -- which is
where I started.

Now that we have this list -- I'm looking at each of the entries in this
list and asking myself "what could go wrong, and how can we detect it?"
There are a few cases where I'm not able to come up with a satisfactory
answer on my own. I'm hoping that by laying out my understanding below in a
transparent way, someone might be willing to help fill in the gaps.

race conditions being eliminated (like in the Bugs
>  section
> for the project)
>

It's almost certainly the case that any current code which relies on the
existence of a race condition for its correct operation is flaky. We can
probably safely classify these as 'beneficial' semantic changes, but that
can't be a unilateral decision. If we're willing to accept that any code
that passes a range-loop pointer to a goroutine is going to have its
behavior improved by the change, we can avoid many of the issues being
reported by the prototype.

changes to the memory layout after the loop (one aliased memory allocation
> vs a memory allocation per iteration being stored in memory)
>

It seems to me that this can only break existing code which relies on the
very specific way in which variables are laid out in memory. Based on the
little I know, I would expect this to require the code to do something with
the unsafe package or use CGO. Are there other ways to make code rely on
memory layout in Go (other than things like pointer comparisons and
finalizers mentioned below)?

changes in branching behavior (pointer comparisons)
>
This makes sense and it seems straightforward to check for by finding
instances where a range-loop reference is used in a boolean expression.

changes in variable lifetime when an escape does happen (which can impact
> when finalizers get run)


I'm still processing this one; but after a glance, it looks like we
*might *only
be interested in when a range-loop reference is passed as the first
argument to `runtime.SetFinalizer`. If that's the case, writing an analyzer
should be straightforward.

order of calls to external functions


I must be missing something here, since I am not sure how the proposed
language change can impact the order in which calls to external functions
are made. I don't think that I'm grasping your intended meaning.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CALJzkY-ev_eDunVCuB4wzL499B0heE%3DhuCWbc7zyN7GQPPhTWg%40mail.gmail.com.