[PATCH] D53814: Allow the analyzer to output to a SARIF file

Aaron Ballman via Phabricator via cfe-commits Tue, 30 Oct 2018 06:20:52 -0700

aaron.ballman marked 9 inline comments as done.
aaron.ballman added inline comments.



================
Comment at: Analysis/diagnostics/sarif-diagnostics-taint-test.c:18
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"
----------------
george.karpenkov wrote:
> aaron.ballman wrote:
> > george.karpenkov wrote:
> > > Would it make more sense to just use `diff` + json pretty-formatter to 
> > > write a test?
> > > With this test I can't even quite figure out how the output should look 
> > > like.
> > I'm not super comfortable with that approach, but perhaps I'm thinking of 
> > something different than what you're proposing. The reason I went with this 
> > approach is because diff would be fragile (depends heavily on field 
> > ordering, which the JSON support library doesn't give control over) and the 
> > physical layout of the file isn't what needs to be tested anyway. SARIF has 
> > a fair amount of optional data that can be provided as well, so using a 
> > purely textual diff worried me that exporting additional optional data in 
> > the future would require extensive unrelated changes to all SARIF diffs in 
> > the test directory.
> > 
> > The goal for this test was to demonstrate that we can validate that the 
> > interesting bits of information are present in the output without worrying 
> > about the details.
> > 
> > Also, the python approach allows us to express relationships between data 
> > items more easily than a textual diff tool would. I've not used that here, 
> > but I could imagine a test where we want to check that each code location 
> > has a corresponding file entry in another list.
> Using a sample file + diff would have an advantage of being easier to read 
> (just glance at the "Inputs/blah.serif" to see a sample output), and 
> consistent with how we already do checking for plists.
> 
> > depends heavily on field ordering
> 
> Is it an issue in practice though? I would assume that JSON support library 
> would not switch field ordering too often (and even if it does, we can have a 
> python wrapper testing that)
> 
> > SARIF has a fair amount of optional data
> 
> Would diff `--ignore-matching-lines` be enough for those?
Diff testing was what I originally tried and I abandoned it because it was not 
viable. The optional data cannot be handled by ignoring matching lines because 
the lines won't match from system to system. For instance, there are absolute 
URIs to files included in the output, clang git hashes and version numbers that 
will change from revision to revision, etc.

What you see as an advantage, I see as more of a distraction -- the actual 
layout of the information in the file isn't that important so long as it 
validates as valid SARIF (unfortunately, there are no cross-platform command 
line tools to validate SARIF yet). What is important are just a few pieces of 
the content information (where are the diagnostics, what source ranges and 
paths are involved, etc) compared to the overall whole.

I can give diff testing another shot, but I was unable to find a way to make 
`-I` work so that I could ignore everything that needs to be ignored due to 
natural variance. Do you have ideas there?

To give you an idea of the ultimate form of the output:
```
{
  "$schema": "http://json.schemastore.org/sarif-2.0.0";,
  "runs": [
    {
      "files": {
        "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c": {
          "fileLocation": {
            "uri": "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c"
          },
          "length": 3850,
          "mimeType": "text/plain",
          "roles": [
            "resultFile"
          ]
        }
      },
      "results": [
        {
          "codeFlows": [
            {
              "threadFlows": [
                {
                  "locations": [
                    {
                      "importance": "essential",
                      "location": {
                        "message": {
                          "text": "Calling 'f'"
                        },
                        "physicalLocation": {
                          "fileLocation": {
                            "uri": 
"file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c"
                          },
                          "region": {
                            "endColumn": 5,
                            "endLine": 119,
                            "startColumn": 3,
                            "startLine": 119
                          }
                        }
                      },
                      "step": 1
                    },
                    {
                      "importance": "essential",
                      "location": {
                        "message": {
                          "text": "tainted"
                        },
                        "physicalLocation": {
                          "fileLocation": {
                            "uri": 
"file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c"
                          },
                          "region": {
                            "endColumn": 17,
                            "endLine": 115,
                            "startColumn": 11,
                            "startLine": 115
                          }
                        }
                      },
                      "step": 2
                    }
                  ]
                }
              ]
            }
          ],
          "locations": [
            {
              "physicalLocation": {
                "fileLocation": {
                  "uri": "file:///C:/Users/aballman.GRAMMATECH/Desktop/test.c"
                },
                "region": {
                  "endColumn": 17,
                  "endLine": 115,
                  "startColumn": 11,
                  "startLine": 115
                }
              }
            }
          ],
          "message": {
            "text": "tainted"
          },
          "ruleId": "debug.TaintTest"
        }
      ],
      "tool": {
        "fullName": "clang static analyzer",
        "language": "en-US",
        "name": "clang",
        "version": "clang version 8.0.0 
(https://github.com/llvm-project/clang.git 
a5ccb257a7a70928ede717a7c282f5fc8cbed310) 
(https://github.com/llvm-mirror/llvm.git 
73cebd79c512f7129eca16b0f3a7abd21d2881e8)"
      }
    }
  ],
  "version": "2.0.0-beta.2018-09-26"
}
```
In this file, the variable things are: the file URIs in multiple places, the 
clang version information, and to a lesser extent, the SARIF version 
information (we care that it says 2.0.0 but don't care about the -beta stuff). 
Other pieces of information are likely to become variable in the future (like 
the language field). This log file is ~100 lines long for displaying 
information about one diagnostic with two interesting locations on the code 
path, which is why I think the diff testing is a distraction. I believe that 
more complex examples will result in even larger output files.

I might be able to use FileCheck more directly, e.g.,
```
CHECK: "threadFlows":
CHECK-NEXT: "locations":
CHECK-NEXT: "{"
CHECK-NEXT: "importance": "essential"
...
```
But I feel like that's just a more verbose version of what's being done in the 
patch with less flexibility.



================
Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:69
+    return std::string(&C, 1);
+  return "%" + llvm::toHex(StringRef(&C, 1));
+}
----------------
george.karpenkov wrote:
> +1, I would use this in other consumers.
Not certain I understand this comment.


================
Comment at: clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h:128
+    /// Used for SARIF output.
+    Sarif,
   };
----------------
george.karpenkov wrote:
> Do you actually need a new generation scheme here?
> I'm pretty sure that using "Minimal" would give you the same effect.
I don't think it's currently needed, I've removed but updated the comment.


https://reviews.llvm.org/D53814



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D53814: Allow the analyzer to output to a SARIF file

Reply via email to