[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

George Karpenkov via Phabricator via cfe-commits Tue, 28 Nov 2017 06:06:05 -0800

george.karpenkov requested changes to this revision.
george.karpenkov added inline comments.
This revision now requires changes to proceed.



================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:44
 
+CTU_FUNCTION_MAP_FILENAME = 'externalFnMap.txt'
+CTU_TEMP_FNMAP_FOLDER = 'tmpExternalFnMaps'
----------------
What would happen when multiple analyzer runs are launched concurrently in the 
same directory? Would they race on this file?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:64
             if need_analyzer(args.build):
-                run_analyzer_parallel(args)
+                run_analyzer_with_ctu(args)
         else:
----------------
`run_analyzer_with_ctu` is an unfortunate function name, since it may or may 
not launch CTU depending on the passed arguments. Could we just call it 
`run_analyzer`?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:103
 
+def prefix_with(constant, pieces):
+    """ From a sequence create another sequence where every second element
----------------
Actually I would prefer a separate NFC PR just moving this function. This PR is 
already too complicated as it is.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:120
+                  func_map_cmd=args.func_map_cmd)
+        if hasattr(args, 'ctu_phases') and hasattr(args.ctu_phases, 'dir')
+        else CtuConfig(collect=False, analyze=False, dir='', func_map_cmd=''))
----------------
Extensive `hasattr` usage is often a codesmell, and it hinders understanding.
Firstly, shouldn't  `args.ctu_phases.dir` be always available, judging from the 
code in `arguments.py`?
Secondly, in what cases `args.ctu_phases` is not available when we are already 
going down the CTU code path? Shouldn't we throw an error at that point instead 
of creating a default configuration? 
(default-configurations-instead-of-crashing is also another way to introduce 
very subtle and hard-to-catch error modes)


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:128
+    A function map contains the id of a function (mangled name) and the
+    originating source (the corresponding AST file) name."""
+
----------------
Could you also specify what `func_map_lines` is (file? list? of strings?) in 
the docstring? There's specific Sphinx syntax for that. Otherwise it is hard to 
follow.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:138
+        else:
+            mangled_to_asts[mangled_name].add(ast_file)
+
----------------
Could be improved with `defaultdict`:

```
mangled_to_asts = defaultdict(set)
...
   mangled_to_asts[mangled_name].add(ast_file)
```


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:144
+        if len(ast_files) == 1:
+            mangled_ast_pairs.append((mangled_name, ast_files.pop()))
+
----------------
Firstly, no need to modify the set in order to get the first element, just use 
`next(iter(ast_files))`.
Secondly, when exactly do conflicting names happen? Is it a bug? Shouldn't the 
tool log such cases?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:145
+            mangled_ast_pairs.append((mangled_name, ast_files.pop()))
+
+    return mangled_ast_pairs
----------------
Overall, instead of creating a dictionary with multiple elements, and then 
converting to a list, it's much simper to only add an element to 
`mangled_to_asts` when it is not already mapped to something (probably logging 
a message otherwise), and then just return `mangled_to_asts.items()`


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:160
+    def generate_func_map_lines(fnmap_dir):
+        """ Iterate over all lines of input files in random order. """
+
----------------
Firstly, is `glob.glob` actually random, as the docstring is saying?
If yes, can we make it not to be random (e.g. with `sorted`), as dealing with 
randomized input makes tracking down bugs so much harder?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:168
+
+    def write_global_map(ctudir, arch, mangled_ast_pairs):
+        """ Write (mangled function name, ast file) pairs into final file. """
----------------
If `write_global_map` is a closure, please use the fact that it would capture 
`ctudir` from the outer scope, and remove it from the argument list.
That would make it more obvious that it does not change between invocations.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:189
+            # Remove all temporary files
+            shutil.rmtree(fnmap_dir, ignore_errors=True)
+
----------------
Having an analysis tool remove files is scary, what if (maybe not in this 
revision, but in a future iteration) a bug is introduced, and the tool removes 
user code instead?
Why not just create a temporary directory with `tempfile.mkdtemp`, put all 
temporary files there, and then simply iterate through them?
Then you would be able to get rid of the constant `CPU_TEMP_FNMAP_FOLDER` 
entirely, and OS would be responsible for cleanup.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:230
+    if ctu_config.collect:
+        shutil.rmtree(ctu_config.dir, ignore_errors=True)
+    if ctu_config.collect and ctu_config.analyze:
----------------
Similarly to the comment above, I would prefer if analysis tool would not 
remove files (and I assume those are not huge ones?)
Can we just use temporary directories?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:241
+        run_analyzer_parallel(args)
+        shutil.rmtree(ctu_config.dir, ignore_errors=True)
+    else:
----------------
Same as the comment above about removing folders. Also it seems like there 
should be a way to remove redundancy in `if collect / remove tree` block 
repeated twice.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:262
+        'ANALYZE_BUILD_FORCE_DEBUG': 'yes' if args.force_debug else '',
+        'ANALYZE_BUILD_CTU': json.dumps(get_ctu_config(args))
     })
----------------
Using JSON-serialization-over-environment-variables is very unorthodox, and 
would give very bizarre error messages if the user would try to customize 
settings (I assume that is the intent, right?)
I think that separating those options into separate ENV variables (if they 
_have_ to be customizable) would be much better.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:296
+        'command': [execution.cmd[0], '-c'] + compilation.flags,
+        'ctu': json.loads(os.getenv('ANALYZE_BUILD_CTU'))
     }
----------------
Again, is it possible to avoid JSON-over-environment-variables?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:534
+        mangled_name = fn_src_txt[0:dpos]
+        path = fn_src_txt[dpos + 1:]
+        # Normalize path on windows as well
----------------
`mangled_name, path = fn_src_txt.split(" ", 1)` ?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:561
+            except OSError:
+                pass
+        ast_command = [opts['clang'], '-emit-ast']
----------------
`try/except/pass` is almost always bad.
When can the error occur? Why are we ignoring it?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:567
+        ast_command.append('-o')
+        ast_command.append(ast_path)
+        logging.debug("Generating AST using '%s'", ast_command)
----------------
The above can be written more succinctly as:
`ast_command = [opts['clang'], ...] + args + ['-w', ...]`


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:578
+        funcmap_command.append('--')
+        funcmap_command.extend(args)
+        logging.debug("Generating function map using '%s'", funcmap_command)
----------------
Similarly here, `funcmap_command` can be generated in one line using `+`


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:588
+            except OSError:
+                pass
+        if func_ast_list:
----------------
Again, why is this error ignored?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:609
+    # Recover namedtuple from json when coming from analyze_cc
+    if not hasattr(ctu_config, 'collect'):
+        ctu_config = CtuConfig(collect=ctu_config[0],
----------------
In which case is this branch hit? Isn't improperly formed input argument 
indicative of an internal error at this stage?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:702
 
+
 # To have good results from static analyzer certain compiler options shall be
----------------
This blank line should not be in this PR.


================
Comment at: tools/scan-build-py/libscanbuild/clang.py:168
+        return False
+    return True
+
----------------
I might be missing something here, but why is the ability to call `--version` 
indicative of CTU support?
At worst, this can lead to obscuring real bugs: imagine if the user has 
`args.clang` pointed to broken/non-existent binary, then `is_ctu_capable` would 
simply return `False` (hiding the original error!), which would show a 
completely misleading error message.

Just checking `func_map_cmd` seems better, but even in this case we should 
probably log any errors occurring on `-version` call (such messages would 
really aid debugging)


================
Comment at: tools/scan-build-py/libscanbuild/clang.py:177
+    arch = ""
+    i = 0
+    while i < len(cmd) and cmd[i] != "-triple":
----------------
Seconded, would prefer this rewritten using `separator = cmd.find('-triple')`


================
Comment at: tools/scan-build-py/libscanbuild/report.py:270
 
+    # Protect parsers from bogus report files coming from clang crashes
+    for filename in glob.iglob(os.path.join(output_dir, pattern)):
----------------
I understand the intent here, but it seems it should be handled at a different 
level: would it be hard to change Clang to only write the report file at the 
very end, when no crash should be encountered? Or make parsers not choke on 
empty fields?


================
Comment at: tools/scan-build-py/tests/unit/test_analyze.py:338
+
+class PrefixWithTest(unittest.TestCase):
+
----------------
Probably more tests are required for almost 400 lines of functional Python code 
in this PR.
Would it be hard to have a full LIT-style integration test? E.g. have a dummy 
script emulating Clang with a dummy directory structure, which would show how 
all pieces are meant to fit together?


https://reviews.llvm.org/D30691



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

Reply via email to