[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

George Karpenkov via Phabricator via cfe-commits Thu, 07 Dec 2017 08:01:37 -0800

george.karpenkov added inline comments.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:44
 
+CTU_FUNCTION_MAP_FILENAME = 'externalFnMap.txt'
+CTU_TEMP_FNMAP_FOLDER = 'tmpExternalFnMaps'
----------------
gerazo wrote:
> george.karpenkov wrote:
> > What would happen when multiple analyzer runs are launched concurrently in 
> > the same directory? Would they race on this file?
> Yes and no. The 1st, collect part of CTU creates this file by aggregating all 
> data from the build system, while the 2nd part which does the analysis itself 
> only reads it. Multiple analysis can use this file simultaneously without 
> problem. However, multiple collect phases launched on a system does not make 
> sense. In this case, the later one would write over the previous results with 
> the same data.
> However, multiple collect phases launched on a system does not make sense

Why not? What about a system with multiple users, where code is in the shared 
directory? Or even simpler: I've launched a background process, forgot about 
it, and then launched it again?

> In this case, the later one would write over the previous results with the 
> same data.

That is probably fine, I am more worried about racing, where process B would be 
reading a partially overriden file (not completely sure whether it is possible)



================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:103
 
+def prefix_with(constant, pieces):
+    """ From a sequence create another sequence where every second element
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Actually I would prefer a separate NFC PR just moving this function. This 
> > PR is already too complicated as it is.
> Yes. The only reason was for the move to make it testable. However, we need 
> more tests as you wrote down below.
Sure, but that separate PR can also include tests.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:120
+                  func_map_cmd=args.func_map_cmd)
+        if hasattr(args, 'ctu_phases') and hasattr(args.ctu_phases, 'dir')
+        else CtuConfig(collect=False, analyze=False, dir='', func_map_cmd=''))
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Extensive `hasattr` usage is often a codesmell, and it hinders 
> > understanding.
> > Firstly, shouldn't  `args.ctu_phases.dir` be always available, judging from 
> > the code in `arguments.py`?
> > Secondly, in what cases `args.ctu_phases` is not available when we are 
> > already going down the CTU code path? Shouldn't we throw an error at that 
> > point instead of creating a default configuration? 
> > (default-configurations-instead-of-crashing is also another way to 
> > introduce very subtle and hard-to-catch error modes)
> It definitely needs more comments, so thanks, I will put them here.
> There are two separate possibilities here for this code part:
> 1. The user does not want CTU at all. In this case, no CTU phases are 
> switched on (collect and analyze), so no CTU code will run. This is why dir 
> has no importance in this case.
> 2. CTU capabilities were not even detected, so the help and available 
> arguments about CTU are also missing. In this case we create a dummy config 
> telling that everything is switched off.
> 
> The reason for using hasattr was that not having CTU capabilities is not an 
> exceptional condition, rather a matter of configuration. I felt that handling 
> this with an exception would be misleading.
Right, but instead of doing (1) and (2) can we have a separate (maybe hidden 
from user) param called e.g. `ctu_enabled` which would explicitly communicate 
that?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:144
+        if len(ast_files) == 1:
+            mangled_ast_pairs.append((mangled_name, ast_files.pop()))
+
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Firstly, no need to modify the set in order to get the first element, just 
> > use `next(iter(ast_files))`.
> > Secondly, when exactly do conflicting names happen? Is it a bug? Shouldn't 
> > the tool log such cases?
> Nice catch, thanks.
> For your second question: Unfortunately, it is not a bug, rather a misfeature 
> which we can't handle yet. There can be tricky build-systems where a single 
> file with different configurations is built multiple times, so the same 
> function signature will be present in multiple link units. The other option 
> is when two separate compile units have an exact same function signature, but 
> they are never linked together, so it is not a problem in the build itself. 
> In this case, the cross compilation unit functionality cannot tell exactly 
> which compilation unit to turn to for the function, because there are 
> multiple valid choices. In this case, we deliberately leave such functions 
> out to avoid potential problems. It deserves a comment I think.
> In this case, we deliberately leave such functions out to avoid potential 
> problems

We probably want to log it, don't we?


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:145
+            mangled_ast_pairs.append((mangled_name, ast_files.pop()))
+
+    return mangled_ast_pairs
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Overall, instead of creating a dictionary with multiple elements, and then 
> > converting to a list, it's much simper to only add an element to 
> > `mangled_to_asts` when it is not already mapped to something (probably 
> > logging a message otherwise), and then just return `mangled_to_asts.items()`
> The reason for the previous is that we need to count the occurence number 
> ofdifferent mappings only let those pass through which don't have multiple 
> variations.
ah, OK!


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:189
+            # Remove all temporary files
+            shutil.rmtree(fnmap_dir, ignore_errors=True)
+
----------------
gerazo wrote:
> gerazo wrote:
> > george.karpenkov wrote:
> > > Having an analysis tool remove files is scary, what if (maybe not in this 
> > > revision, but in a future iteration) a bug is introduced, and the tool 
> > > removes user code instead?
> > > Why not just create a temporary directory with `tempfile.mkdtemp`, put 
> > > all temporary files there, and then simply iterate through them?
> > > Then you would be able to get rid of the constant `CPU_TEMP_FNMAP_FOLDER` 
> > > entirely, and OS would be responsible for cleanup.
> > Yes, you are right. We are essentially using a temp dir. Because of the 
> > size we first had to put it next to the project (not on tmp drive for 
> > instance) and for debugging purposes we gave a name to it. Still it can be 
> > done with mkdtemp as well.
> Finally, I came to the conclusion that mkdtemp would not be better than the 
> current solution. In order to find our created dir by other threads, we need 
> a designated name. Suffixing it by generated name would further complicate 
> things as we need not to allow multiple concurrent runs here. The current 
> solution is more robust from this point of view.
OK


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:230
+    if ctu_config.collect:
+        shutil.rmtree(ctu_config.dir, ignore_errors=True)
+    if ctu_config.collect and ctu_config.analyze:
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Similarly to the comment above, I would prefer if analysis tool would not 
> > remove files (and I assume those are not huge ones?)
> > Can we just use temporary directories?
> Unlike above, here we do remove non-temporary data intentionally. The user 
> asks here to do the recollection of CTU data for a fresh start. Because there 
> is no "clean" functionality in the analyzer interface itself, this seemed to 
> be the easiest-on-user solution to save him/her an extra effort or a new 
> command.
OK!


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:296
+        'command': [execution.cmd[0], '-c'] + compilation.flags,
+        'ctu': json.loads(os.getenv('ANALYZE_BUILD_CTU'))
     }
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Again, is it possible to avoid JSON-over-environment-variables?
> There is an other thing against changing this. Currently the interface here 
> using env variables is used by intercept-build, analyze-build and scan-build 
> tool as well. In order to drop json, we need to change those tools too. It 
> would be a separate patch definitely.
OK I didn't know that the JSON interface was used by other tools. In that case, 
ignore my comment.


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:567
+        ast_command.append('-o')
+        ast_command.append(ast_path)
+        logging.debug("Generating AST using '%s'", ast_command)
----------------
gerazo wrote:
> george.karpenkov wrote:
> > The above can be written more succinctly as:
> > `ast_command = [opts['clang'], ...] + args + ['-w', ...]`
> After several iterations of the code, I find it easier to version control 
> such multiline constructs. If someone changes a data source, it is clear 
> which one (which line) was modified. The succint notation does not allow 
> clean VCS annotations.
OK. Though you could still use split addition across multiple lines with `\`


================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:609
+    # Recover namedtuple from json when coming from analyze_cc
+    if not hasattr(ctu_config, 'collect'):
+        ctu_config = CtuConfig(collect=ctu_config[0],
----------------
gerazo wrote:
> george.karpenkov wrote:
> > In which case is this branch hit? Isn't improperly formed input argument 
> > indicative of an internal error at this stage?
> An other part of scan-build-py, analyze_cc uses namedtuple to json format to 
> communicate. However, the names are not coming back from json, so this code 
> helps in this. This is the case when someone uses the whole toolset with 
> compiler wrapping. All the environment variable hassle is also happening 
> because of this. So these env vars are not for user modification (as you've 
> suggested earlier).
OK so `opts['ctu']` is a tuple or a named tuple depending on how this function 
is entered? BTW could you point me to the `analyze_cc` entry point?

For the purpose of having more uniform code with less cases to care about, do 
you think we could just use ordinary tuples instead of constructing a named 
one, since we have to deconstruct an ordinary tuple in any case?


================
Comment at: tools/scan-build-py/libscanbuild/arguments.py:376
+            metavar='<ctu-dir>',
+            default='ctu-dir',
+            help="""Defines the temporary directory used between ctu
----------------
BTW can we also explicitly add `dest='ctu_dir'` here, as otherwise I was 
initially very confused as to where the variable is set.


================
Comment at: tools/scan-build-py/libscanbuild/clang.py:168
+        return False
+    return True
+
----------------
gerazo wrote:
> george.karpenkov wrote:
> > I might be missing something here, but why is the ability to call 
> > `--version` indicative of CTU support?
> > At worst, this can lead to obscuring real bugs: imagine if the user has 
> > `args.clang` pointed to broken/non-existent binary, then `is_ctu_capable` 
> > would simply return `False` (hiding the original error!), which would show 
> > a completely misleading error message.
> > 
> > Just checking `func_map_cmd` seems better, but even in this case we should 
> > probably log any errors occurring on `-version` call (such messages would 
> > really aid debugging)
> The original idea was that clang can give information about CTU support 
> itself. However, it never happened because the analyzer is so deep down in 
> the system. So I am open to remove the clang binary check here. However, 
> clang binary is needed anyway, so the whole toolset will still throw an error 
> later on not having a clang binary.
> so the whole toolset will still throw an error later on not having a clang 
> binary.

Of course, but I think that would be easier to debug, and the error would mean 
that Clang is not available, not that CTU is not working.


https://reviews.llvm.org/D30691



_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

Reply via email to