george.karpenkov added inline comments.
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:44
+CTU_FUNCTION_MAP_FILENAME = 'externalFnMap.txt'
+CTU_TEMP_FNMAP_FOLDER = 'tmpExternalFnMaps'
----------------
gerazo wrote:
> george.karpenkov wrote:
> > What would happen when multiple analyzer runs are launched concurrently in
> > the same directory? Would they race on this file?
> Yes and no. The 1st, collect part of CTU creates this file by aggregating all
> data from the build system, while the 2nd part which does the analysis itself
> only reads it. Multiple analysis can use this file simultaneously without
> problem. However, multiple collect phases launched on a system does not make
> sense. In this case, the later one would write over the previous results with
> the same data.
> However, multiple collect phases launched on a system does not make sense
Why not? What about a system with multiple users, where code is in the shared
directory? Or even simpler: I've launched a background process, forgot about
it, and then launched it again?
> In this case, the later one would write over the previous results with the
> same data.
That is probably fine, I am more worried about racing, where process B would be
reading a partially overriden file (not completely sure whether it is possible)
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:103
+def prefix_with(constant, pieces):
+ """ From a sequence create another sequence where every second element
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Actually I would prefer a separate NFC PR just moving this function. This
> > PR is already too complicated as it is.
> Yes. The only reason was for the move to make it testable. However, we need
> more tests as you wrote down below.
Sure, but that separate PR can also include tests.
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:120
+ func_map_cmd=args.func_map_cmd)
+ if hasattr(args, 'ctu_phases') and hasattr(args.ctu_phases, 'dir')
+ else CtuConfig(collect=False, analyze=False, dir='', func_map_cmd=''))
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Extensive `hasattr` usage is often a codesmell, and it hinders
> > understanding.
> > Firstly, shouldn't `args.ctu_phases.dir` be always available, judging from
> > the code in `arguments.py`?
> > Secondly, in what cases `args.ctu_phases` is not available when we are
> > already going down the CTU code path? Shouldn't we throw an error at that
> > point instead of creating a default configuration?
> > (default-configurations-instead-of-crashing is also another way to
> > introduce very subtle and hard-to-catch error modes)
> It definitely needs more comments, so thanks, I will put them here.
> There are two separate possibilities here for this code part:
> 1. The user does not want CTU at all. In this case, no CTU phases are
> switched on (collect and analyze), so no CTU code will run. This is why dir
> has no importance in this case.
> 2. CTU capabilities were not even detected, so the help and available
> arguments about CTU are also missing. In this case we create a dummy config
> telling that everything is switched off.
>
> The reason for using hasattr was that not having CTU capabilities is not an
> exceptional condition, rather a matter of configuration. I felt that handling
> this with an exception would be misleading.
Right, but instead of doing (1) and (2) can we have a separate (maybe hidden
from user) param called e.g. `ctu_enabled` which would explicitly communicate
that?
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:144
+ if len(ast_files) == 1:
+ mangled_ast_pairs.append((mangled_name, ast_files.pop()))
+
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Firstly, no need to modify the set in order to get the first element, just
> > use `next(iter(ast_files))`.
> > Secondly, when exactly do conflicting names happen? Is it a bug? Shouldn't
> > the tool log such cases?
> Nice catch, thanks.
> For your second question: Unfortunately, it is not a bug, rather a misfeature
> which we can't handle yet. There can be tricky build-systems where a single
> file with different configurations is built multiple times, so the same
> function signature will be present in multiple link units. The other option
> is when two separate compile units have an exact same function signature, but
> they are never linked together, so it is not a problem in the build itself.
> In this case, the cross compilation unit functionality cannot tell exactly
> which compilation unit to turn to for the function, because there are
> multiple valid choices. In this case, we deliberately leave such functions
> out to avoid potential problems. It deserves a comment I think.
> In this case, we deliberately leave such functions out to avoid potential
> problems
We probably want to log it, don't we?
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:145
+ mangled_ast_pairs.append((mangled_name, ast_files.pop()))
+
+ return mangled_ast_pairs
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Overall, instead of creating a dictionary with multiple elements, and then
> > converting to a list, it's much simper to only add an element to
> > `mangled_to_asts` when it is not already mapped to something (probably
> > logging a message otherwise), and then just return `mangled_to_asts.items()`
> The reason for the previous is that we need to count the occurence number
> ofdifferent mappings only let those pass through which don't have multiple
> variations.
ah, OK!
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:189
+ # Remove all temporary files
+ shutil.rmtree(fnmap_dir, ignore_errors=True)
+
----------------
gerazo wrote:
> gerazo wrote:
> > george.karpenkov wrote:
> > > Having an analysis tool remove files is scary, what if (maybe not in this
> > > revision, but in a future iteration) a bug is introduced, and the tool
> > > removes user code instead?
> > > Why not just create a temporary directory with `tempfile.mkdtemp`, put
> > > all temporary files there, and then simply iterate through them?
> > > Then you would be able to get rid of the constant `CPU_TEMP_FNMAP_FOLDER`
> > > entirely, and OS would be responsible for cleanup.
> > Yes, you are right. We are essentially using a temp dir. Because of the
> > size we first had to put it next to the project (not on tmp drive for
> > instance) and for debugging purposes we gave a name to it. Still it can be
> > done with mkdtemp as well.
> Finally, I came to the conclusion that mkdtemp would not be better than the
> current solution. In order to find our created dir by other threads, we need
> a designated name. Suffixing it by generated name would further complicate
> things as we need not to allow multiple concurrent runs here. The current
> solution is more robust from this point of view.
OK
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:230
+ if ctu_config.collect:
+ shutil.rmtree(ctu_config.dir, ignore_errors=True)
+ if ctu_config.collect and ctu_config.analyze:
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Similarly to the comment above, I would prefer if analysis tool would not
> > remove files (and I assume those are not huge ones?)
> > Can we just use temporary directories?
> Unlike above, here we do remove non-temporary data intentionally. The user
> asks here to do the recollection of CTU data for a fresh start. Because there
> is no "clean" functionality in the analyzer interface itself, this seemed to
> be the easiest-on-user solution to save him/her an extra effort or a new
> command.
OK!
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:296
+ 'command': [execution.cmd[0], '-c'] + compilation.flags,
+ 'ctu': json.loads(os.getenv('ANALYZE_BUILD_CTU'))
}
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Again, is it possible to avoid JSON-over-environment-variables?
> There is an other thing against changing this. Currently the interface here
> using env variables is used by intercept-build, analyze-build and scan-build
> tool as well. In order to drop json, we need to change those tools too. It
> would be a separate patch definitely.
OK I didn't know that the JSON interface was used by other tools. In that case,
ignore my comment.
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:567
+ ast_command.append('-o')
+ ast_command.append(ast_path)
+ logging.debug("Generating AST using '%s'", ast_command)
----------------
gerazo wrote:
> george.karpenkov wrote:
> > The above can be written more succinctly as:
> > `ast_command = [opts['clang'], ...] + args + ['-w', ...]`
> After several iterations of the code, I find it easier to version control
> such multiline constructs. If someone changes a data source, it is clear
> which one (which line) was modified. The succint notation does not allow
> clean VCS annotations.
OK. Though you could still use split addition across multiple lines with `\`
================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:609
+ # Recover namedtuple from json when coming from analyze_cc
+ if not hasattr(ctu_config, 'collect'):
+ ctu_config = CtuConfig(collect=ctu_config[0],
----------------
gerazo wrote:
> george.karpenkov wrote:
> > In which case is this branch hit? Isn't improperly formed input argument
> > indicative of an internal error at this stage?
> An other part of scan-build-py, analyze_cc uses namedtuple to json format to
> communicate. However, the names are not coming back from json, so this code
> helps in this. This is the case when someone uses the whole toolset with
> compiler wrapping. All the environment variable hassle is also happening
> because of this. So these env vars are not for user modification (as you've
> suggested earlier).
OK so `opts['ctu']` is a tuple or a named tuple depending on how this function
is entered? BTW could you point me to the `analyze_cc` entry point?
For the purpose of having more uniform code with less cases to care about, do
you think we could just use ordinary tuples instead of constructing a named
one, since we have to deconstruct an ordinary tuple in any case?
================
Comment at: tools/scan-build-py/libscanbuild/arguments.py:376
+ metavar='<ctu-dir>',
+ default='ctu-dir',
+ help="""Defines the temporary directory used between ctu
----------------
BTW can we also explicitly add `dest='ctu_dir'` here, as otherwise I was
initially very confused as to where the variable is set.
================
Comment at: tools/scan-build-py/libscanbuild/clang.py:168
+ return False
+ return True
+
----------------
gerazo wrote:
> george.karpenkov wrote:
> > I might be missing something here, but why is the ability to call
> > `--version` indicative of CTU support?
> > At worst, this can lead to obscuring real bugs: imagine if the user has
> > `args.clang` pointed to broken/non-existent binary, then `is_ctu_capable`
> > would simply return `False` (hiding the original error!), which would show
> > a completely misleading error message.
> >
> > Just checking `func_map_cmd` seems better, but even in this case we should
> > probably log any errors occurring on `-version` call (such messages would
> > really aid debugging)
> The original idea was that clang can give information about CTU support
> itself. However, it never happened because the analyzer is so deep down in
> the system. So I am open to remove the clang binary check here. However,
> clang binary is needed anyway, so the whole toolset will still throw an error
> later on not having a clang binary.
> so the whole toolset will still throw an error later on not having a clang
> binary.
Of course, but I think that would be easier to debug, and the error would mean
that Clang is not available, not that CTU is not working.
https://reviews.llvm.org/D30691
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits