ChuanqiXu accepted this revision. ChuanqiXu added inline comments.
================ Comment at: clang/lib/Sema/SemaModule.cpp:177 + if (IsPartition) { + ModuleName += ":"; + ModuleName += stringFromPath(Partition); ---------------- urnathan wrote: > iains wrote: > > ChuanqiXu wrote: > > > iains wrote: > > > > urnathan wrote: > > > > > iains wrote: > > > > > > ChuanqiXu wrote: > > > > > > > I chose '-' in my implementation since here ModuleName refers to > > > > > > > the name in filesystem, right? And I remember '-' is the choice > > > > > > > of GCC. So let's keep consistency. > > > > > > This is not my understanding. > > > > > > > > > > > > ModuleName is the "user-facing" name of the module that is > > > > > > described in the source, and we use this in diagnostics etc. > > > > > > > > > > > > The translation of that name to an on-disk name can be arbitrary > > > > > > and there are several mechanisms in clang already (e.g. > > > > > > -fmodule-file=A:B=foo.pcm) the module loader uses these to find the > > > > > > module on the basis of its user-facing name where required. > > > > > > > > > > > > When we have P1184, it is even more important that the interface is > > > > > > supplied with the name that the user will put into the source code. > > > > > > > > > > > > > > > > > I agree with Iain, we should use ':' is module names here. When > > > > > mapping a named module to the file system some translation is needed > > > > > because ':' is not permitted in file names on some filesystems (you > > > > > know the ones!) > > > > (just to expand a little more) > > > > > > > > the on-disk name needs to be chosen suitably for the platform and by > > > > the user and/or the build system. > > > > > > > > When the FE chooses a default filename (which is done in building jobs, > > > > not in the Parser of course) it chooses one based on the source > > > > filename. It would be most unfortunate if the Parser/Sema needed to > > > > understand platform filename rules. > > > > > > > > When you do 'clang -module-file-info' (with the existing or updated > > > > version) you should see the module name as per the source code, the > > > > translation would only apply to the filename itself. > > > > > > > > - to answer a later comment: > > > > > > > > when we do -fmodule-file=A_B.pcm and A_B.pcm contains a module named > > > > A:B the loader notices this pairing when it pre-loads the module, so > > > > that when we ask for "A:B" the loader already knows which on-disk file > > > > contains. it. > > > > > > > > if we use the HeaderSearch mechanisms (when we want to figure out a > > > > module-name<=> filename pairing without loading the module) we can use > > > > -fmodule-name=A:B=A_B.pcm, > > > > > > > > These mechanisms work today, but P1184 is a more flexible mechanism and > > > > avoids having massive command lines with many -fmodule-file= cases. > > > > > > > But what if we need to import multiple modules? In our current workflow, > > > we would like to compile importing module in following style: > > > ``` > > > clang++ -std=c++20 -fprebuilt-module-path=path1 > > > -fprebuilt-module-path=path2 ... unit.cpp(m) ... > > > ``` > > > > > > And unit.cpp(m) may look like: > > > ``` > > > export module M; > > > import :parta; > > > import :partb; > > > import :partc; > > > ``` > > > > > > And our compiler would lookup `M-parta.pcm`, `M-partb.pcm` and > > > `M-partc.pcm` in the path given in the command line. However, in current > > > implementation, it would lookup `M:parta.pcm`, `M:partb.pcm` and > > > `M:partc.pcm` in the path but it might fail. So my point here is that the > > > current implementation doesn't work well with `fprebuilt-module-path`. > > > And I don't think we should give up `fprebuilt-module-path` to support > > > multiple `fmodule-file`. The `fprebuilt-module-path` is easier to > > > understand and use for real projects. Even if we support multiple > > > `-fmodule-file`, people might be frustrating to add many `fmodule-file` > > > if they need to import many units. > > > > > > So I really insist on this. Or at least let's add one option, something > > > like `-fmodule-partition-separator` here. > > The semantics of clang hierarchical module names are different from C++20 > > module names. > > > > C++20 does not specify any relationship between the module name and the > > filename - that task is for the user, build system (or in the case we have > > with clang where it can integrate some of that functionality, the lookup > > mechanism). > > > > out of interest if the user puts: > > > > ``` > > export module A.B:C.D; > > ``` > > > > how do you represent that on disk in your implementation? > > > > ---- > > > > In my understanding, if the user does: > > > > ``` > > export module A.B:C.D; > > > > clang -cc1 -std=c++20 -emit-module-interface foo.cpp -o xyzzy.pcm > > ``` > > then we should see : > > > > ``` > > clang -module-file-info xyzzy.pcm > > > > Information for module file 'xyzzy.pcm': > > Module format: raw > > Generated by this Clang: (/repos/llvm-git.git > > 97fc6544d6a09f161d90417db05674a2b3d2a5fe) > > Module name: A.B:C.D > > > > ``` > > > > I think it would be wrong if that printed: > > ` Module name: A.B-C.D` > > > > The primary module name is A.B (_not_ A) and there is no implication that > > we would see > > A/B/... > > > > What would happen if we ported clang to some (unknown, unlikely) platform > > where '-' was not a legal pathname character - we should not expect to have > > to change Sema for this, right? > > > > ----- > > > > In summary my understanding is: > > > > * The compiler (Parser/Sema) should know the C++20 module name as it is > > written by the user. > > > > * You **can** achieve your objective of using the prebuilt-module-path, but > > the translation from C++20 module name to on-disk name needs to happen in > > the lookup, not in artificially changing the module name internally to the > > compiler. > > > > ----- > > > > As a general point, there are already (too) many modules command line > > flags, and we will probably need to add a few more for C++20. I have tried > > in my implementation to keep separate those for C++20 and those for > > implicit / hierarchical and modules ts. > > > > IMHO it would be better for the user if modules were modal in the driver - > > and the driver rejected options that do not apply to the mode in action - > > but that is not for this patch! > > > > > Information should be presented to the user in the form the user understands > -- M-P is not such a format. As Iain says, the association from module names > to file names is arbitrary, and baking (a default?) into Sema is not the > right thing. > > In some cases the compiler may need to tell the user both. For instance > cannot find module 'M:P' (random/path/M-P.pcm) > > Let's not derail the initial pieces of partitions with this complex mapping > problem. I still think the compiler should have a `-fmodule-partition-separator` option to tell the partition seperator in filesystem to ease the use of C++20 modules. It looks like that the current implementation couldn't handle the case that a module unit imports multiple partitions. But I agree with there already too many options about clang modules. And I agree with we could handle the problem in later patches. I have no objection to stop the patch landing. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D118586/new/ https://reviews.llvm.org/D118586 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits