ChuanqiXu accepted this revision.
ChuanqiXu added inline comments.

================
Comment at: clang/lib/Sema/SemaModule.cpp:177
+  if (IsPartition) {
+    ModuleName += ":";
+    ModuleName += stringFromPath(Partition);
----------------
urnathan wrote:
> iains wrote:
> > ChuanqiXu wrote:
> > > iains wrote:
> > > > urnathan wrote:
> > > > > iains wrote:
> > > > > > ChuanqiXu wrote:
> > > > > > > I chose '-' in my implementation since here ModuleName refers to 
> > > > > > > the name in filesystem, right? And I remember '-' is the choice 
> > > > > > > of GCC. So let's keep consistency.
> > > > > > This is not my understanding.
> > > > > > 
> > > > > > ModuleName is the "user-facing" name of the module that is 
> > > > > > described in the source, and we use this in diagnostics etc.
> > > > > > 
> > > > > > The translation of that name to an on-disk name can be arbitrary 
> > > > > > and there are several mechanisms in clang already (e.g. 
> > > > > > -fmodule-file=A:B=foo.pcm) the module loader uses these to find the 
> > > > > > module on the basis of its user-facing name where required.
> > > > > > 
> > > > > > When we have P1184, it is even more important that the interface is 
> > > > > > supplied with the name that the user will put into the source code.
> > > > > > 
> > > > > > 
> > > > > I agree with Iain, we should use ':' is module names here.  When 
> > > > > mapping a named module to the file system some translation is needed 
> > > > > because ':' is not permitted in file names on some filesystems (you 
> > > > > know the ones!)
> > > > (just to expand a little more)
> > > > 
> > > > the on-disk name needs to be chosen suitably for the platform and by 
> > > > the user and/or the build system.
> > > > 
> > > > When the FE chooses a default filename (which is done in building jobs, 
> > > > not in the Parser of course) it chooses one based on the source 
> > > > filename.  It would be most unfortunate if the Parser/Sema needed to 
> > > > understand platform filename rules.
> > > > 
> > > > When you do  'clang -module-file-info' (with the existing or updated 
> > > > version) you should see the module name as per the source code, the 
> > > > translation would only apply to the filename itself.
> > > > 
> > > > - to answer a later comment:
> > > > 
> > > > when we do -fmodule-file=A_B.pcm  and A_B.pcm contains a module named 
> > > > A:B the loader notices this pairing when it pre-loads the module, so 
> > > > that when we ask for "A:B" the loader already knows which on-disk file 
> > > > contains. it.
> > > > 
> > > > if we use the HeaderSearch mechanisms (when we want to figure out a 
> > > > module-name<=> filename pairing without loading the module) we can use 
> > > > -fmodule-name=A:B=A_B.pcm,
> > > > 
> > > > These mechanisms work today, but P1184 is a more flexible mechanism and 
> > > > avoids having  massive command lines with many -fmodule-file= cases.
> > > > 
> > > But what if we need to import multiple modules? In our current workflow, 
> > > we would like to compile importing module in following style:
> > > ```
> > > clang++ -std=c++20 -fprebuilt-module-path=path1 
> > > -fprebuilt-module-path=path2 ... unit.cpp(m) ...
> > > ```
> > > 
> > > And unit.cpp(m) may look like:
> > > ```
> > > export module M;
> > > import :parta;
> > > import :partb;
> > > import :partc;
> > > ```
> > > 
> > > And our compiler would lookup `M-parta.pcm`, `M-partb.pcm` and 
> > > `M-partc.pcm` in the path given in the command line. However, in current 
> > > implementation, it would lookup `M:parta.pcm`, `M:partb.pcm` and 
> > > `M:partc.pcm` in the path but it might fail. So my point here is that the 
> > > current implementation doesn't work well with `fprebuilt-module-path`. 
> > > And I don't think we should give up `fprebuilt-module-path` to support 
> > > multiple `fmodule-file`. The `fprebuilt-module-path` is easier to 
> > > understand and use for real projects. Even if we support multiple 
> > > `-fmodule-file`, people might be frustrating to add many `fmodule-file` 
> > > if they need to import many units.
> > > 
> > > So I really insist on this. Or at least let's add one option, something 
> > > like `-fmodule-partition-separator` here.
> > The semantics of clang hierarchical module names are different from C++20 
> > module names.
> > 
> > C++20 does not specify any relationship between the module name and the 
> > filename - that task is for the user, build system (or in the case we have 
> > with clang where it can integrate some of that functionality, the lookup 
> > mechanism).
> > 
> > out of interest if the user puts:
> > 
> > ```
> > export module A.B:C.D;
> > ```
> > 
> > how do you represent that on disk in your implementation?
> > 
> > ----
> > 
> > In my understanding, if the user does:
> > 
> > ```
> > export module A.B:C.D;
> > 
> > clang -cc1 -std=c++20 -emit-module-interface foo.cpp -o xyzzy.pcm
> > ```
> > then we should see :
> > 
> > ```
> > clang -module-file-info xyzzy.pcm
> > 
> > Information for module file 'xyzzy.pcm':
> >   Module format: raw
> >   Generated by this Clang: (/repos/llvm-git.git 
> > 97fc6544d6a09f161d90417db05674a2b3d2a5fe)
> >   Module name: A.B:C.D
> > 
> > ```
> > 
> > I think it would be wrong if that printed:
> > ` Module name: A.B-C.D`
> > 
> > The primary module name is A.B (_not_ A) and there is no implication that 
> > we would see
> > A/B/...
> > 
> > What would happen if we ported clang to some (unknown, unlikely) platform 
> > where '-' was not a legal pathname character - we should not expect to have 
> > to change Sema for this, right?
> > 
> > -----
> > 
> > In summary my understanding is:
> > 
> > * The compiler (Parser/Sema) should know the C++20 module name as it is 
> > written by the user.
> > 
> > * You **can** achieve your objective of using the prebuilt-module-path, but 
> > the translation from C++20 module name to on-disk name needs to happen in 
> > the lookup, not in artificially changing the module name internally to the 
> > compiler.
> > 
> > -----
> > 
> > As a general point, there are already (too) many modules command line 
> > flags, and we will probably need to add a few more for C++20.  I have tried 
> > in my implementation to keep separate those for C++20 and those for 
> > implicit / hierarchical and modules ts.
> > 
> > IMHO it would be better for the user if modules were modal in the driver - 
> > and the driver rejected options that do not apply to the mode in action - 
> > but that is not for this patch!
> > 
> > 
> Information should be presented to the user in the form the user understands 
> -- M-P is not such a format.  As Iain says, the association from module names 
> to file names is arbitrary, and baking (a default?) into Sema is not the 
> right thing.
> 
> In some cases the compiler may need to tell the user both.  For instance
>   cannot find module 'M:P' (random/path/M-P.pcm)
> 
> Let's not derail the initial pieces of partitions with this complex mapping 
> problem.
I still think the compiler should have a `-fmodule-partition-separator` option 
to tell the partition seperator in filesystem to ease the use of C++20 modules. 
It looks like that the current implementation couldn't handle the case that a 
module unit imports multiple  partitions. 

But I agree with there already too many options about clang modules. And I 
agree with we could handle the problem in later patches. I have no objection to 
stop the patch landing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118586/new/

https://reviews.llvm.org/D118586

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to