[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-11-08 Thread Ben Boeckel via Phabricator via cfe-commits
ben.boeckel added a comment.

FWIW, I was able to make CMake's module test suite pass with this patch on top 
of you MyP1689 branch on your Github fork. I also added some diffs to help 
clean up output files. I'll try and get it to work with the replacement patches 
as well, but I just want this to be a working checkpoint.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-11-03 Thread Ben Boeckel via Phabricator via cfe-commits
ben.boeckel added a comment.

In D134267#3904257 , @ChuanqiXu wrote:

> BTW, this patch should be discarded. And now we're pursuing D137059 
>  and D137058 
> , where there is no modules cache. I know 
> it is a little bit frustrating to change the demo all the time... but let's 
> turn for that direction now.

Sorry, Phab is obtuse/unfamiliar for me right now; the "Abandoned" state isn't 
prevalent enough and all the pages look the sameā€¦


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-11-02 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3902564 , @ben.boeckel 
wrote:

> I tried applying this patch to your MyP1689 branch (fixing conflicts with 
> attempts there), but it seems that something isn't being plumbed properly:
>
>   clang-15: error: unknown argument: 
> '-fc++-module-file-output=CMakeFiles/export_bmi_and_interfaces.dir/importable.pcm'
>   clang-15: error: unable to create default module cache path 
> "/home/boeckb/.cache/clang/ModuleCache": No such file or direc
>   tory
>
> There's no way to disable the "module cache path" either (it fails here 
> because I symlinked it to `/dev/null` to avoid things "working" behind my 
> back).

Oh, it is odd. The option `-fc++-module-file-output` should have higher 
priority than creating module cache path.

BTW, this patch should be discarded. And now we're pursuing D137059 
 and D137058 
, where there is no modules cache. I know it 
is a little bit frustrating to change the demo all the time... but let's turn 
for that direction now.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-11-02 Thread Ben Boeckel via Phabricator via cfe-commits
ben.boeckel added a comment.

I tried applying this patch to your MyP1689 branch (fixing conflicts with 
attempts there), but it seems that something isn't being plumbed properly:

  clang-15: error: unknown argument: 
'-fc++-module-file-output=CMakeFiles/export_bmi_and_interfaces.dir/importable.pcm'
  clang-15: error: unable to create default module cache path 
"/home/boeckb/.cache/clang/ModuleCache": No such file or direc
  tory

There's no way to disable the "module cache path" either (it fails here because 
I symlinked it to `/dev/null` to avoid things "working" behind my back).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-31 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3892629 , @ben.boeckel 
wrote:

> In D134267#3876071 , @dblaikie 
> wrote:
>
>> I'm getting a bit exhausted with all the words involved here & not sure how 
>> to simplify/clarify this.
>>
>> If @ben.boeckel has particular use cases, it might be easier for him to be 
>> here discussing them so we can discuss the tradeoffs directly rather than 
>> through intermediaries.
>
> Sorry, I've been lax on actually keeping up-to-date on all of these Clang 
> threads.
>
> The current state is that CMake's test suite for C++ modules does not work 
> with clang because it doesn't provide the control CMake needs to do its 
> explicit builds. @ChuanqiXu's MyP1689 branch is close except that the `.pcm` 
> output path can't seem to be controlled closely enough for reliable builds. 
> Namely, I would like:
>
> - the ability to disable the module cache (completely); CMake doesn't need 
> nor want it
> - the ability to say put the generated `.pcm` that this TU will generate goes 
> *here*
>
> The latter is currently a juggle of flags and restrictions that I haven't 
> been able to figure out. Without something like `-fmodule-output-path=`, 
> convincing Clang to output its `.pcm` file to where CMake wants it to be is 
> some combination of `-fmodules-cache-path=` and `-fmodule-name=` while hoping 
> that the internal path computation inside of Clang makes what CMake wants. 
> It's just far easier to have something like `-o` that gives the answer 
> straight away.
>
> For the former, I use `-x c++module` as well so that I don't have to care 
> about any extension-sniffing behaviors. I also need to give a bogus 
> `-fmodule-cache-path=` for module consumers because as soon as `import` is 
> seen, it wants to run off to look at the cache instead of noticing that 
> there's a `-fmodule-file=` for everything it needs already and the cache is 
> 100% unused.

BTW, the bogus `-fmodule-cache-path=` for module consumers can be addressed by 
the existing `-fprebuilt-module-path`. And I sent the new revisions in D137058 
 and D137059 
.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-28 Thread Ben Boeckel via Phabricator via cfe-commits
ben.boeckel added a comment.

In D134267#3876071 , @dblaikie wrote:

> I'm getting a bit exhausted with all the words involved here & not sure how 
> to simplify/clarify this.
>
> If @ben.boeckel has particular use cases, it might be easier for him to be 
> here discussing them so we can discuss the tradeoffs directly rather than 
> through intermediaries.

Sorry, I've been lax on actually keeping up-to-date on all of these Clang 
threads.

The current state is that CMake's test suite for C++ modules does not work with 
clang because it doesn't provide the control CMake needs to do its explicit 
builds. @ChuanqiXu's MyP1689 branch is close except that the `.pcm` output path 
can't seem to be controlled closely enough for reliable builds. Namely, I would 
like:

- the ability to disable the module cache (completely); CMake doesn't need nor 
want it
- the ability to say put the generated `.pcm` that this TU will generate goes 
*here*

The latter is currently a juggle of flags and restrictions that I haven't been 
able to figure out. Without something like `-fmodule-output-path=`, convincing 
Clang to output its `.pcm` file to where CMake wants it to be is some 
combination of `-fmodules-cache-path=` and `-fmodule-name=` while hoping that 
the internal path computation inside of Clang makes what CMake wants. It's just 
far easier to have something like `-o` that gives the answer straight away.

For the former, I use `-x c++module` as well so that I don't have to care about 
any extension-sniffing behaviors. I also need to give a bogus 
`-fmodule-cache-path=` for module consumers because as soon as `import` is 
seen, it wants to run off to look at the cache instead of noticing that there's 
a `-fmodule-file=` for everything it needs already and the cache is 100% unused.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-23 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu abandoned this revision.
ChuanqiXu added a comment.

In D134267#3876071 , @dblaikie wrote:

> I'm getting a bit exhausted with all the words involved here & not sure how 
> to simplify/clarify this.
>
> If @ben.boeckel has particular use cases, it might be easier for him to be 
> here discussing them so we can discuss the tradeoffs directly rather than 
> through intermediaries.

Agreed.

> I think the choices of flags, even when they represent relatively minor work 
> on the compiler side, are important in terms of how they shape the 
> environment the compiler exists in. I have reservations about implementing 
> the libCody and the scanner-based solutions (let alone also caching based 
> solutions) - but that ship's probably already sailed in terms of it's 
> implemented in GCC and build2 is using it. (sort of like open source software 
> - we implement things for compatibility (like LGPL) but when we're the ones 
> innovating/creating new things we can and should be more cautious/possibly 
> more prescriptive to avoid creating more diversity/divergence than is 
> necessary)
>
> Please separate this work into isolated patches & we can discuss them 
> separately. I think this review might be best to abandon as the subject 
> line/description's out of synch and there's been a /lot/ of discussion going 
> in a lot of directions such that it'd be hard to understand the 
> conclusions/focus of this review at this point.

Yeah, I agree this thread is complex enough. I'll try to split the patches.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-21 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

I'm getting a bit exhausted with all the words involved here & not sure how to 
simplify/clarify this.

If @ben.boeckel has particular use cases, it might be easier for him to be here 
discussing them so we can discuss the tradeoffs directly rather than through 
intermediaries.

I think the choices of flags, even when they represent relatively minor work on 
the compiler side, are important in terms of how they shape the environment the 
compiler exists in. I have reservations about implementing the libCody and the 
scanner-based solutions (let alone also caching based solutions) - but that 
ship's probably already sailed in terms of it's implemented in GCC and build2 
is using it. (sort of like open source software - we implement things for 
compatibility (like LGPL) but when we're the ones innovating/creating new 
things we can and should be more cautious/possibly more prescriptive to avoid 
creating more diversity/divergence than is necessary)

Please separate this work into isolated patches & we can discuss them 
separately. I think this review might be best to abandon as the subject 
line/description's out of synch and there's been a /lot/ of discussion going in 
a lot of directions such that it'd be hard to understand the conclusions/focus 
of this review at this point.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-21 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@dblaikie I am confusing about your concern. For the test coverage example, on 
the one hand, it is completely different from the case the coverage report was 
generated in the runtime instead of the compile time. On the other hand, it 
also provides a `-fprofile-dir` option to control the output directory.

> I'd be happy not to add this until someone comes back with a demonstrated need

Yeah, we have demos. Both 
https://github.com/ChuanqiXu9/llvm-project/tree/MyP1689/P1689Examples and 
https://reviews.llvm.org/D135507 are demos for this.

> it doesn't seem like it'd be a major imposition on a build system to copy the 
> file if it needed to to move it into a release tree V the build tree, for 
> instance. And if it did add up to real performance problems, we could add it.

It doesn't make sense. Yes, the build system can provide the same behavior by 
`mv` commands. But why should we put the effort to the build systems? 
Especially it won't bring any cost for us to generate the pcm files in the 
specified positions except an additional flags. And it will be an overhead to 
move the pcm files at least now. For example, for the following simple module 
unit:

  module;
  #include 
  export module Hello;
  export void HelloWorld() {
  std::cout << "Hello World.\n";
  }

the size of the generated `pcm` file is `4.3M`. I know we can say that it is 
bad the compiler generate a so big pcm files. And it will be great if we can 
reduce them. However, this the status quo.

> then another for "here's a flag to name the .pcm output file" (not sure if we 
> need that, really, just yet)

It is needed for the build systems to specify the position of pcm output files. 
In fact, this option is required by @ben.boeckel

> possibly another for "here's an option to write the .pcm file out to a cache 
> directory instead of the .o path or custom .pcm output path" (I've even 
> stronger reservations about that).

To be honest, at least for the current experiments with kitware guys, this is 
not necessary. However, I think this is helpful for people to (try to) use 
modules in their current build systems. We know this is not so good. But it is 
much better than now.

---

After thinking about this more, I found the root divergence between us is about 
the principle/bar to add a compilation flag sugar (not a functionality but a 
flag). I feel like your standard for this patch is my standard for the new 
functionality instead of the compilation flag sugar.

What is a compilation flag sugar to me? My answer is: if we can get the same 
result without this flag, then this flag is a a compilation flag sugar to me. 
Then whole of this patch is a compilation sugar to me. How can we get the same 
result without this patch? For example, for the simple Hello.cppm, we can do 
the following steps:

  clang++ -std=c++20 Hello.cppm -save-temps -c

then we can see the following files in the current directory:

  Hello.bc  Hello.cppm  Hello.iim  Hello.o  Hello.pcm  Hello.s

the we can remove all the unwanted things, we can get:

  Hello.cppm Hello.pcm

Now we can move the `Hello.pcm` to the so called cache directory.  Then we get 
the exactly the same result without this patch! This is the reason why I feel 
this patch is innocent. Since it only reduces the above steps into two flags. 
And if we take the necessity as the standard for this patch. Then whole of this 
patch is meaningless. Since the user can still get the results by some shell 
tricks.

But I think it should be good to do such things for our uses. We shouldn't push 
such burdens to our users if we can. And it shouldn't bring a burden to us to 
introduce such two flags.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-20 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

>> (I'd still sort of lean towards "make it the same as the .o, but with the 
>> .pcm suffix" and if the build system wants to put things in other places, it 
>> can move them around - but I understand why that's not something everyone's 
>> on board with)
>
> Actually, I would think that to be a great default (and put it in the CWD 
> just like the object),
> .. but it seems (in addition to the default) we do need a way to place 
> specific module files - and I could imagine needing to make different places 
> for different command line option sets.  Asking the driver or build system to 
> move the file seems like spawning more processes and we'd still need a 
> user-visible way to say where we wanted the file to be put.

At least I'd like these to be separate patches - adding ".cppm -> {.pcm + .o}" 
action and then separately adding a way to specify the name of the .pcm output 
file.

But I still have some reservations about even providing a flag for it - I think 
the example I was thinking of was coverage files ( 
https://gcc.gnu.org/onlinedocs/gcc/Gcov-Data-Files.html ) which also don't have 
a custom path option, like Split DWARF. I'd be happy not to add this until 
someone comes back with a demonstrated need - it doesn't seem like it'd be a 
major imposition on a build system to copy the file if it needed to to move it 
into a release tree V the build tree, for instance. And if it did add up to 
real performance problems, we could add it.

>> Yeah, if the `-fmodule-file` syntax is currently only for input I don't 
>> think we need to try to overload it for output as well.
>
> My understanding [which could be flawed] is that the second syntax (which 
> already exists, it's not an invention of this discussion) is intended to 
> allow specification of output files.

I haven't seen evidence of that, but it's certainly not something I've got lots 
of info on. It looks like it was added in https://reviews.llvm.org/D35020 as a 
search path.

I think the first patch (be it using this review or another) should be just the 
".cppm -> {.pcm + .o}" (which should basically be entirely in the driver, 
right? Since for now it'll still be two actions that are already implemented) 
and then another for "here's a flag to name the .pcm output file" (not sure if 
we need that, really, just yet) and possibly another for "here's an option to 
write the .pcm file out to a cache directory instead of the .o path or custom 
.pcm output path" (I've even stronger reservations about that). I guess 
discussing the second and third somewhat together, if their flag interface 
might interact (since they both specify where the .pcm file goes).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-20 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu updated this revision to Diff 469136.
ChuanqiXu added a comment.

Use `-fc++-module-file-output` to address the comments.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/DiagnosticCommonKinds.td
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/create_module_cache.cpp
  clang/test/Modules/one-phase-compilation-named-modules.cppm

Index: clang/test/Modules/one-phase-compilation-named-modules.cppm
===
--- /dev/null
+++ clang/test/Modules/one-phase-compilation-named-modules.cppm
@@ -0,0 +1,26 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+//
+// RUN: %clang -std=c++20 %t/M.cppm -c -o - -fc++-modules-cache-path=%t/pcm.cache
+// RUN: %clang -std=c++20 %t/Use.cpp -fsyntax-only  -fprebuilt-module-path=%t/pcm.cache -Xclang -verify
+//
+// RUN: rm -f %t/M.pcm
+//
+// RUN: %clang -std=c++20 %t/MismatchedName.cppm -c -o - -fc++-module-file-output=%t/pcm.cache/M.pcm
+// RUN: %clang -std=c++20 %t/Use.cpp -fsyntax-only  -fprebuilt-module-path=%t/pcm.cache -Xclang -verify
+
+//--- M.cppm
+export module M;
+export int getValue();
+
+//--- MismatchedName.cppm
+export module M;
+export int getValue();
+
+//--- Use.cpp
+// expected-no-diagnostics
+import M;
+int Use() {
+return getValue();
+}
Index: clang/test/Driver/create_module_cache.cpp
===
--- /dev/null
+++ clang/test/Driver/create_module_cache.cpp
@@ -0,0 +1,31 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+//
+// RUN: not %clang -std=c++20 %t/M.cppm -fc++-modules-cache-path= -c -o %t/M.tmp.o 2>&1 | FileCheck %t/M.cppm --check-prefix=ERROR
+// RUN: %clang -std=c++20 %t/M.cppm -fc++-modules-cache-path=%t/abc -c -o -
+// RUN: ls %t | FileCheck %t/M.cppm --check-prefix=CHECK-AVAILABLE
+// RUN: %clang -std=c++20 %t/M.cppm -fc++-modules-cache-path=%t/abc -fno-implicit-modules -c -o -
+// RUN: ls %t | FileCheck %t/M.cppm --check-prefix=CHECK-AVAILABLE
+//
+// RUN: %clang -std=c++20 %t/Use.cpp -fc++-modules-cache-path=abc -### 2>&1 | FileCheck %t/Use.cpp
+// RUN: %clang -std=c++20 %t/Use.cpp -fc++-modules-cache-path=abc -fno-implicit-modules -### 2>&1 | FileCheck %t/Use.cpp
+//
+// Check that the compiler will generate M-Part.pcm correctly.
+// RUN: %clang -std=c++20 %t/Part.cppm -fc++-module-file-output=%t/M-Part.pcm -c -o -
+// RUN: ls %t | FileCheck %t/Part.cppm --check-prefix=CHECK-AVAILABLE
+
+//--- M.cppm
+export module M;
+
+// ERROR: unable to create default module cache path "": No such file or directory
+// CHECK-AVAILABLE: abc
+
+//--- Use.cpp
+import M;
+
+// CHECK: -fprebuilt-module-path=abc
+
+//--- Part.cppm
+export module M:Part;
+// CHECK-AVAILABLE: M-Part.pcm
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -3750,6 +3750,7 @@
   std::string("-fprebuilt-module-path=") + A->getValue()));
   A->claim();
 }
+
 if (Args.hasFlag(options::OPT_fprebuilt_implicit_modules,
  options::OPT_fno_prebuilt_implicit_modules, false))
   CmdArgs.push_back("-fprebuilt-implicit-modules");
@@ -3759,6 +3760,18 @@
   CmdArgs.push_back("-fvalidate-ast-input-files-content");
   }
 
+  // If we're in standard c++ modules, lookup in cache path automatically.
+  if (HaveStdCXXModules) {
+SmallString<128> Path;
+if (Arg *A = Args.getLastArg(options::OPT_fcxx_modules_cache_path))
+  Path = A->getValue();
+else
+  Driver::getDefaultModuleCachePath(Path);
+if (!Path.empty())
+  CmdArgs.push_back(Args.MakeArgString(
+  std::string("-fprebuilt-module-path=") + Path));
+  }
+
   // -fmodule-name specifies the module that is currently being built (or
   // used for header checking by -fmodule-maps).
   Args.AddLastArg(CmdArgs, options::OPT_fmodule_name_EQ);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -5558,6 +5558,30 @@
 return "-";
   }
 
+  if (isa(JA) && JA.getType() == types::TY_ModuleFile) {
+if (Arg *A = C.getArgs().getLastArg(options::OPT_fcxx_module_file_output))
+  return C.addResultFile(A->getValue(),  );
+
+SmallString<128> Path;
+if (Arg *A = C.getArgs().getLastArg(options::OPT_fcxx_modules_cache_path))
+  Path = A->getValue();
+else
+  Driver::getDefaultModuleCachePath(Path);
+
+std::error_code EC =
+llvm::sys::fs::create_directories(Path, /*IgnoreExisting =*/true);
+if (EC)
+  Diag(clang::diag::err_creating_default_module_cache_path)
+  << Path << EC.message();
+
+StringRef Name = 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-20 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3870064 , @ChuanqiXu wrote:

> I grepped `options.td` and got (incomplete) list for options to take a output 
> name:
>
>   # -o and its alias
>   -o
>   -object_file_name=
>   --output=
>   
>   /Fa (windows for assembly output filename)
>   /Fe (windows for output executable file name)
>   /Fi (windows for preprocessed output filename)
>   /Fo (Windows for object file)
>   
>   -dependency-dot (for DOT-formatted header dependencies)
>   -dependency-file (to write dependency output to)
>   
>   -header-include-file (Filename to write header include output to)
>   
>   -opt-record-file (Filename to use for YAML optimization record output)
>   
>   -split-dwarf-output (Filename to use for split dwarf debug info output)
>   
>   -stack-usage-file (to write stack usage output to)
>   -coverage-data-file (Emit coverage data to this filename)
>   -coverage-notes-file (Emit coverage notes to this filename)
>
> And it looks like the `-file` appears a lot. So may be the suggestion 
> (`-fc++-module-file-output`) may be better. And for the default location, I 
> feel like my explanation above makes sense. If the end user wants to produce 
> .pcm files, they can use `--precompile` just like what they do with `-c` to 
> get the object files. This only matters with end users since the build 
> systems should/would chose other positions.

OK. I guess the idea about `-fmodule-file==filename` was that, because 
the FE will not try to read `filename` (for module-generation cases) we could 
use it to describe the output file.  However, it seems that might be too 
complex... so `-fc++-module-file-output` seems OK to me.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

I grepped `options.td` and got (incomplete) list for options to take a output 
name:

  # -o and its alias
  -o
  -object_file_name=
  --output=
  
  /Fa (windows for assembly output filename)
  /Fe (windows for output executable file name)
  /Fi (windows for preprocessed output filename)
  /Fo (Windows for object file)
  
  -dependency-dot (for DOT-formatted header dependencies)
  -dependency-file (to write dependency output to)
  
  -header-include-file (Filename to write header include output to)
  
  -opt-record-file (Filename to use for YAML optimization record output)
  
  -split-dwarf-output (Filename to use for split dwarf debug info output)
  
  -stack-usage-file (to write stack usage output to)
  -coverage-data-file (Emit coverage data to this filename)
  -coverage-notes-file (Emit coverage notes to this filename)

And it looks like the `-file` appears a lot. So may be the suggestion 
(`-fc++-module-file-output`) may be better. And for the default location, I 
feel like my explanation above makes sense. If the end user wants to produce 
.pcm files, they can use `--precompile` just like what they do with `-c` to get 
the object files. This only matters with end users since the build systems 
should/would chose other positions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

Yeah, I am a little bit surprising/confusing to see why we're talking about the 
**behavior** of `-fmodule-file`. I guess the intention of @iains is to throw 
the option to prove `-fc++-module-file-path` (or `-fc++-module-file-output`) is 
the better. The point here is about the `name` instead of the `behavior`.

In D134267#3869692 , @iains wrote:

> In D134267#3869678 , @dblaikie 
> wrote:
>
>> (I'd still sort of lean towards "make it the same as the .o, but with the 
>> .pcm suffix" and if the build system wants to put things in other places, it 
>> can move them around - but I understand why that's not something everyone's 
>> on board with)
>
> Actually, I would think that to be a great default (and put it in the CWD 
> just like the object),
> .. but it seems (in addition to the default) we do need a way to place 
> specific module files - and I could imagine needing to make different places 
> for different command line option sets.  Asking the driver or build system to 
> move the file seems like spawning more processes and we'd still need a 
> user-visible way to say where we wanted the file to be put.

For the default place, my original intention is to do not pollute (I know this 
may not be the best word) the user space. For example, when we compile a hello 
example (without modules):

  clang++ hello.cpp -o hello

we won't see `hello.o` in the current directory. It lives in the `/tmp` 
directory. I guess the intuition is to avoid users see things that they can 
ignore. So I chose to set the default place to be the same with the clang 
modules (under `~/cache/clang/module...`).

And for the needs to making different plans for different command line option 
sets, I feel like the current design (`-fc++-modules-cache-path` and 
`-fc++-module-bmi-output` (or `-fc++-module-file-output`)) is capable to handle 
it. For example, the different command line can set different 
`-fc++-modules-cache-path` or different `-fc++-module-file-output`.

>> Yeah, if the `-fmodule-file` syntax is currently only for input I don't 
>> think we need to try to overload it for output as well.
>
> My understanding [which could be flawed] is that the second syntax (which 
> already exists, it's not an invention of this discussion) is intended to 
> allow specification of output files.

No. (If I don't misunderstand your point), the `-fmodule-file` is only about 
the inputs. The difference between `-fmodule-file=a.pcm` and 
`-fmodule-file=A=a.pcm` is about the **lazy** module loading. For the first 
syntax, the clang will try to load `a.pcm` at the start. But for the second 
one, the clang will only try to load `a.pcm` when the module `A` is required. 
And I can't see the relationship with the output files.

>> Perhaps we could extend the --precompile flag to take a filename? (but then 
>> that'd potentially confuse things when you want to produce both PCM and .o, 
>> passing `--precompile` currently only produces the .pcm on the main output 
>> path... ).
>
> yeah, --precompile is well-defined to produce only the module (and I have a 
> patch to alias it to -fmodule-only to match the GCC equivalent), it would not 
> help to conflate that with the current objective.

Agreed. It may be not good to change the semantics of an existing option.

>> Anyone got a list of other compiler flags that take output file names? (I 
>> guess profile generation, maybe some other flags do this? Might be worth 
>> comparing/contrasting/seeking inspiration from those? I guess `-save-temps` 
>> uses the "same path, different suffix" strategy)
>
> I'll see if I can find my master list of modules options (tomorrow .. erm 
> later today).

I guess @dblaikie is talking about the general output flags (not limited to 
modules). I'll try to make a list today too.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3869678 , @dblaikie wrote:

> In D134267#3869643 , @iains wrote:
>
>> In D134267#3869614 , @dblaikie 
>> wrote:
>>
>>> In D134267#3869162 , @iains wrote:
>>>
 In D134267#3868830 , @dblaikie 
 wrote:

> I'm OK with sticking with the existing `-fmodule-file` if that works for 
> everyone. Yeah, it's short and ambiguous in a space with many concepts of 
> what a "module file" is, but also the fact that it's a `-f` flag might 
> help disambiguate it a bit - it's probably not the way anyone would 
> think/expect to be passing source files, those just get passed without 
> flags on the command line. And any use of it will show the .pcm extension 
> or whatever that should make it clear enough what's going on.

 hmm (I realise I mentioned this, and hope it has not complicated things) ..

 .. I was thinking of the `-fmodule-file==filename` variant.  The 
 problem with using it without (the ) is that -fmodule-file= can (and 
 does) appear multiple times on the command line to specify dependent 
 modules - so that one would have to specify which (named) module was 
 related to the filename.
>>
>>
>>
>>> I /think/ currently `-fmodule-file` works with Clang header modules 
>>> presumably because the module files contain within them enough information 
>>> to enable the compiler to substitute the module for an include where 
>>> necessary? I'm not sure the `=` part is necessary, as such, if the 
>>> BMIs can reasonably contain enough identifying info to make their use clear 
>>> to the consumer.
>>
>> `-fmodule-file=` works with both `clang` modules  and standard C++ ones.  
>> What it does (as of now) is cause the FE to open the named file and to 
>> pre-load the module name - so that when the FE tries to find a module the 
>> lookup mechanism can associate the module name with the module data.  So, if 
>> we need 3 dependent modules to be available when building a new source .. we 
>> would specify the PCM files containing those modules.
>>
>> At present, if you try to specify a non-existent PCM using `-fmodule-file=` 
>> the compile will fail early on (it is expecting this to indicate a source 
>> module, not a destination one).
>>
>> However, the second syntax `-fmodule-file==filename` I think should be 
>> able to work around this (since it says that the named module  is 
>> associated with the PCM `filename` which would allow us to cater for that 
>> file being missing (when we are creating it).
>
> Would the intent be that this might write out multiple different module 
> files? I would hope we don't need to support that - that we only generate the 
> module file for the single source module passed to the compiler. I guess 
> maybe to support `clang++ x.cppm y.cppm` you might have multiple output 
> files? But in that case you don't get to use `-o` so maybe you don't get to 
> specify the module paths either and just get the default "same as .o but with 
> .pcm suffix" behavior?
>
>> Does that clarify at all ?
>>
>> It would be great not to add more modules options flags, there are already 
>> way to many :/ - but if this seems too complex then one of the spellings 
>> suggested by @ChuanqiXu would work.
>
> Oh, right, super sorry - this is about how to specify the output filename. OK.
>
> (I'd still sort of lean towards "make it the same as the .o, but with the 
> .pcm suffix" and if the build system wants to put things in other places, it 
> can move them around - but I understand why that's not something everyone's 
> on board with)

Actually, I would think that to be a great default (and put it in the CWD just 
like the object),
.. but it seems (in addition to the default) we do need a way to place specific 
module files - and I could imagine needing to make different places for 
different command line option sets.  Asking the driver or build system to move 
the file seems like spawning more processes and we'd still need a user-visible 
way to say where we wanted the file to be put.

> Yeah, if the `-fmodule-file` syntax is currently only for input I don't think 
> we need to try to overload it for output as well.

My understanding [which could be flawed] is that the second syntax (which 
already exists, it's not an invention of this discussion) is intended to allow 
specification of output files.

> Perhaps we could extend the --precompile flag to take a filename? (but then 
> that'd potentially confuse things when you want to produce both PCM and .o, 
> passing `--precompile` currently only produces the .pcm on the main output 
> path... ).

yeah, --precompile is well-defined to produce only the module (and I have a 
patch to alias it to -fmodule-only to match the GCC equivalent), it would not 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D134267#3869643 , @iains wrote:

> In D134267#3869614 , @dblaikie 
> wrote:
>
>> In D134267#3869162 , @iains wrote:
>>
>>> In D134267#3868830 , @dblaikie 
>>> wrote:
>>>
 I'm OK with sticking with the existing `-fmodule-file` if that works for 
 everyone. Yeah, it's short and ambiguous in a space with many concepts of 
 what a "module file" is, but also the fact that it's a `-f` flag might 
 help disambiguate it a bit - it's probably not the way anyone would 
 think/expect to be passing source files, those just get passed without 
 flags on the command line. And any use of it will show the .pcm extension 
 or whatever that should make it clear enough what's going on.
>>>
>>> hmm (I realise I mentioned this, and hope it has not complicated things) ..
>>>
>>> .. I was thinking of the `-fmodule-file==filename` variant.  The 
>>> problem with using it without (the ) is that -fmodule-file= can (and 
>>> does) appear multiple times on the command line to specify dependent 
>>> modules - so that one would have to specify which (named) module was 
>>> related to the filename.
>
>
>
>> I /think/ currently `-fmodule-file` works with Clang header modules 
>> presumably because the module files contain within them enough information 
>> to enable the compiler to substitute the module for an include where 
>> necessary? I'm not sure the `=` part is necessary, as such, if the 
>> BMIs can reasonably contain enough identifying info to make their use clear 
>> to the consumer.
>
> `-fmodule-file=` works with both `clang` modules  and standard C++ ones.  
> What it does (as of now) is cause the FE to open the named file and to 
> pre-load the module name - so that when the FE tries to find a module the 
> lookup mechanism can associate the module name with the module data.  So, if 
> we need 3 dependent modules to be available when building a new source .. we 
> would specify the PCM files containing those modules.
>
> At present, if you try to specify a non-existent PCM using `-fmodule-file=` 
> the compile will fail early on (it is expecting this to indicate a source 
> module, not a destination one).
>
> However, the second syntax `-fmodule-file==filename` I think should be 
> able to work around this (since it says that the named module  is 
> associated with the PCM `filename` which would allow us to cater for that 
> file being missing (when we are creating it).

Would the intent be that this might write out multiple different module files? 
I would hope we don't need to support that - that we only generate the module 
file for the single source module passed to the compiler. I guess maybe to 
support `clang++ x.cppm y.cppm` you might have multiple output files? But in 
that case you don't get to use `-o` so maybe you don't get to specify the 
module paths either and just get the default "same as .o but with .pcm suffix" 
behavior?

> Does that clarify at all ?
>
> It would be great not to add more modules options flags, there are already 
> way to many :/ - but if this seems too complex then one of the spellings 
> suggested by @ChuanqiXu would work.

Oh, right, super sorry - this is about how to specify the output filename. OK.

(I'd still sort of lean towards "make it the same as the .o, but with the .pcm 
suffix" and if the build system wants to put things in other places, it can 
move them around - but I understand why that's not something everyone's on 
board with)

Yeah, if the `-fmodule-file` syntax is currently only for input I don't think 
we need to try to overload it for output as well.

Perhaps we could extend the --precompile flag to take a filename? (but then 
that'd potentially confuse things when you want to produce both PCM and .o, 
passing `--precompile` currently only produces the .pcm on the main output 
path... ).

Anyone got a list of other compiler flags that take output file names? (I guess 
profile generation, maybe some other flags do this? Might be worth 
comparing/contrasting/seeking inspiration from those? I guess `-save-temps` 
uses the "same path, different suffix" strategy)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3869614 , @dblaikie wrote:

> In D134267#3869162 , @iains wrote:
>
>> In D134267#3868830 , @dblaikie 
>> wrote:
>>
>>> I'm OK with sticking with the existing `-fmodule-file` if that works for 
>>> everyone. Yeah, it's short and ambiguous in a space with many concepts of 
>>> what a "module file" is, but also the fact that it's a `-f` flag might help 
>>> disambiguate it a bit - it's probably not the way anyone would think/expect 
>>> to be passing source files, those just get passed without flags on the 
>>> command line. And any use of it will show the .pcm extension or whatever 
>>> that should make it clear enough what's going on.
>>
>> hmm (I realise I mentioned this, and hope it has not complicated things) ..
>>
>> .. I was thinking of the `-fmodule-file==filename` variant.  The 
>> problem with using it without (the ) is that -fmodule-file= can (and 
>> does) appear multiple times on the command line to specify dependent modules 
>> - so that one would have to specify which (named) module was related to the 
>> filename.



> I /think/ currently `-fmodule-file` works with Clang header modules 
> presumably because the module files contain within them enough information to 
> enable the compiler to substitute the module for an include where necessary? 
> I'm not sure the `=` part is necessary, as such, if the BMIs can 
> reasonably contain enough identifying info to make their use clear to the 
> consumer.

`-fmodule-file=` works with both `clang` modules  and standard C++ ones.  What 
it does (as of now) is cause the FE to open the named file and to pre-load the 
module name - so that when the FE tries to find a module the lookup mechanism 
can associate the module name with the module data.  So, if we need 3 dependent 
modules to be available when building a new source .. we would specify the PCM 
files containing those modules.

At present, if you try to specify a non-existent PCM using `-fmodule-file=` the 
compile will fail early on (it is expecting this to indicate a source module, 
not a destination one).

However, the second syntax `-fmodule-file==filename` I think should be 
able to work around this (since it says that the named module  is 
associated with the PCM `filename` which would allow us to cater for that file 
being missing (when we are creating it).

Does that clarify at all ?

It would be great not to add more modules options flags, there are already way 
to many :/ - but if this seems too complex then one of the spellings suggested 
by @ChuanqiXu would work.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D134267#3869162 , @iains wrote:

> In D134267#3868830 , @dblaikie 
> wrote:
>
>> I'm OK with sticking with the existing `-fmodule-file` if that works for 
>> everyone. Yeah, it's short and ambiguous in a space with many concepts of 
>> what a "module file" is, but also the fact that it's a `-f` flag might help 
>> disambiguate it a bit - it's probably not the way anyone would think/expect 
>> to be passing source files, those just get passed without flags on the 
>> command line. And any use of it will show the .pcm extension or whatever 
>> that should make it clear enough what's going on.
>
> hmm (I realise I mentioned this, and hope it has not complicated things) ..
>
> .. I was thinking of the `-fmodule-file==filename` variant.  The 
> problem with using it without (the ) is that -fmodule-file= can (and 
> does) appear multiple times on the command line to specify dependent modules 
> - so that one would have to specify which (named) module was related to the 
> filename.

I'm really not following, sorry (which is a statement I could put on almost 
every discussion about modules build implications, to be honest - as much as 
it's something I've been involved with for years - so it's not your fault, as 
such, just that this stuff is hard, I think)

I /think/ currently `-fmodule-file` works with Clang header modules presumably 
because the module files contain within them enough information to enable the 
compiler to substitute the module for an include where necessary? I'm not sure 
the `=` part is necessary, as such, if the BMIs can reasonably contain 
enough identifying info to make their use clear to the consumer.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3869520 , @tahonermann 
wrote:

>> In a pre-scanned world, the build system does know the info for each source 
>> file (published and dependent modules) [which ought to dispel some of the 
>> concerns raised about not knowing about possible outputs for 
>> implementation/interface cases].
>>
>> In a discovery world, the interface to the build system carries all of this 
>> traffic anyway so that the command line would only be providing pre-set data 
>> for that.
>
> This does not cover all build systems. For example, Coverity relies on 
> observing the compiler invocations performed by another build system and 
> relies on interpreting the command lines of those invocations in order to 
> identify all inputs and outputs (and Coverity does require awareness of 
> outputs as well as inputs). Other tools that operate on a compilation 
> database or monitors like Build EAR  have 
> similar requirements.

I did not intend to derail the discussion of how the command line option should 
be spelled - but for clarity - I continue to expect that we will need to 
support **both** styles of build system, and the user will need to choose what 
is appropriate for their workflow - nothing in the proposals here impacts on 
that, right?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread Tom Honermann via Phabricator via cfe-commits
tahonermann added a comment.

> In a pre-scanned world, the build system does know the info for each source 
> file (published and dependent modules) [which ought to dispel some of the 
> concerns raised about not knowing about possible outputs for 
> implementation/interface cases].
>
> In a discovery world, the interface to the build system carries all of this 
> traffic anyway so that the command line would only be providing pre-set data 
> for that.

This does not cover all build systems. For example, Coverity relies on 
observing the compiler invocations performed by another build system and relies 
on interpreting the command lines of those invocations in order to identify all 
inputs and outputs (and Coverity does require awareness of outputs as well as 
inputs). Other tools that operate on a compilation database or monitors like 
Build EAR  have similar requirements.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3868830 , @dblaikie wrote:

> I'm OK with sticking with the existing `-fmodule-file` if that works for 
> everyone. Yeah, it's short and ambiguous in a space with many concepts of 
> what a "module file" is, but also the fact that it's a `-f` flag might help 
> disambiguate it a bit - it's probably not the way anyone would think/expect 
> to be passing source files, those just get passed without flags on the 
> command line. And any use of it will show the .pcm extension or whatever that 
> should make it clear enough what's going on.

hmm (I realise I mentioned this, and hope it has not complicated things) ..

.. I was thinking of the `-fmodule-file==filename` variant.  The problem 
with using it without (the ) is that -fmodule-file= can (and does) appear 
multiple times on the command line to specify dependent modules - so that one 
would have to specify which (named) module was related to the filename.

In a pre-scanned world, the build system does know the info for each source 
file (published and dependent modules) [which ought to dispel some of the 
concerns raised about not knowing about possible outputs for 
implementation/interface cases].

In a discovery world, the interface to the build system carries all of this 
traffic anyway so that the command line would only be providing pre-set data 
for that.

.. and we do not care about header units in this discussion since they have to 
be handled specially anyway.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-19 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

I'm OK with sticking with the existing `-fmodule-file` if that works for 
everyone. Yeah, it's short and ambiguous in a space with many concepts of what 
a "module file" is, but also the fact that it's a `-f` flag might help 
disambiguate it a bit - it's probably not the way anyone would think/expect to 
be passing source files, those just get passed without flags on the command 
line. And any use of it will show the .pcm extension or whatever that should 
make it clear enough what's going on.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

> once again we are getting off topic for this patch :)

Yeah, let's wait for @dblaikie's opinion on the flag name : )

The remained question: To use `-fc++-module-bmi-output=` or 
`-fc++-module-filename=`, or anything else (maybe `-fc++-module-output=`)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

once again we are getting off topic for this patch :)

We (compiler engineers) will have to cater for multiple models used by 
different build systems.

SG15 might give guidance/recommendations,  but in the end the standard's 
normative text is not likely to make a 'discovery-based' scheme like build2 
non-conforming.
So, course, it is possible to make a GCC bugzilla requesting that it is 
possible to give a module output filename as a command line option (hopefully 
using the same option spelling as clang).  Allowing both compilers to operate 
with each other's [C++20] command lines is a great objective ..


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3864416 , @ruoso wrote:

> Currently, no. I think we do need a paper to discuss the requirements of the 
> remote execution protocol and how they relate to the implementation of C++ 
> modules.
>
> The simple summary is that one way that remote execution is implemented is by 
> just wrapping the compiler execution, creating a Merkle tree with all the 
> inputs, identify all outputs, ship that to a remote worker, and return 
> another Merkle tree with the outputs.

Yeah, it is always better to have standard protocols. So the current state 
about the suffixes of modules (or `a differently named argument for output 
files` in your terms) is still need to be discussed. Personally, I agree with 
the special suffix for the sake of readability. @iains I think your draft 
patches may be better to be suspended until the SG15 get consensus. From my 
understanding, the problem may not be related with the client/server modes, 
right? So I guess this may not block your future works.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Daniel Ruoso via Phabricator via cfe-commits
ruoso added a comment.

Currently, no. I think we do need a paper to discuss the requirements of the 
remote execution protocol and how they relate to the implementation of C++ 
modules.

The simple summary is that one way that remote execution is implemented is by 
just wrapping the compiler execution, creating a Merkle tree with all the 
inputs, identify all outputs, ship that to a remote worker, and return another 
Merkle tree with the outputs.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3864368 , @ruoso wrote:

> FWIW, having a differently named argument for output files and input files is 
> very important for tooling.
>
> In particular, in the case of remote execution, it makes a huge difference 
> being able to tell what are the inputs and outputs by looking at the 
> arguments alone, so I would like to make the case for an explicit output 
> argument.
>
> In fact, this is something I'll request on GCC as well, since it currently 
> uses the module mapper to figure out both the input and the output BMI files.

Is this a consensus of SG15?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Daniel Ruoso via Phabricator via cfe-commits
ruoso added a comment.

FWIW, having a differently named argument for output files and input files is 
very important for tooling.

In particular, in the case of remote execution, it makes a huge difference 
being able to tell what are the inputs and outputs by looking at the arguments 
alone, so I would like to make the case for an explicit output argument.

In fact, this is something I'll request on GCC as well, since it currently uses 
the module mapper to figure out both the input and the output BMI files.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3864248 , @iains wrote:

> I am also OK with doing this in two steps (first in the driver with this 
> patch and then by updating the FE to allow the two outputs from one 
> invocation - my draft patch series).

For your draft patches, I have only one concern in the high level: if we 
will/should restrict the module declarations appear in the filenames with 
special suffixes. Since both @rsmith and MSVC made the same decision. I am not 
sure if they have special reasons to do so. Maybe I need to search the 
discussion in SG15 mailing lists. And another related concern (maybe concern is 
not a good word here) is, if your patches landed, many existing codes may need 
to be removed. Otherwise, it'll be redundant codes. But this might not be a 
blocking issue though.

> BTW: I did mean to ask before .,, did you consider this (existing) command 
> syntax?
>
> `-fmodule-file=[=]`
>
> and see if it works for your case? (it seems that it should to be consistent).

Oh, do you mean we should use `module-file` name since we've used 
`-fmodule-file` option? Good point. Yeah, it looks like a pity to have 2 terms 
to describe the same thing. Personally I don't have strong feeling for the 
option name. If @dblaikie has no other comments on this, I'll follow your 
suggestion.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

> Yes, all of this stuff is important (and, yes, we are drifting into other 
> [related] topics) - we already have a mechanism for providing function bodies 
> to optimisation [LTO] - I do not think we should want to make module 
> interfaces larger than necessary to duplicate that functionality (the reason 
> we do now is because all the information needs to be present to feed into 
> codegen) - it has been a long-term objective (I think listed even in @rsmith 
> 's modules TODO list) to remove the unneeded BMI content.

I remembered the @rsmith's TODO list was about the the discarding entities in 
the GMF (I just found that I forgot to review your patches.). I am not sure if 
I mis remembered. And I agree it'll be great to reduce the size of module 
interfaces. I've heard the concern from Mathias that the size of BMI may affect 
the distributed building. Also in our internal practicing, we also use ThinLTO 
and we'll tuning ThinLTO for better performance. **BUT** from the general 
perspective of users, LTO (not ThinLTO) is not used widely. I only see people 
use it in cases where they need a score to show the performance or in some 
kernel cases. It is not common. Also from our simple experience, Dropping 
function bodies from module interface and ThinLTO will get worse performance 
than original ThinLTO. This is a performance regression too. I mean it'll be a 
drastic change and so it'll be a longer-term objective.

> Actually, my point was meant to be quite simple and directly related to this 
> patch that the choice of options name should be something that would be 
> obvious to the end-user, ideally we should have related options named 
> similarly and we should avoid using terms that are familiar to compiler 
> engineers but not necessarily to the end user.

Yeah, agreed. But as I said,  I originally uses the term `module file`. But 
@ruoso suggested the term `BMI`. So now in 
https://clang.llvm.org/docs/StandardCPlusPlusModules.html#built-module-interface-file,
 we uses the term `BMI` to end users and module-file is not a defined term 
now.. So it may be better to use `BMI` than `module-file` from the perspective.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

I am also OK with doing this in two steps (first in the driver with this patch 
and then by updating the FE to allow the two outputs from one invocation - my 
draft patch series).

BTW: I did mean to ask before .,, did you consider this (existing) command 
syntax?

`-fmodule-file=[=]`

and see if it works for your case? (it seems that it should to be consistent).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-18 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3863938 , @ChuanqiXu wrote:

>>> Although modular code is user-facing - BMIs are an implementational detail, 
>>> right?
>>
>> but I don't think BMIs are an implementation detail, anymore than object 
>> files are - users should/will be as aware of BMIs as they are of .o files - 
>> there build artifacts that need to be cached, can be out of date, can be 
>> deleted/maybe copied/moved around in limited ways, have to be passed to 
>> other tools in the build pipeline. Not sure what term to use, I guess 
>> compilers don't often have warnings/errors/diagnostics that explicitly talk 
>> about "object files" - so maybe in a similar sense they won't explicitly 
>> talk about "binary module artifact files" but maybe?
>
> Yeah, agreed. I guess @iains 's meaning may be "if the tool chains are ready 
> enough one day, then the end user won't need to touch BMIs directly". Yeah, 
> it looks true. Nowadays, many c++ programmers don't need to touch the `.o` 
> files since the build systems have taken care for it (from my personal 
> experience). But from the perspective of a compiler, the concept of `BMI` may 
> be necessary. I remember when I wrote the documents for modules, I used the 
> term "module file" at first. Then @ruoso corrected me that we should use 
> `BMI` here. So I think the use of the term `BMI` may be correct here.
>
>> The build system still needs to know that B.cppm depends on A.cppm - and 
>> once it knows that, it's not a huge cost for it to know the name of the file 
>> that represents that dependency and is produced by A.cppm and passed to 
>> B.cppm, I think?
>
> In a private chat with Kitware guys, they told me if the one phase 
> compilation wasn't supported, they can mock it by replacing the original 
> command by the combination of:
>
>   clang++ -std=c++20 --precompile src.cppm -o src.pcm
>   clang++ -std=c++20 src.pcm --precompile src.o
>
> but the `pcm` files won't be depended directly. So it may be harder, I am not 
> sure. @ben.boeckel any update?
>
>> In short - seems like we should separate out the cache discussion from the 
>> "one phase" compilation in the sense of a single build action that takes a 
>> .cppm and generates both a .o and a .pcm in one compiler/driver invocation. 
>> (maybe something like this is what @iains has already sent out in another 
>> review?)
>
> Agreed.
>
>> to @iains point about "it'd be good if we didn't have to invoke two 
>> underlying commands from the one drivter invocation" - yeah, agreed. Though 
>> I wouldn't mind one step being "add the driver interface" and another being 
>> "fix whatever serialization isuse/etc/ might stand in the way of doing 
>> .cppm->{.o,.pcm} in a single action without serialization, so we can then 
>> start stripping stuff out of the .pcm since it'll only need to contain the 
>> interface, and not have to worry about having enough info for .o generation 
>> anymore"
>
> (I guess you're saying `without deserialization`)
>
> Agreed in the higher level.
>
> But what do you mean about `stripping stuff out of the .pcm`? Do you mean to 
> remove some function bodies when writing the BMI? If yes, it'll be 
> problematic.  When we did so, when other TU imports the BMI, it won't see the 
> function bodies it could see before. This will prevent optimization. It'll be 
> good as an optional flag so that the user know what the effect is. But it 
> might not be good to  do so by default. (Although this paragraph starts to 
> talk about other topics)



- Yes, all of this stuff is important (and, yes, we are drifting into other 
[related] topics) - we already have a mechanism for providing function bodies 
to optimisation [LTO] - I do not think we should want to make module interfaces 
larger than necessary to duplicate that functionality  (the reason we do now is 
because all the information needs to be present to feed into codegen) - it has 
been a long-term objective (I think listed even in @rsmith 's modules TODO 
list) to remove the unneeded BMI content.

- Actually, my point was meant to be quite simple and directly related to this 
patch that the choice of options name should be something that would be obvious 
to the end-user, ideally we should have related options named similarly and we 
should avoid using terms that are familiar to compiler engineers but not 
necessarily to the end user.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-17 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added subscribers: ruoso, ben.boeckel.
ChuanqiXu added a comment.

>> Although modular code is user-facing - BMIs are an implementational detail, 
>> right?
>
> but I don't think BMIs are an implementation detail, anymore than object 
> files are - users should/will be as aware of BMIs as they are of .o files - 
> there build artifacts that need to be cached, can be out of date, can be 
> deleted/maybe copied/moved around in limited ways, have to be passed to other 
> tools in the build pipeline. Not sure what term to use, I guess compilers 
> don't often have warnings/errors/diagnostics that explicitly talk about 
> "object files" - so maybe in a similar sense they won't explicitly talk about 
> "binary module artifact files" but maybe?

Yeah, agreed. I guess @iains 's meaning may be "if the tool chains are ready 
enough one day, then the end user won't need to touch BMIs directly". Yeah, it 
looks true. Nowadays, many c++ programmers don't need to touch the `.o` files 
since the build systems have taken care for it (from my personal experience). 
But from the perspective of a compiler, the concept of `BMI` may be necessary. 
I remember when I wrote the documents for modules, I used the term "module 
file" at first. Then @ruoso corrected me that we should use `BMI` here. So I 
think the use of the term `BMI` may be correct here.

> The build system still needs to know that B.cppm depends on A.cppm - and once 
> it knows that, it's not a huge cost for it to know the name of the file that 
> represents that dependency and is produced by A.cppm and passed to B.cppm, I 
> think?

In a private chat with Kitware guys, they told me if the one phase compilation 
wasn't supported, they can mock it by replacing the original command by the 
combination of:

  clang++ -std=c++20 --precompile src.cppm -o src.pcm
  clang++ -std=c++20 src.pcm --precompile src.o

but the `pcm` files won't be depended directly. So it may be harder, I am not 
sure. @ben.boeckel any update?

> In short - seems like we should separate out the cache discussion from the 
> "one phase" compilation in the sense of a single build action that takes a 
> .cppm and generates both a .o and a .pcm in one compiler/driver invocation. 
> (maybe something like this is what @iains has already sent out in another 
> review?)

Agreed.

> to @iains point about "it'd be good if we didn't have to invoke two 
> underlying commands from the one drivter invocation" - yeah, agreed. Though I 
> wouldn't mind one step being "add the driver interface" and another being 
> "fix whatever serialization isuse/etc/ might stand in the way of doing 
> .cppm->{.o,.pcm} in a single action without serialization, so we can then 
> start stripping stuff out of the .pcm since it'll only need to contain the 
> interface, and not have to worry about having enough info for .o generation 
> anymore"

(I guess you're saying `without deserialization`)

Agreed in the higher level.

But what do you mean about `stripping stuff out of the .pcm`? Do you mean to 
remove some function bodies when writing the BMI? If yes, it'll be problematic. 
 When we did so, when other TU imports the BMI, it won't see the function 
bodies it could see before. This will prevent optimization. It'll be good as an 
optional flag so that the user know what the effect is. But it might not be 
good to  do so by default. (Although this paragraph starts to talk about other 
topics)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-17 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D134267#3851479 , @ChuanqiXu wrote:

> In D134267#3849629 , @dblaikie 
> wrote:
>
>> To the original point I was making about implicit modules (which might've 
>> been my own confusion due to the term being brought up in D134269 
>> ):
>>
>> Implicit modules is the situation where the compiler, finding that it needs 
>> a module while compiling some usage, knows how to go out and spawn a new 
>> process to create that module and then cache it. The build system never 
>> knows about modules, their builds or their dependencies.
>>
>> This patch had some superficial similarities - specifically around having an 
>> on-disk cache of modules that the user/build system/etc invoking the 
>> compiler isn't necessarily aware of/managing/invalidating/observing/etc. But 
>> it does differ in an important way from implicit modules in that the 
>> compiler won't implicitly build modules - you still have to specify the 
>> modules and their usage as separate compilations in-order (ie: modules need 
>> to be explicitly built before their usage). I think that makes a big 
>> difference to this being feasible, at least in the small scale.
>
> Oh, now I got your point. It is caused by the imprecise name. My bad.
>
>> The remaining concern is that this feature should likely not be used by a 
>> build system - because it won't know the dependencies (or, if it does know 
>> the dependencies then the build system, not the compiler, should be managing 
>> the BMIs) & so won't know how to schedule things for maximum parallelism 
>> without incorrect ordering, and correct rebuilding of dependencies when 
>> necessary.
>
> I agree it won't reach the maximum parallelism. But I think it should be able 
> to rebuild correctly if the build system understands the dependencies between 
> module unit. For example, if B.cpp imports module A, and A is defined in 
> A.cppm. And when A.cppm changes, it will be fine if the build system will 
> compile A.cppm first and compile B.cpp then. I think this is achievable by 
> the build system. (For example, the P1689  
> proposal I'm working on). So the problem becomes a performance problem 
> instead of a correctness problem. So it looks not bad to me. I still feel it 
> is not good to make perfect as the enemy of better.

The build system still needs to know that B.cppm depends on A.cppm - and once 
it knows that, it's not a huge cost for it to know the name of the file that 
represents that dependency and is produced by A.cppm and passed to B.cppm, I 
think?

In short - seems like we should separate out the cache discussion from the "one 
phase" compilation in the sense of a single build action that takes a .cppm and 
generates both a .o and a .pcm in one compiler/driver invocation. (maybe 
something like this is what @iains has already sent out in another review?)

to @iains point about "it'd be good if we didn't have to invoke two underlying 
commands from the one drivter invocation" - yeah, agreed. Though I wouldn't 
mind one step being "add the driver interface" and another being "fix whatever 
serialization isuse/etc/ might stand in the way of doing .cppm->{.o,.pcm} in a 
single action without serialization, so we can then start stripping stuff out 
of the .pcm since it'll only need to contain the interface, and not have to 
worry about having enough info for .o generation anymore"


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-17 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

(sorry I've not been following all this as closely as I should - but I don't 
think BMIs are an implementation detail, anymore than object files are - users 
should/will be as aware of BMIs as they are of .o files - there build artifacts 
that need to be cached, can be out of date, can be deleted/maybe copied/moved 
around in limited ways, have to be passed to other tools in the build pipeline. 
Not sure what term to use, I guess compilers don't often have 
warnings/errors/diagnostics that explicitly talk about "object files" - so 
maybe in a similar sense they won't explicitly talk about "binary module 
artifact files" but maybe?)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-17 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3861615 , @ChuanqiXu wrote:

> Address @tschuett's opinion:
>
> - Use new introduced -fc++-module-cache-path= instead of -fc++-module-path to 
> avoid many logics about modules cache in clang modules.
> - Use new introduced `-fmodule-bmi-output=` instead of `-fmodule-name` for 
> the same reason. Also `-fmodule-bmi-output=` is more flexible since 
> `-fmodule-name` require its argument to be the same with the actual module 
> name.

I think the second one is too much "implementor-speak" as a user-facing option 
name,  perhaps keep it consistent with the first and make it something like 
"-fc++-module-filename=" (unless it can be a path - in which case 
-fc++-module-pathname=).

Although modular code is user-facing - BMIs are an implementational detail, 
right?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-17 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

@dblaikie @iains @tschuett @boris There are many opinions in this page.  They 
are about problems in the higher level. They are pretty important and helpful 
indeed. I'm really appreciated it. But most of them are not related to this 
diff itself and I don't think they are blocking issues of this diff.

This diff itself is a simple compilation invocation sugar to me (the term 
`compilation invocation sugar` is created by me. It is similar to the idea of 
`syntax sugar` in programming language. I am not sure if there is any 
existing/proper term). It is simple/cheap and it won't do anything harmful. And 
it is helpful for users to use modules and for implementors of build systems to 
do some quick POCs. For users, we can use modules by the help of this patch in 
the current build systems. The one of the example could be: 
https://reviews.llvm.org/D135507.  If we don't have this patch, I guess we can 
only make it by some complex `add_custom_*` things. And for the implementors of 
build systems, it should be helpful too. Although I guess all of us here know 
that the one-phase compilation can't get the maximum parallelism, don't make 
perfect to be enemy of better.

I really wish we can get this one accepted.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-17 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu updated this revision to Diff 468138.
ChuanqiXu added a comment.

Address @tschuett's opinion:

- Use new introduced `-fc++-module-cache-path=` instead of `-fc++-module-path` 
to avoid many logics about modules cache in clang modules.
- Use new introduced `-fmodule-bmi-output=` instead of `-fmodule-name` for the 
same reason. Also `-fmodule-bmi-output=` is more flexible since `-fmodule-name` 
require its argument to be the same with the actual module name.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/DiagnosticCommonKinds.td
  clang/include/clang/Driver/Options.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/create_module_cache.cpp
  clang/test/Modules/one-phase-compilation-named-modules.cppm

Index: clang/test/Modules/one-phase-compilation-named-modules.cppm
===
--- /dev/null
+++ clang/test/Modules/one-phase-compilation-named-modules.cppm
@@ -0,0 +1,26 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+//
+// RUN: %clang -std=c++20 %t/M.cppm -c -o - -fc++-modules-cache-path=%t/pcm.cache
+// RUN: %clang -std=c++20 %t/Use.cpp -fsyntax-only  -fprebuilt-module-path=%t/pcm.cache -Xclang -verify
+//
+// RUN: rm -f %t/M.pcm
+//
+// RUN: %clang -std=c++20 %t/MismatchedName.cppm -c -o - -fmodule-bmi-output=%t/pcm.cache/M.pcm
+// RUN: %clang -std=c++20 %t/Use.cpp -fsyntax-only  -fprebuilt-module-path=%t/pcm.cache -Xclang -verify
+
+//--- M.cppm
+export module M;
+export int getValue();
+
+//--- MismatchedName.cppm
+export module M;
+export int getValue();
+
+//--- Use.cpp
+// expected-no-diagnostics
+import M;
+int Use() {
+return getValue();
+}
Index: clang/test/Driver/create_module_cache.cpp
===
--- /dev/null
+++ clang/test/Driver/create_module_cache.cpp
@@ -0,0 +1,31 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+//
+// RUN: not %clang -std=c++20 %t/M.cppm -fc++-modules-cache-path= -c -o %t/M.tmp.o 2>&1 | FileCheck %t/M.cppm --check-prefix=ERROR
+// RUN: %clang -std=c++20 %t/M.cppm -fc++-modules-cache-path=%t/abc -c -o -
+// RUN: ls %t | FileCheck %t/M.cppm --check-prefix=CHECK-AVAILABLE
+// RUN: %clang -std=c++20 %t/M.cppm -fc++-modules-cache-path=%t/abc -fno-implicit-modules -c -o -
+// RUN: ls %t | FileCheck %t/M.cppm --check-prefix=CHECK-AVAILABLE
+//
+// RUN: %clang -std=c++20 %t/Use.cpp -fc++-modules-cache-path=abc -### 2>&1 | FileCheck %t/Use.cpp
+// RUN: %clang -std=c++20 %t/Use.cpp -fc++-modules-cache-path=abc -fno-implicit-modules -### 2>&1 | FileCheck %t/Use.cpp
+//
+// Check that the compiler will generate M-Part.pcm correctly.
+// RUN: %clang -std=c++20 %t/Part.cppm -fmodule-bmi-output=%t/M-Part.pcm -c -o -
+// RUN: ls %t | FileCheck %t/Part.cppm --check-prefix=CHECK-AVAILABLE
+
+//--- M.cppm
+export module M;
+
+// ERROR: unable to create default module cache path "": No such file or directory
+// CHECK-AVAILABLE: abc
+
+//--- Use.cpp
+import M;
+
+// CHECK: -fprebuilt-module-path=abc
+
+//--- Part.cppm
+export module M:Part;
+// CHECK-AVAILABLE: M-Part.pcm
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -3750,6 +3750,7 @@
   std::string("-fprebuilt-module-path=") + A->getValue()));
   A->claim();
 }
+
 if (Args.hasFlag(options::OPT_fprebuilt_implicit_modules,
  options::OPT_fno_prebuilt_implicit_modules, false))
   CmdArgs.push_back("-fprebuilt-implicit-modules");
@@ -3759,6 +3760,18 @@
   CmdArgs.push_back("-fvalidate-ast-input-files-content");
   }
 
+  // If we're in standard c++ modules, lookup in cache path automatically.
+  if (HaveStdCXXModules) {
+SmallString<128> Path;
+if (Arg *A = Args.getLastArg(options::OPT_fcxx_modules_cache_path))
+  Path = A->getValue();
+else
+  Driver::getDefaultModuleCachePath(Path);
+if (!Path.empty())
+  CmdArgs.push_back(Args.MakeArgString(
+  std::string("-fprebuilt-module-path=") + Path));
+  }
+
   // -fmodule-name specifies the module that is currently being built (or
   // used for header checking by -fmodule-maps).
   Args.AddLastArg(CmdArgs, options::OPT_fmodule_name_EQ);
Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -5558,6 +5558,30 @@
 return "-";
   }
 
+  if (isa(JA) && JA.getType() == types::TY_ModuleFile) {
+if (Arg *A = C.getArgs().getLastArg(options::OPT_fmodule_bmi_output))
+  return C.addResultFile(A->getValue(),  );
+
+SmallString<128> Path;
+if (Arg *A = C.getArgs().getLastArg(options::OPT_fcxx_modules_cache_path))
+  Path 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-13 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3854980 , @ChuanqiXu wrote:

> In D134267#3852136 , @boris wrote:
>
>>> For example, my experimental support for P1689 
>>>  is at: [...]. The implementation looks 
>>> relatively trivial to me. The simplicity is pretty important.



>> 2. I haven't looked at the code, but there are some complex problems in this 
>> area as highlighted by this MSVC bug: 
>> https://developercommunity.visualstudio.com/t/scanDependencies-does-not-take-into-acc/10029154
>>  I believe you may have it easier because reportedly Richard & friends have 
>> already implemented the necessary header import isolation semantics.
>
> Yeah, it looks like the header unit brings new complexity. And my demo only 
> works for named modules. I need to reconsider it.

Header Units are "special", they defeat automatic processing in my view.

- The compiler cannot determine that a source is a HU (from the header source); 
it has to be told.
- A build system can determine that a source **depends** on a header (of 
course, that is already done for years), however, it is not at all clear that 
the build system is able to determine if that dependent header is an 
"importable header" and therefore suitable for building as a HU.  Maybe the 
build system folks have some ideas, of course (and a sub-set of headers are 
pre-defined as importable).

So, I suspect, that we will be expecting (at least in the short to medium term) 
the user's project to define which headers should be converted to HUs;  we 
already have specific driver options to help with the actual production.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-13 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3852136 , @boris wrote:

>> For example, my experimental support for P1689 
>>  is at: [...]. The implementation looks 
>> relatively trivial to me. The simplicity is pretty important.
>
> Two points:
>
> 1. It's worth considering the simplicity of the overall system (compiler + 
> build system + user project) rather than just the compiler. I hope my 
> previous comment highlighted some of the complexity that the overall system 
> must deal with in the pre-scan approach with a lot of it potentially ending 
> up on the user's plate.
>
> 2. I haven't looked at the code, but there are some complex problems in this 
> area as highlighted by this MSVC bug: 
> https://developercommunity.visualstudio.com/t/scanDependencies-does-not-take-into-acc/10029154
>  I believe you may have it easier because reportedly Richard & friends have 
> already implemented the necessary header import isolation semantics.

Yeah, it looks like the header unit brings new complexity. And my demo only 
works for named modules. I need to reconsider it.

My original thought for these two modes was: the compiler could avoid to make 
the choice. I mean, the compiler could support both modes and let the build 
system writer and end user to make the choice.

And I believe there will be a discuss for client-server model vs pre-scan model 
in clang. I think we can discuss more then.

In D134267#3852883 , @iains wrote:

> To avoid more tangential discussion here - I've opened 
> https://discourse.llvm.org/t/generating-pcm-module-interfaces-and-regular-object-files-from-the-same-compiler-invocation/65918
>   ... it would be great if the major stakeholders here especially  @rsmith 
> could comment on the design intentions.

Yeah, this is about the deserialization part. I believe there'll be 2 other 
discussions about filename suffixes and client-module model. Although I feel 
none of them are blocker or even related to this patch : )


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-12 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

To avoid more tangential discussion here - I've opened 
https://discourse.llvm.org/t/generating-pcm-module-interfaces-and-regular-object-files-from-the-same-compiler-invocation/65918
  ... it would be great if the major stakeholders here especially  @rsmith 
could comment on the design intentions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-12 Thread Boris Kolpackov via Phabricator via cfe-commits
boris added a comment.

> For example, my experimental support for P1689 
>  is at: [...]. The implementation looks 
> relatively trivial to me. The simplicity is pretty important.

Two points:

1. It's worth considering the simplicity of the overall system (compiler + 
build system + user project) rather than just the compiler. I hope my previous 
comment highlighted some of the complexity that the overall system must deal 
with in the pre-scan approach with a lot of it potentially ending up on the 
user's plate.

2. I haven't looked at the code, but there are some complex problems in this 
area as highlighted by this MSVC bug: 
https://developercommunity.visualstudio.com/t/scanDependencies-does-not-take-into-acc/10029154
 I believe you may have it easier because reportedly Richard & friends have 
already implemented the necessary header import isolation semantics.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-12 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3852047 , @boris wrote:

> Just to add a few points on the module mapper discussion: overall, the 
> current view is that there are two ways to handle C++20 modules from the 
> build system's POV: (1) pre-scan the codebase to figure out what is what and 
> who depends on whom ahead of compilation and (2) module mapper where the 
> compiler and the build system discover this information interactively during 
> compilation. Naturally, both approaches have advantages and disadvantages.
>
> The major issue with the pre-scan is that there needs to be a clear boundary 
> where the codebase ends. This can be an acceptable requirement for 
> monorepo-style projects but not for multirepo. Even for monorepo, there are 
> often external dependencies (C/C++ standard library, maybe system libraries 
> like OpenSSL) that require special considerations. There is also some concern 
> for the performance of the pre-scan on larger codebases. On the other hand, 
> pre-scan is likely to be easier to integrate with legacy and meta-build 
> systems like CMake.
>
> The major issue with the module mapper is the need for deeper integration 
> with the build system as well as the overall complexity of the client/server 
> architecture. There is also some concern for the resource consumption since 
> the mapper approach may need to spawn nested compiler invocations to build 
> missing BMIs. The main advantage of the module mapper is probably the fact 
> that the information comes from the source (the compiler) and if/when 
> necessary, which sidesteps the whole issue of doing too little or too much 
> pre-scanning (for example, due to imprecise boundaries, some discrepancies in 
> the compiler options, etc).
>
> GCC implements the mapper approach (with the reusable parts factored into 
> libcody) and we successfully use that in build2. We would definitely like to 
> see the module mapper supported by Clang, ideally with the GCC's interface 
> (so that we could use the existing mapper for both). In the build2's case 
> specifically, pre-scan is not a viable option since it's a multirepo-first 
> build system.
>
> Also, a comment on the earlier point made:
>
>> I do not believe that the client-sever build system model (a.k.a. P1184 
>>  / module mapper) is quite the same thing as 
>> implicit modules
>
> Strongly agree. The key difference here is that with the mapper there is a 
> single entity (the build system) that decides which things need to be 
> (re)built and where rather than multiple compiler instances trying to come up 
> with something consistent.

Hi Boris, thanks for the update. (Although I feel this must not be the right 
place to discuss the topic).

Personally, I don't have strong feeling about the client-server model (pursue 
it or object it). And I'd love to support the per-scan based method since it is 
relatively easy for compiler and build systems. So that other build systems 
(including but not limited to cmake) can support modules more quickly. Then the 
more users can try to start to use modules actually.  So that then we can get 
more backport to improve. And I feel like the two option may not be mutual 
exclusive with each other. For example, my experimental support for P1689 
 is at: 
https://github.com/ChuanqiXu9/llvm-project/commit/fd809e8fcba497eb9458d7dbb6fc194044b19521.
 The implementation looks relatively trivial to me. The simplicity is pretty 
important. Then I won't feel it is too late to support the client-server model 
in clang if someday the client-server  model shows great advances.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-12 Thread Boris Kolpackov via Phabricator via cfe-commits
boris added a comment.

Just to add a few points on the module mapper discussion: overall, the current 
view is that there are two ways to handle C++20 modules from the build system's 
POV: (1) pre-scan the codebase to figure out what is what and who depends on 
whom ahead of compilation and (2) module mapper where the compiler and the 
build system discover this information interactively during compilation. 
Naturally, both approaches have advantages and disadvantages.

The major issue with the pre-scan is that there needs to be a clear boundary 
where the codebase ends. This can be an acceptable requirement for 
monorepo-style projects but not for multirepo. Even for monorepo, there are 
often external dependencies (C/C++ standard library, maybe system libraries 
like OpenSSL) that require special considerations. There is also some concern 
for the performance of the pre-scan on larger codebases. On the other hand, 
pre-scan is likely to be easier to integrate with legacy and meta-build systems 
like CMake.

The major issue with the module mapper is the need for deeper integration with 
the build system as well as the overall complexity of the client/server 
architecture. There is also some concern for the resource consumption since the 
mapper approach may need to spawn nested compiler invocations to build missing 
BMIs. The main advantage of the module mapper is probably the fact that the 
information comes from the source (the compiler) and if/when necessary, which 
sidesteps the whole issue of doing too little or too much pre-scanning (for 
example, due to imprecise boundaries, some discrepancies in the compiler 
options, etc).

GCC implements the mapper approach (with the reusable parts factored into 
libcody) and we successfully use that in build2. We would definitely like to 
see the module mapper supported by Clang, ideally with the GCC's interface (so 
that we could use the existing mapper for both). In the build2's case 
specifically, pre-scan is not a viable option since it's a multirepo-first 
build system.

Also, a comment on the earlier point made:

> I do not believe that the client-sever build system model (a.k.a. P1184 
>  / module mapper) is quite the same thing as 
> implicit modules

Strongly agree. The key difference here is that with the mapper there is a 
single entity (the build system) that decides which things need to be (re)built 
and where rather than multiple compiler instances trying to come up with 
something consistent.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a reviewer: tschuett.
ChuanqiXu added a comment.

In D134267#3849629 , @dblaikie wrote:

> To the original point I was making about implicit modules (which might've 
> been my own confusion due to the term being brought up in D134269 
> ):
>
> Implicit modules is the situation where the compiler, finding that it needs a 
> module while compiling some usage, knows how to go out and spawn a new 
> process to create that module and then cache it. The build system never knows 
> about modules, their builds or their dependencies.
>
> This patch had some superficial similarities - specifically around having an 
> on-disk cache of modules that the user/build system/etc invoking the compiler 
> isn't necessarily aware of/managing/invalidating/observing/etc. But it does 
> differ in an important way from implicit modules in that the compiler won't 
> implicitly build modules - you still have to specify the modules and their 
> usage as separate compilations in-order (ie: modules need to be explicitly 
> built before their usage). I think that makes a big difference to this being 
> feasible, at least in the small scale.

Oh, now I got your point. It is caused by the imprecise name. My bad.

> The remaining concern is that this feature should likely not be used by a 
> build system - because it won't know the dependencies (or, if it does know 
> the dependencies then the build system, not the compiler, should be managing 
> the BMIs) & so won't know how to schedule things for maximum parallelism 
> without incorrect ordering, and correct rebuilding of dependencies when 
> necessary.

I agree it won't reach the maximum parallelism. But I think it should be able 
to rebuild correctly if the build system understands the dependencies between 
module unit. For example, if B.cpp imports module A, and A is defined in 
A.cppm. And when A.cppm changes, it will be fine if the build system will 
compile A.cppm first and compile B.cpp then. I think this is achievable by the 
build system. (For example, the P1689  proposal 
I'm working on). So the problem becomes a performance problem instead of a 
correctness problem. So it looks not bad to me. I still feel it is not good to 
make perfect as the enemy of better.

P.S: there is another case: after we built the program and we remove A.pcm 
without touching A.cppm. Then the current model can't hold it unless the build 
system can recognize the pcm actually. But this case won't make me too 
uncomfortable.

> In any case, it looks like there's design discussions to be had? Not sure if 
> this is the right place to have them, but maybe it is. It might be easier to 
> discuss them on discourse with hopefully some relatively narrow examples with 
> command lines shown, etc. (as already some of the conversation seems to be 
> fragmented between this change and the doc change)

Yeah, we can discuss it somewhere else.

@tschuett

> What really worries me that you are talking about one phase compilation and 
> there is a module cache. Even if it is one phase compilation you must pass 
> all dependent modules on the command line.

I thought the compiler can search in the module cache automatically. Do you 
mean the style is bad?

> The one phase compilation should work with -fno-implicit-modules!

Great catch! I think we can solve the problem by another option 
(`-fcxx-modules-cache-path` ?) to decouple with clang modules. Originally I 
feel it may be better to reuse the option to avoid confusion. But it looks like 
there are too much logics about modules cache in the clang modules.

> I do not think this patch fully addresses the issue of harmonising GCC and 
> clang's command lines for modular builds (because it does not deal with 
> discovery of modular code in 'normally named' sources), and my investigation 
> of trying to do this in the driver suggests that it would be much more 
> complex than doing it in the FE.

Yeah, it is not sufficient to say this patch addresses the command line 
conformance issue. This patch only ease the use of modules.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3849712 , @dblaikie wrote:

> In D134267#3849673 , @iains wrote:
>
>> 



>> - I do not think this patch fully addresses the issue of harmonising GCC and 
>> clang's command lines for modular builds (because it does not deal with 
>> discovery of modular code in 'normally named' sources), and my investigation 
>> of trying to do this in the driver suggests that it would be much more 
>> complex than doing it in the FE.
>
> Yes, that would be hard to implement in the driver - but my personal taste is 
> that if the compiler's going to treat the input differently, or produce 
> different outputs, it should be observable on the command line - either 
> through different flags (`-x`) or a different file extension. I think that's 
> been historically true? Though I realize it also comes up against "but we 
> have all this C++ with these existing file extensions" and especially if GCC 
> doesn't agree on that philosophy, Clang's going to have a hard time making a 
> stand there... :/

I think that ship has sailed, GCC does, indeed, already take an input like 
"foo.cxx" and automatically determine that it should emit a BMI as well as the 
.O and does so.

It is (IMO, at least) reasonable to forecast that there will be proponents of 
both "we should have extensions to disambiguate" and " the compiler should be 
able to work it out standardised C++ and not force us to rename files in a 
strange way".   The degree of support for either camp will no doubt include 
religious fervour ...

As compiler engineers, I'd say that regardless of which view we might take for 
our own code, we have to cater for both.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D134267#3849673 , @iains wrote:

> (trying not derail this discussion further)
>
> - Yes, the alternate solution proposed for the "Hello World" case would work 
> with the module mapper interface.  However, the initial discussion was simply 
> about being able to produce both the .PCM and .O artefacts from one 
> invocation of the FE.

*nod* If there's some other patch that does just that - produces a .o/.pcm 
together in a single invocation, maybe that's a less contentious/easier patch 
to review/discuss/approve/commit. The number of process invocations involved in 
that (whether the driver, under the hood, invokes clang once to generate a .pcm 
and another to generate the .o) I'm less fussed about - I think that's 
something that can be improved separately - especially once/if someone wants to 
change the .pcm format to be lighter weight (not enough to generate the .o 
from, but enough for compilations /using/ the module to do so) - that'll drive 
a need for a monolithic .cppm -> {.o,.pcm} action.

> - It seemed reasonable to mention (since it's not vapourware, there are draft 
> patches), but clearly some technical issues to address.

For sure.

> - I do not think this patch fully addresses the issue of harmonising GCC and 
> clang's command lines for modular builds (because it does not deal with 
> discovery of modular code in 'normally named' sources), and my investigation 
> of trying to do this in the driver suggests that it would be much more 
> complex than doing it in the FE.

Yes, that would be hard to implement in the driver - but my personal taste is 
that if the compiler's going to treat the input differently, or produce 
different outputs, it should be observable on the command line - either through 
different flags (`-x`) or a different file extension. I think that's been 
historically true? Though I realize it also comes up against "but we have all 
this C++ with these existing file extensions" and especially if GCC doesn't 
agree on that philosophy, Clang's going to have a hard time making a stand 
there... :/


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Thorsten via Phabricator via cfe-commits
tschuett added a comment.

The one phase compilation should work with `-fno-implicit-modules`!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Thorsten via Phabricator via cfe-commits
tschuett added a comment.

What really worries me that you are talking about one phase compilation and 
there is a module cache. Even if it is one phase compilation you must pass all 
dependent modules on the command line.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

(trying not derail this discussion further)

- Yes, the alternate solution proposed for the "Hello World" case would work 
with the module mapper interface.  However, the initial discussion was simply 
about being able to produce both the .PCM and .O artefacts from one invocation 
of the FE.
- It seemed reasonable to mention (since it's not vapourware, there are draft 
patches), but clearly some technical issues to address.
- I do not think this patch fully addresses the issue of harmonising GCC and 
clang's command lines for modular builds (because it does not deal with 
discovery of modular code in 'normally named' sources), and my investigation of 
trying to do this in the driver suggests that it would be much more complex 
than doing it in the FE.

In the discussions, we have uncovered some other things that should be 
discussed elsewhere too.
(so I agree any further discussion of the alternate interface can be moved 
elsewhere - either wait for the rebased patches or on discourse).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

To the original point I was making about implicit modules (which might've been 
my own confusion due to the term being brought up in D134269 
):

Implicit modules is the situation where the compiler, finding that it needs a 
module while compiling some usage, knows how to go out and spawn a new process 
to create that module and then cache it. The build system never knows about 
modules, their builds or their dependencies.

This patch had some superficial similarities - specifically around having an 
on-disk cache of modules that the user/build system/etc invoking the compiler 
isn't necessarily aware of/managing/invalidating/observing/etc. But it does 
differ in an important way from implicit modules in that the compiler won't 
implicitly build modules - you still have to specify the modules and their 
usage as separate compilations in-order (ie: modules need to be explicitly 
built before their usage). I think that makes a big difference to this being 
feasible, at least in the small scale.

The remaining concern is that this feature should likely not be used by a build 
system - because it won't know the dependencies (or, if it does know the 
dependencies then the build system, not the compiler, should be managing the 
BMIs) & so won't know how to schedule things for maximum parallelism without 
incorrect ordering, and correct rebuilding of dependencies when necessary.

So maybe we could limit the "Hello, World" situation only to cases where the 
compiler is producing temporary object files? I know this probably sounds 
absurd, but coming back to a point I made earlier about .pcm files being named 
with the same scheme as .o files - so if the .o file goes into the temp 
directory, the .pcm goes there in the same manner - the driver could, 
potentially, add that module (`-fmodule-file` or whatever is necessary) to all 
subsequent compilations on the same command line (eg: `clang++ x.cppm y.cppm 
z.cpp a.o` - so x.cppm gets built temporarily, y.cppm gets built temporarily 
and gets `-fmodule-file=/tmp/x.pcm` or whatever, then z.cpp gets built with 
`-fmodule-file=/tmp/x.pcm -fmodule-file=/tmp/y.pcm` and then the whole thing 
gets linked as usual)

This ensures the feature doesn't creep into build systems where it might be 
more confusing than helpful, given the implicit dependencies between build 
commands. It's admittedly going to be confusing to users that they can't 
trivially split their command line out into multiple, but I think that's 
probably the tipping point in complexity for modules builds where users should 
reach for a build system or live with the complexity of passing .pcm files 
explicitly between these steps.



Sorry I haven't been following the discussion between this and what @iains is 
proposing. I guess/roughly understand that @iains is proposing something like 
the module-mapper support? Which I rather like the idea of, though understand 
it has more compiler surface area/complex interaction with the mapping service.

I guess @iains is proposing that the mapper could be used to implement/address 
this "Hello, World" situation in some way? (I guess that looks maybe /more/ 
like implicit modules than this proposal - in that the module mapper does mean 
that during one compilation, another compilation might run - and for a 
build-system agnostic module mapper that boils down to the problems of build 
parallelism that implicit modules have (though it had a lot of success with 
that model, not having to modify build systems/being able to ship sooner/etc - 
it's been around for years, and as much as we've learned of its limitations, 
it's hard to argue with the success/suggest that it wasn't the right tradeoff 
(I have some reservations still, but that's more an social/after-work ramble 
than technical)))

In any case, it looks like there's design discussions to be had? Not sure if 
this is the right place to have them, but maybe it is. It might be easier to 
discuss them on discourse with hopefully some relatively narrow examples with 
command lines shown, etc. (as already some of the conversation seems to be 
fragmented between this change and the doc change)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3848974 , @ChuanqiXu wrote:

> In D134267#3848958 , @iains wrote:
>
>> In D134267#3848883 , @ChuanqiXu 
>> wrote:
>>
>>> In D134267#3848876 , @iains wrote:
>>>
 If we are going to **//require//** the serialisation/deserialisation for 
 correctness then IMO we should have an error for an incorrect compilation.
>>>
>>> Of course.
>>>
 I still feel that deserialisation should do deserialisation, and Sema 
 should be doing the diagnoses when it tries to use a decl (in fact, I 
 would expect that to be required for correctness in the C++20 scheme, 
 since the point of use is significant).  I will defer to folks who know 
 the history better - but this seems like a layering bug to me still.
>>>
>>> Another reason I forgot to mention is that the deserialization need to do 
>>> merging. It is common in practice that a similar declarations lives in 
>>> multiple module files (e.g., by GMF). Then if the deserialization needs to 
>>> merge things, it is naturally that the deserilizer need to do ODR checks. 
>>> But if you feel like the merging job belongs to Sema too, I think it might 
>>> not be bad/wrong.
>>
>> I would be concerned that doing this work in the deserialisation could:
>>
>> - produce false positives (diagnostics) where there should be no error until 
>> an object is actually used.
>> - make it more difficult to track cases where what is merged at the point of 
>> use is semantically significant.
>>
>> It would seem to better that the consumer of the AST should not have to know 
>> whether it had been round-tripped through the serialisation/deserialization 
>> process.
>
> The reading process of modules are lazy. So the problem is relatively 
> mitigated. But I understand your concern in some level.

(IMO) We ought to be able to move the ODR work to Sema (but I did not yet look 
at how hard this would be) - it seems that deserialisation should supply 
(lazily) decls in response to requests and the query side should determine if 
they are legally mergeable.  However, let's move this discussion elsewhere now 
- I think we've covered the main points.

>> From the point of view of this diff - it is relevant to consider other 
>> implementations that achieve the same objective - we seem to have uncovered 
>> some additional questions to answer.
>
> I feel the main blocking issue for this diff is the concern about modules 
> cache throw from @dblaikie and @tschuett that we (at least me) don't know the 
> reason. I'm still waiting for more detailed reason to understand them.
>
> And I feel like your opinion is mainly in the higher level. And I don't feel 
> it is blocking this patch. Because if we're going to refactor it actually 
> someday, this diff won't never be a blocker.

I think that is correct - the implementations are independent (my, so far 
unpublished, driver patches probably would require modification, but that's not 
a big deal).

> And under the current frame, I think this implementation is the simplest way 
> to achieve the same goal. So what we talked about is mainly about some higher 
> level things.

Sure, but having a high-level plan is important too :)

One specific comment about this patch; one of the key things that GCC does is 
to remove the need for the user to write a specific extension to use modular 
C++.  So from that point of view, at least, this patch does not provide a 
solution to match the command line interfaces of GCC.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3848958 , @iains wrote:

> In D134267#3848883 , @ChuanqiXu 
> wrote:
>
>> In D134267#3848876 , @iains wrote:
>>
>>> If we are going to **//require//** the serialisation/deserialisation for 
>>> correctness then IMO we should have an error for an incorrect compilation.
>>
>> Of course.
>>
>>> I still feel that deserialisation should do deserialisation, and Sema 
>>> should be doing the diagnoses when it tries to use a decl (in fact, I would 
>>> expect that to be required for correctness in the C++20 scheme, since the 
>>> point of use is significant).  I will defer to folks who know the history 
>>> better - but this seems like a layering bug to me still.
>>
>> Another reason I forgot to mention is that the deserialization need to do 
>> merging. It is common in practice that a similar declarations lives in 
>> multiple module files (e.g., by GMF). Then if the deserialization needs to 
>> merge things, it is naturally that the deserilizer need to do ODR checks. 
>> But if you feel like the merging job belongs to Sema too, I think it might 
>> not be bad/wrong.
>
> I would be concerned that doing this work in the deserialisation could:
>
> - produce false positives (diagnostics) where there should be no error until 
> an object is actually used.
> - make it more difficult to track cases where what is merged at the point of 
> use is semantically significant.
>
> It would seem to better that the consumer of the AST should not have to know 
> whether it had been round-tripped through the serialisation/deserialization 
> process.

The reading process of modules are lazy. So the problem is relatively 
mitigated. But I understand your concern in some level.

> From the point of view of this diff - it is relevant to consider other 
> implementations that achieve the same objective - we seem to have uncovered 
> some additional questions to answer.

I feel the main blocking issue for this diff is the concern about modules cache 
throw from @dblaikie and @tschuett that we (at least me) don't know the reason. 
I'm still waiting to more detailed reason to understand them.

And I feel like your opinion is mainly in the higher level. And I don't feel it 
is blocking this patch. Because if we're going to refactor it actually someday, 
this diff won't never be a blocker. And under the current frame, I think this 
implementation is the simplest way to achieve the same goal. So what we talked 
about is mainly about some higher level things.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3848883 , @ChuanqiXu wrote:

> In D134267#3848876 , @iains wrote:
>
>> If we are going to **//require//** the serialisation/deserialisation for 
>> correctness then IMO we should have an error for an incorrect compilation.
>
> Of course.
>
>> I still feel that deserialisation should do deserialisation, and Sema should 
>> be doing the diagnoses when it tries to use a decl (in fact, I would expect 
>> that to be required for correctness in the C++20 scheme, since the point of 
>> use is significant).  I will defer to folks who know the history better - 
>> but this seems like a layering bug to me still.
>
> Another reason I forgot to mention is that the deserialization need to do 
> merging. It is common in practice that a similar declarations lives in 
> multiple module files (e.g., by GMF). Then if the deserialization needs to 
> merge things, it is naturally that the deserilizer need to do ODR checks. But 
> if you feel like the merging job belongs to Sema too, I think it might not be 
> bad/wrong.

I would be concerned that doing this work in the deserialisation could:

- produce false positives (diagnostics) where there should be no error until an 
object is actually used.
- make it more difficult to track cases where what is merged at the point of 
use is semantically significant.

It would seem to better that the consumer of the AST should not have to know 
whether it had been round-tripped through the serialisation/deserialization 
process.

From the point of view of this diff - it is relevant to consider other 
implementations that achieve the same objective - we seem to have uncovered 
some additional questions to answer.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3848876 , @iains wrote:

> If we are going to **//require//** the serialisation/deserialisation for 
> correctness then IMO we should have an error for an incorrect compilation.

Of course.

> I still feel that deserialisation should do deserialisation, and Sema should 
> be doing the diagnoses when it tries to use a decl (in fact, I would expect 
> that to be required for correctness in the C++20 scheme, since the point of 
> use is significant).  I will defer to folks who know the history better - but 
> this seems like a layering bug to me still.

Another reason I forgot to mention is that the deserialization need to do 
merging. It is common in practice that a similar declarations lives in multiple 
module files (e.g., by GMF). Then if the deserialization needs to merge things, 
it is naturally that the deserilizer need to do ODR checks. But if you feel 
like the merging job belongs to Sema too, I think it might not be bad/wrong.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3848862 , @ChuanqiXu wrote:

> In D134267#3848793 , @iains wrote:
>
>> In D134267#3848745 , @ChuanqiXu 
>> wrote:
>>
 FAOD: I think this should not be about "who's patches" - but about the 
 compilation model and command lines that we want to end up with.
>>>
>>> BTW, in the current context, when I say "my" and "your", I just want to 
>>> refer the corresponding things. There is no means to offend.
>>
>> sure - no problem.
>>
>> I guess we ought to be a bit more clear:
>> There is implementation divergence between GCC and clang and from the point 
>> to view of users and build systems having two different command line sets is 
>> not helpful.
>>
>> The patches I drafted were aimed at removing that divergence.
>> If that is not possible (because the ODR analysis requires steps that make 
>> it very difficult or inefficient to do so) then some more thought and/or 
>> restructuring is needed.
>
> Yeah, now this is more clear.
>
>>> In D134267#3848672 , @iains wrote:
>>>
> (2) Your scheme saves the time for deserialization. However, clang 
> implement many ODR checks in the deserialization part. Skipping the 
> deserialization will omit these ODR checks, which will break the 
> semantics. And this is the reason why that scheme may not be good.

 What you seem to be saying here is that:

 `clang++ -std=c++20 foo.cxxm -c -o foo,o`

 edit;  maybe I should have written -xc++ foo.cxxm

 Does different ODR analysis from:

 `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
 `clang++ -std=c++20 foo.pcm -o foo.o`

 If so, then that is of some concern since those are both valid 
 compilations in the current compiler.

 In the first case, the back end is connected as an AST consumer to the 
 front - there is no serialise/deserialise.
>>>
>>> No. **Currently**, in clang, the following two is the same:
>>>
>>> `clang++ -std=c++20 foo.cxxm -c -o foo.o`
>>
>> That's why I added the edit ...
>>
>> `clang++ -std=c++20 -xc++ foo.cxxm -c -o foo.o`
>>
>> AFAIU that will not produce any error for the user?
>>
>> suppose I have foo.cxx which includes modules and has local declarations?
>> `clang++ -std=c++20  foo.cxx -c -o foo.o`
>>
>>> with
>>>
>>> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
>>> `clang++ -std=c++20 foo.pcm -o foo.o`
>>>
>>> The first compilation will generate `.pcm` files in the `/tmp` directories 
>>> and compile the `.pcm` files in the `/tmp` directories.
>>
>> yes, I should have added the -xc++ in the first case ;)
>
> Oh, I got it. Now it makes sense. I misses the edit : (.
>
> The answer is:
> (1) Now, the behavior is different. And I once sent a bug report for it.
> (2) **Now** there won't be direct error message. And I **was** planned to add 
> it. This is the reason why I closed the above bug report : )
> (3) If we're going to make it an error, this is not decided yet.

If we are going to **//require//** the serialisation/deserialisation for 
correctness then IMO we should have an error for an incorrect compilation.

>>> We could verify it by inserting debugging informations. Or writing a 
>>> ODR-mismatch example, then when the compiler emits an error, we'll find the 
>>> compiler emits the error in the deserialization part. I met these 2 
>>> situations for many times : )
>>>
 The model I implemented simply streams the AST instead of throwing it away 
 ...
>>>
>>> So that model misses the ODR checks, which is not good. Since it matters 
>>> with the semantics.
>>
>> Understood - I wonder if we have layering right here - surely Sema should be 
>> responsible for ODR checks?
>> Is there any reason that relevant hashes cannot be generated on the AST 
>> //**without**// streaming it?
>> Was this model intended from the start ?
>> (the fact that one can connect a direct AST consumer to the back end, 
>> suggests that enforcing serialisation/deserialisation was not expected to be 
>> a requirement at some stage)
>
> In fact, we do the ODR checks in Sema too. (Would you feel it amazing? I felt 
> amazing when I realize this at the first time)
>
> For the following example, the ODR check is made in Deserializer:
>
>   module;
>   #include "something"
>   export module any_module_name;
>   import m;
>
> or
>
>   // decls
>   import m;
>
> or
>
>   import m1;
>   import m2;
>
> In the above cases, the ODR check is made in deserialization part. And in the 
> following examples, the ODR check is made in Sema part:
>
>   import m;
>   // decls
>
> What is the principle? I think it is caused by the on-the-fly compilation 
> principle (The name is constructed by myself. I think you know what I am 
> saying.)
>
> I mean, when we (the compiler) parse a declaration and we want to know if it 
> violates the 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3848822 , @tschuett wrote:

> Clang header modules started with implicit builds and module caches. Apple is 
> moving too explicit builds. There are all kinds of problems with implicit 
> module builds. I believe that C++20 modules shouldn't make the same mistake.

Could you provide more details? Since we can't understand you if you only said 
it is bad without reasons then we can't be in the same page.

From my natural imagination, some kind of module cache is necessary. Otherwise, 
we may not be able to compile a hello world example in one command line like:

  clang++ -std=c++20 Hello.cppm main.cpp

And for a larger project, you can see the demo at: 
https://github.com/ChuanqiXu9/llvm-project/tree/MyP1689/P1689Examples. The 
cache is controlled by the build system instead of the compiler. So we are not 
moving the build system into the compiler.

My rough feeling is that you're telling us that something is bad in 100 steps 
aways. But we're in fact to move 1 step further. But let's be back in the same 
page first.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3848793 , @iains wrote:

> In D134267#3848745 , @ChuanqiXu 
> wrote:
>
>>> FAOD: I think this should not be about "who's patches" - but about the 
>>> compilation model and command lines that we want to end up with.
>>
>> BTW, in the current context, when I say "my" and "your", I just want to 
>> refer the corresponding things. There is no means to offend.
>
> sure - no problem.
>
> I guess we ought to be a bit more clear:
> There is implementation divergence between GCC and clang and from the point 
> to view of users and build systems having two different command line sets is 
> not helpful.
>
> The patches I drafted were aimed at removing that divergence.
> If that is not possible (because the ODR analysis requires steps that make it 
> very difficult or inefficient to do so) then some more thought and/or 
> restructuring is needed.

Yeah, now this is more clear.

>> In D134267#3848672 , @iains wrote:
>>
 (2) Your scheme saves the time for deserialization. However, clang 
 implement many ODR checks in the deserialization part. Skipping the 
 deserialization will omit these ODR checks, which will break the 
 semantics. And this is the reason why that scheme may not be good.
>>>
>>> What you seem to be saying here is that:
>>>
>>> `clang++ -std=c++20 foo.cxxm -c -o foo,o`
>>>
>>> edit;  maybe I should have written -xc++ foo.cxxm
>>>
>>> Does different ODR analysis from:
>>>
>>> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
>>> `clang++ -std=c++20 foo.pcm -o foo.o`
>>>
>>> If so, then that is of some concern since those are both valid compilations 
>>> in the current compiler.
>>>
>>> In the first case, the back end is connected as an AST consumer to the 
>>> front - there is no serialise/deserialise.
>>
>> No. **Currently**, in clang, the following two is the same:
>>
>> `clang++ -std=c++20 foo.cxxm -c -o foo.o`
>
> That's why I added the edit ...
>
> `clang++ -std=c++20 -xc++ foo.cxxm -c -o foo.o`
>
> AFAIU that will not produce any error for the user?
>
> suppose I have foo.cxx which includes modules and has local declarations?
> `clang++ -std=c++20  foo.cxx -c -o foo.o`
>
>> with
>>
>> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
>> `clang++ -std=c++20 foo.pcm -o foo.o`
>>
>> The first compilation will generate `.pcm` files in the `/tmp` directories 
>> and compile the `.pcm` files in the `/tmp` directories.
>
> yes, I should have added the -xc++ in the first case ;)

Oh, I got it. Now it makes sense. I misses the edit : (.

The answer is:
(1) Now, the behavior is different. And I once sent a bug report for it.
(2) **Now** there won't be direct error message. And I **was** planned to add 
it. This is the reason why I closed the above bug report : )
(3) If we're going to make it an error, this is not decided yet.

>> We could verify it by inserting debugging informations. Or writing a 
>> ODR-mismatch example, then when the compiler emits an error, we'll find the 
>> compiler emits the error in the deserialization part. I met these 2 
>> situations for many times : )
>>
>>> The model I implemented simply streams the AST instead of throwing it away 
>>> ...
>>
>> So that model misses the ODR checks, which is not good. Since it matters 
>> with the semantics.
>
> Understood - I wonder if we have layering right here - surely Sema should be 
> responsible for ODR checks?
> Is there any reason that relevant hashes cannot be generated on the AST 
> //**without**// streaming it?
> Was this model intended from the start ?
> (the fact that one can connect a direct AST consumer to the back end, 
> suggests that enforcing serialisation/deserialisation was not expected to be 
> a requirement at some stage)

In fact, we do the ODR checks in Sema too. (Would you feel it amazing? I felt 
amazing when I realize this at the first time)

For the following example, the ODR check is made in Deserializer:

  module;
  #include "something"
  export module any_module_name;
  import m;

or

  // decls
  import m;

or

  import m1;
  import m2;

In the above cases, the ODR check is made in deserialization part. And in the 
following examples, the ODR check is made in Sema part:

  import m;
  // decls

What is the principle? I think it is caused by the on-the-fly compilation 
principle (The name is constructed by myself. I think you know what I am 
saying.)

I mean, when we (the compiler) parse a declaration and we want to know if it 
violates the ODR, we made it in the Sema part. And when we import something 
(precisely, read something from a module, since clang did lazy loading), we did 
the ODR check in deserialization.

Although I felt odd at first, but now I feel it reasonable in some levels. How 
do you feel now?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Thorsten via Phabricator via cfe-commits
tschuett added a comment.

In D134267#3848838 , @iains wrote:

> In D134267#3848822 , @tschuett 
> wrote:
>
>> Clang header modules started with implicit builds and module caches. Apple 
>> is moving too explicit builds. There are all kinds of problems with implicit 
>> module builds. I believe that C++20 modules shouldn't make the same mistake.
>
> I do not believe that the client-sever build system model (a.k.a. P1184 
>  / module mapper) is quite the same thing as 
> implicit modules - there is a partial similarity in that the dependencies are 
> discovered from the sources during compilation (rather than from a pre-scan).
>
> However it does not try to turn the compiler into a build system; that work 
> is done by the module-mapper (which can be some trivial in-process scheme or 
> a full-blown build system, e.g. build2 - which I believe has an 
> implementation that works with GCC).

But this diff is doing something else. It puts the build-system again into 
clang.

> (clearly we [the whole C++ community] need to figure out how we want these 
> things to look to the end user and try to harmonise the implementations)




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3848822 , @tschuett wrote:

> Clang header modules started with implicit builds and module caches. Apple is 
> moving too explicit builds. There are all kinds of problems with implicit 
> module builds. I believe that C++20 modules shouldn't make the same mistake.

I do not believe that the client-sever build system model (a.k.a. P1184 
 / module mapper) is quite the same thing as 
implicit modules - there is a partial similarity in that the dependencies are 
discovered from the sources during compilation (rather than from a pre-scan).

However it does not try to turn the compiler into a build system; that work is 
done by the module-mapper (which can be some trivial in-process scheme or a 
full-blown build system, e.g. build2 - which I believe has an implementation 
that works with GCC).

(clearly we [the whole C++ community] need to figure out how we want these 
things to look to the end user and try to harmonise the implementations)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Thorsten via Phabricator via cfe-commits
tschuett added a comment.

Clang header modules started with implicit builds and module caches. Apple is 
moving too explicit builds. There are all kinds of problems with implicit 
module builds. I believe that C++20 modules shouldn't make the same mistake.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3848745 , @ChuanqiXu wrote:

>> FAOD: I think this should not be about "who's patches" - but about the 
>> compilation model and command lines that we want to end up with.
>
> BTW, in the current context, when I say "my" and "your", I just want to refer 
> the corresponding things. There is no means to offend.

sure - no problem.

I guess we ought to be a bit more clear:
There is implementation divergence between GCC and clang and from the point to 
view of users and build systems having two different command line sets is not 
helpful.

The patches I drafted were aimed at removing that divergence.
If that is not possible (because the ODR analysis requires steps that make it 
very difficult or inefficient to do so) then some more thought and/or 
restructuring is needed.

> In D134267#3848672 , @iains wrote:
>
>>> (2) Your scheme saves the time for deserialization. However, clang 
>>> implement many ODR checks in the deserialization part. Skipping the 
>>> deserialization will omit these ODR checks, which will break the semantics. 
>>> And this is the reason why that scheme may not be good.
>>
>> What you seem to be saying here is that:
>>
>> `clang++ -std=c++20 foo.cxxm -c -o foo,o`
>>
>> edit;  maybe I should have written -xc++ foo.cxxm
>>
>> Does different ODR analysis from:
>>
>> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
>> `clang++ -std=c++20 foo.pcm -o foo.o`
>>
>> If so, then that is of some concern since those are both valid compilations 
>> in the current compiler.
>>
>> In the first case, the back end is connected as an AST consumer to the front 
>> - there is no serialise/deserialise.
>
> No. **Currently**, in clang, the following two is the same:
>
> `clang++ -std=c++20 foo.cxxm -c -o foo.o`

That's why I added the edit ...

`clang++ -std=c++20 -xc++ foo.cxxm -c -o foo.o`

AFAIU that will not produce any error for the user?

suppose I have foo.cxx which includes modules and has local declarations?
`clang++ -std=c++20  foo.cxx -c -o foo.o`

> with
>
> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
> `clang++ -std=c++20 foo.pcm -o foo.o`
>
> The first compilation will generate `.pcm` files in the `/tmp` directories 
> and compile the `.pcm` files in the `/tmp` directories.

yes, I should have added the -xc++ in the first case ;)

> We could verify it by inserting debugging informations. Or writing a 
> ODR-mismatch example, then when the compiler emits an error, we'll find the 
> compiler emits the error in the deserialization part. I met these 2 
> situations for many times : )
>
>> The model I implemented simply streams the AST instead of throwing it away 
>> ...
>
> So that model misses the ODR checks, which is not good. Since it matters with 
> the semantics.

Understood - I wonder if we have layering right here - surely Sema should be 
responsible for ODR checks?
Is there any reason that relevant hashes cannot be generated on the AST 
//**without**// streaming it?
Was this model intended from the start ?
(the fact that one can connect a direct AST consumer to the back end, suggests 
that enforcing serialisation/deserialisation was not expected to be a 
requirement at some stage)

>> (repeating here, it is not the deserialisation that I think is key [although 
>> that is also useful] but minimising process launches).
>
> So from my point of view, it is much  more important to preserve the 
> semantics than minimizing the process launches.
>
>> That is a possibility, at the expense of more process launches - and I do 
>> not think that the corresponding model for the server-interactive approach 
>> has yet been fully explored.
>>
>> We go to some trouble (with the driver invoking the compiler in-process) to 
>> avoid unnecessary process launches .. with the current state of hardware, it 
>> could well be more efficient to have many processes waiting rather than many 
>> processes launched.
>>
>> (I am not making a claim here, just observing that the data are incomplete, 
>> since we do not have a corresponding model offered for the client-server 
>> case, and we do not have any kind of measurements - these are all "thought 
>> experiments").
>
> Yeah, I agree that there more space to explore in the client-server model. 
> But for the current building model, the 2 phase compilation is obviously 
> better than the 1 phase compilation. And my point is, if one day we proved 
> that there are many good points in the client/server model, it is not too 
> late to add it in clang. And **currently** it is not a good reason to block 
> the current patch, at least it is pretty simple and innocent.
>
>> There seem to be two schools of thought. (1) It is good to use extensions to 
>> determine the file contents (2) we should not impose a requirement for magic 
>> extension names if the implementation can determine the content without.
>>
>> For my part, I do prefer the 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

> FAOD: I think this should not be about "who's patches" - but about the 
> compilation model and command lines that we want to end up with.

BTW, in the current context, when I say "my" and "your", I just want to refer 
the corresponding things. There is no means to offend.

In D134267#3848672 , @iains wrote:

>> (2) Your scheme saves the time for deserialization. However, clang implement 
>> many ODR checks in the deserialization part. Skipping the deserialization 
>> will omit these ODR checks, which will break the semantics. And this is the 
>> reason why that scheme may not be good.
>
> What you seem to be saying here is that:
>
> `clang++ -std=c++20 foo.cxxm -c -o foo,o`
>
> edit;  maybe I should have written -xc++ foo.cxxm
>
> Does different ODR analysis from:
>
> `clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
> `clang++ -std=c++20 foo.pcm -o foo.o`
>
> If so, then that is of some concern since those are both valid compilations 
> in the current compiler.
>
> In the first case, the back end is connected as an AST consumer to the front 
> - there is no serialise/deserialise.

No. **Currently**, in clang, the following two is the same:

`clang++ -std=c++20 foo.cxxm -c -o foo.o`

with

`clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
`clang++ -std=c++20 foo.pcm -o foo.o`

The first compilation will generate `.pcm` files in the `/tmp` directories and 
compile the `.pcm` files in the `/tmp` directories.

We could verify it by inserting debugging informations. Or writing a 
ODR-mismatch example, then when the compiler emits an error, we'll find the 
compiler emits the error in the deserialization part. I met these 2 situations 
for many times : )

> The model I implemented simply streams the AST instead of throwing it away ...

So that model misses the ODR checks, which is not good. Since it matters with 
the semantics.

> (repeating here, it is not the deserialisation that I think is key [although 
> that is also useful] but minimising process launches).

So from my point of view, it is much  more important to preserve the semantics 
than minimizing the process launches.

> That is a possibility, at the expense of more process launches - and I do not 
> think that the corresponding model for the server-interactive approach has 
> yet been fully explored.
>
> We go to some trouble (with the driver invoking the compiler in-process) to 
> avoid unnecessary process launches .. with the current state of hardware, it 
> could well be more efficient to have many processes waiting rather than many 
> processes launched.
>
> (I am not making a claim here, just observing that the data are incomplete, 
> since we do not have a corresponding model offered for the client-server 
> case, and we do not have any kind of measurements - these are all "thought 
> experiments").

Yeah, I agree that there more space to explore in the client-server model. But 
for the current building model, the 2 phase compilation is obviously better 
than the 1 phase compilation. And my point is, if one day we proved that there 
are many good points in the client/server model, it is not too late to add it 
in clang. And **currently** it is not a good reason to block the current patch, 
at least it is pretty simple and innocent.

> There seem to be two schools of thought. (1) It is good to use extensions to 
> determine the file contents (2) we should not impose a requirement for magic 
> extension names if the implementation can determine the content without.
>
> For my part, I do prefer the second .. but of course will be happy to go with 
> consensus,

I'm confused the second. I mean what if the other static analyzing tools 
(including build system but not limited to) want to analyze the modular 
project, if they can only invoke the compiler to query the result. Then things 
look complex enough in the state. But assume we can implement great compiler 
interfaces. There are still problems. The tools need to pass the potential 
module unit files to the compiler. With the first choice, the tools can pass 
the module unit files precisely while with the second one, the input number 
will be much larger. So it looks really worse to me...



Since we discussed many things. Let me try to summarize to avoid missing the 
topic.

(1) For the client-server model, I think it is too far now. I am not against 
it. Also I don't feel the current simple patch would is not good for it. 
(2) Your implementation skips the deserialization part, which misses many ODR 
checks. I think this is not good.
(3) I suddenly realize that your **design** and my **design** are not 
overlapped in some perspective. I mean the patch is working for `*.cppm` files, 
and your design is intended for `*.cpp` files. Then a question is: "what would 
the patch do for `*.cppm` files"? If we don't handle it specially, in your 
implementation, when we compile a `.cppm` file, the `.pcm` file would be 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-11 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3848539 , @ChuanqiXu wrote:

> In D134267#3847399 , @iains wrote:
>
>> 



>> I am concerned that with the current scheme (with or without the change in 
>> this review) we could end up with three process launches per modular file:
>>
>> - one to determine the deps
>> - one to produce the BMI
>> - one to produce the object
>>
>> At least with GCC (and a patch series like the one I drafted) we can reduce 
>> that to 2
>>
>> With the interfaces mentioned in P1184 , it 
>> is plausible to make that 1 (at the expense of potentially many stalled 
>> builds - but stalled early in compilation, so hopefully with the smallest 
>> memory footprint we could achieve).
>
> I pretty believe the current scheme is better for the following reasons:
> (1) The current scheme has higher paralellism potentially. See the `Why 
> two-phase compilation?` section in https://reviews.llvm.org/D134269 for 
> examples. I believe this should be the future direction to speedup the 
> compilation.

That is a possibility, at the expense of more process launches - and I do not 
think that the corresponding model for the server-interactive approach has yet 
been fully explored.

We go to some trouble (with the driver invoking the compiler in-process) to 
avoid unnecessary process launches .. with the current state of hardware, it 
could well be more efficient to have many processes waiting rather than many 
processes launched.

(I am not making a claim here, just observing that the data are incomplete, 
since we do not have a corresponding model offered for the client-server case, 
and we do not have any kind of measurements - these are all "thought 
experiments").

> (2) Your scheme saves the time for deserialization. However, clang implement 
> many ODR checks in the deserialization part. Skipping the deserialization 
> will omit these ODR checks, which will break the semantics. And this is the 
> reason why that scheme may not be good.

What you seem to be saying here is that:

`clang++ -std=c++20 foo.cxxm -c -o foo,o`

Does different ODR analysis from:

`clang++ -std=c++20 foo.cxxm --precompile -o foo.pcm`
`clang++ -std=c++20 foo.pcm -o foo.o`

If so, then that is of some concern since those are both valid compilations in 
the current compiler.

In the first case, the back end is connected as an AST consumer to the front - 
there is no serialise/deserialise.  The model I implemented simply streams the 
AST instead of throwing it away ...

(repeating here, it is not the deserialisation that I think is key [although 
that is also useful] but minimising process launches).

> (3) It makes implementation more complex.

Agreed, but it's reasonably self-contained.

> For the first 2 reasons, I believe clang is in the right directions. Also for 
> the filename suffixes, it is helpful for static analyzing tools to know 
> whether or not if a file is module unit. So the static analyzing tools can 
> scan every `.cppm` file in the working directory automatically in the 
> preparation time. For example, if the build system are ready enough, we can 
> omit the `module files` in the build script like: 
> https://github.com/ChuanqiXu9/llvm-project/blob/94a71a7046a8672a7e06311b0c05c0a8c9ae4619/P1689Examples/HelloWorld/CMakeLists.txt#L18-L19.
> I believe this is the reason why MSVC made a similar choice. (clang is easy 
> to support MSVC's 'ixx' extension).

There seem to be two schools of thought.  (1) It is good to use extensions to 
determine the file contents (2) we should not impose a requirement for magic 
extension names if the implementation can determine the content without.

For my part, I do prefer the second .. but of course will be happy to go with 
consensus,


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-10 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3847399 , @iains wrote:

> FAOD: I think this should not be about "who's patches" - but about the 
> compilation model and command lines that we want to end up with.

Yeah, of course.

> I am concerned that with the current scheme (with or without the change in 
> this review) we could end up with three process launches per modular file:
>
> - one to determine the deps
> - one to produce the BMI
> - one to produce the object
>
> At least with GCC (and a patch series like the one I drafted) we can reduce 
> that to 2
>
> With the interfaces mentioned in P1184 , it 
> is plausible to make that 1 (at the expense of potentially many stalled 
> builds - but stalled early in compilation, so hopefully with the smallest 
> memory footprint we could achieve).

I pretty believe the current scheme is better for the following reasons:
(1) The current scheme has higher paralellism potentially. See the `Why 
two-phase compilation?` section in https://reviews.llvm.org/D134269 for 
examples. I believe this should be the future direction to speedup the 
compilation.
(2) Your scheme saves the time for deserialization. However, clang implement 
many ODR checks in the deserialization part. Skipping the deserialization will 
omit these ODR checks, which will break the semantics. And this is the reason 
why that scheme may not be good.
(3) It makes implementation more complex.

For the first 2 reasons, I believe clang is in the right directions. Also for 
the filename suffixes, it is helpful for static analyzing tools to know whether 
or not if a file is module unit. So the static analyzing tools can scan every 
`.cppm` file in the working directory automatically in the preparation time. 
For example, if the build system are ready enough, we can omit the `module 
files` in the build script like: 
https://github.com/ChuanqiXu9/llvm-project/blob/94a71a7046a8672a7e06311b0c05c0a8c9ae4619/P1689Examples/HelloWorld/CMakeLists.txt#L18-L19.
I believe this is the reason why MSVC made a similar choice. (clang is easy to 
support MSVC's 'ixx' extension).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-10 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

FAOD: I think this should not be about "who's patches" - but about the 
compilation model and command lines that we want to end up with.

I am concerned that with the current scheme (with or without the change in this 
review) we could end up with three process launches per modular file:

- one to determine the deps
- one to produce the BMI
- one to produce the object

At least with GCC (and a patch series like the one I drafted) we can reduce 
that to 2

With the interfaces mentioned in P1184 , it is 
plausible to make that 1 (at the expense of potentially many stalled builds - 
but stalled early in compilation, so hopefully with the smallest memory 
footprint we could achieve).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-10 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3846264 , @iains wrote:

> In D134267#3846218 , @ChuanqiXu 
> wrote:
>
>> In D134267#3846153 , @iains wrote:
>>
>>> @ChuanqiXu BTW, I think we (unfortunately) have some overlap in work in 
>>> progress here;
>>>
>>> I have implemented a similar scheme to GCC's where C/BMIs can be produced 
>>> "on demand" when the FE encounters the module keywords.  The idea was to 
>>> make it possible to have the same command lines on both compilers 
>>> (basically to remove the need for the user to identify that a c++ source is 
>>> is modular - with the sole exception of header units which the compiler 
>>> cannot figure out on its own).
>>>
>>> This scheme interfaces with the module-mapper approach (but how the mapping 
>>> is determined could just as easily be from command line flags, module map 
>>> files etc. etc. that is the beauty of that approach - the compiler has no 
>>> need to know about how the mapping is made - it can be a fixed in-process 
>>> scheme (like using the module cache path + some well-known mapping of 
>>> names) or a full-blown build system - e.g. attaching to build2 (I think 
>>> Boris did try out an earlier version of my implementation.)
>>>
>>> draft patches (need rebasing) here: D118896 
>>> , D118898 
>>> ,D118895 
>>> , D118899 
>>>  (these do not include the libCody stuff, 
>>> which would support the module mapper, but concentrate on the compiler 
>>> side).
>>> edit: D118895  is probably no longer 
>>> relevant.



In D134267#3846334 , @ChuanqiXu wrote:

> In D134267#3846264 , @iains wrote:
>
>> In D134267#3846218 , @ChuanqiXu 
>> wrote:
>>
>>> In D134267#3846153 , @iains wrote:
>>>
 @ChuanqiXu BTW, I think we (unfortunately) have some overlap in work in 
 progress here;

 I have implemented a similar scheme to GCC's where C/BMIs can be produced 
 "on demand" when the FE encounters the module keywords.  The idea was to 
 make it possible to have the same command lines on both compilers 
 (basically to remove the need for the user to identify that a c++ source 
 is is modular - with the sole exception of header units which the compiler 
 cannot figure out on its own).

 This scheme interfaces with the module-mapper approach (but how the 
 mapping is determined could just as easily be from command line flags, 
 module map files etc. etc. that is the beauty of that approach - the 
 compiler has no need to know about how the mapping is made - it can be a 
 fixed in-process scheme (like using the module cache path + some 
 well-known mapping of names) or a full-blown build system - e.g. attaching 
 to build2 (I think Boris did try out an earlier version of my 
 implementation.)

 draft patches (need rebasing) here: D118896 
 , D118898 
 ,D118895 
 , D118899 
  (these do not include the libCody 
 stuff, which would support the module mapper, but concentrate on the 
 compiler side).
 edit: D118895  is probably no longer 
 relevant.

 Reworking/rebasing this scheme and posting the patches is next on my TODO 
 after (current work on) P1815  part 2.

 Hopefully, there is some way to combine these pieces of work to give the 
 user greatest flexibility.
>>>
>>> @iains oh, I remembered just now that you said you'd like to take this in 
>>> one of GitHub issues long ago. My bad. I forgot it when I started this. I 
>>> think you can add me as subscriber such drafts next time to avoid similar 
>>> things happen again.
>>> In fact, I had some drafts about modules and they conflicted with yours 
>>> previously. So I'm sure I understand this : (
>>>
>>> And for this concrete job (generate pcm and object file in one compilation 
>>> job), I think my implementation looks better. It is shorter and reuses the 
>>> current behavior (generate `.pcm` files in `/tmp` directories and we just 
>>> redirect it). So I think it may be better to pursue this patch for this 
>>> task. (The option design could be changed of course).
>>
>> I need to review this patch some more - but at first looking it seems to be 
>> driver-only - so I am not sure how you are arranging for the PCM  **and** 
>> the object to  be created from a single invocation of the FE.  The idea of 
>> my patch series is to match 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-10 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3846264 , @iains wrote:

> In D134267#3846218 , @ChuanqiXu 
> wrote:
>
>> In D134267#3846153 , @iains wrote:
>>
>>> @ChuanqiXu BTW, I think we (unfortunately) have some overlap in work in 
>>> progress here;
>>>
>>> I have implemented a similar scheme to GCC's where C/BMIs can be produced 
>>> "on demand" when the FE encounters the module keywords.  The idea was to 
>>> make it possible to have the same command lines on both compilers 
>>> (basically to remove the need for the user to identify that a c++ source is 
>>> is modular - with the sole exception of header units which the compiler 
>>> cannot figure out on its own).
>>>
>>> This scheme interfaces with the module-mapper approach (but how the mapping 
>>> is determined could just as easily be from command line flags, module map 
>>> files etc. etc. that is the beauty of that approach - the compiler has no 
>>> need to know about how the mapping is made - it can be a fixed in-process 
>>> scheme (like using the module cache path + some well-known mapping of 
>>> names) or a full-blown build system - e.g. attaching to build2 (I think 
>>> Boris did try out an earlier version of my implementation.)
>>>
>>> draft patches (need rebasing) here: D118896 
>>> , D118898 
>>> ,D118895 
>>> , D118899 
>>>  (these do not include the libCody stuff, 
>>> which would support the module mapper, but concentrate on the compiler 
>>> side).
>>> edit: D118895  is probably no longer 
>>> relevant.
>>>
>>> Reworking/rebasing this scheme and posting the patches is next on my TODO 
>>> after (current work on) P1815  part 2.
>>>
>>> Hopefully, there is some way to combine these pieces of work to give the 
>>> user greatest flexibility.
>>
>> @iains oh, I remembered just now that you said you'd like to take this in 
>> one of GitHub issues long ago. My bad. I forgot it when I started this. I 
>> think you can add me as subscriber such drafts next time to avoid similar 
>> things happen again.
>> In fact, I had some drafts about modules and they conflicted with yours 
>> previously. So I'm sure I understand this : (
>>
>> And for this concrete job (generate pcm and object file in one compilation 
>> job), I think my implementation looks better. It is shorter and reuses the 
>> current behavior (generate `.pcm` files in `/tmp` directories and we just 
>> redirect it). So I think it may be better to pursue this patch for this 
>> task. (The option design could be changed of course).
>
> I need to review this patch some more - but at first looking it seems to be 
> driver-only - so I am not sure how you are arranging for the PCM  **and** the 
> object to  be created from a single invocation of the FE.  The idea of my 
> patch series is to match the behavior o GCC - where the FE can figure out 
> (independently of any file suffix) that it needs to emit an BMI.

Let me add some more words to ease the understand. The secret here is that 
clang knows it is compiling a module unit before parsing - by the suffix 
`.cppm` or specified `-xc++-module` explicitly. And the clang would always 
generate the pcm file and the object file for the module unit in the past time 
too. The original behavior to compile a `*.cppm` to `*.o` would be:

1. Compile `*.cppm` to `/tmp/random-name.pcm`
2. Compile `/tmp/random-name.pcm` to `*.o`

And the only change this patch made is to change the destination of 
`/tmp/random-name.pcm`.

BTW, the meaning of `--precompile` (or `-emit-module-interface` in Xclang) is 
to stop at the precompilation step instead of doing precompilation. Since 
precompilation would always happen for module unit.

>> ---
>>
>> (following off may not be related to this patch)
>
> perhaps - but a small response here (we can take this to a different forum if 
> needed)
>
>> And it is definitely valuable to have command line interfaces compatibility 
>> between clang and gcc. But for the libcody things, we must need a RFC in the 
>> discourse first.
>
> Well the idea had also been discussed among the 'modules group' but. sure we 
> can this - however the module-mapper is not required to make this initial 
> patch series useful on its own - it is about automatically producing the BMI 
> without special command lines.
>
>> Also for the module mapper things, I'm not sure if we should pursue it. (I'm 
>> not against it. I'm just hesitating). For example, I am trying to implement 
>> the proposal P1689  from kitware guys in 
>> https://github.com/ChuanqiXu9/llvm-project/tree/MyP1689/P1689Examples/HelloWorld.
>>  And it looks good that we can compile a hello world example now. So I'm not 
>> sure if 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-10 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

In D134267#3846218 , @ChuanqiXu wrote:

> In D134267#3846153 , @iains wrote:
>
>> @ChuanqiXu BTW, I think we (unfortunately) have some overlap in work in 
>> progress here;
>>
>> I have implemented a similar scheme to GCC's where C/BMIs can be produced 
>> "on demand" when the FE encounters the module keywords.  The idea was to 
>> make it possible to have the same command lines on both compilers (basically 
>> to remove the need for the user to identify that a c++ source is is modular 
>> - with the sole exception of header units which the compiler cannot figure 
>> out on its own).
>>
>> This scheme interfaces with the module-mapper approach (but how the mapping 
>> is determined could just as easily be from command line flags, module map 
>> files etc. etc. that is the beauty of that approach - the compiler has no 
>> need to know about how the mapping is made - it can be a fixed in-process 
>> scheme (like using the module cache path + some well-known mapping of names) 
>> or a full-blown build system - e.g. attaching to build2 (I think Boris did 
>> try out an earlier version of my implementation.)
>>
>> draft patches (need rebasing) here: D118896 
>> , D118898 
>> ,D118895 
>> , D118899 
>>  (these do not include the libCody stuff, 
>> which would support the module mapper, but concentrate on the compiler side).
>> edit: D118895  is probably no longer 
>> relevant.
>>
>> Reworking/rebasing this scheme and posting the patches is next on my TODO 
>> after (current work on) P1815  part 2.
>>
>> Hopefully, there is some way to combine these pieces of work to give the 
>> user greatest flexibility.
>
> @iains oh, I remembered just now that you said you'd like to take this in one 
> of GitHub issues long ago. My bad. I forgot it when I started this. I think 
> you can add me as subscriber such drafts next time to avoid similar things 
> happen again.
> In fact, I had some drafts about modules and they conflicted with yours 
> previously. So I'm sure I understand this : (
>
> And for this concrete job (generate pcm and object file in one compilation 
> job), I think my implementation looks better. It is shorter and reuses the 
> current behavior (generate `.pcm` files in `/tmp` directories and we just 
> redirect it). So I think it may be better to pursue this patch for this task. 
> (The option design could be changed of course).

I need to review this patch some more - but at first looking it seems to be 
driver-only - so I am not sure how you are arranging for the PCM  **and** the 
object to  be created from a single invocation of the FE.  The idea of my patch 
series is to match the behavior o GCC - where the FE can figure out 
(independently of any file suffix) that it needs to emit an BMI.

> ---
>
> (following off may not be related to this patch)

perhaps - but a small response here (we can take this to a different forum if 
needed)

> And it is definitely valuable to have command line interfaces compatibility 
> between clang and gcc. But for the libcody things, we must need a RFC in the 
> discourse first.

Well the idea had also been discussed among the 'modules group' but. sure we 
can this - however the module-mapper is not required to make this initial patch 
series useful on its own - it is about automatically producing the BMI without 
special command lines.

> Also for the module mapper things, I'm not sure if we should pursue it. (I'm 
> not against it. I'm just hesitating). For example, I am trying to implement 
> the proposal P1689  from kitware guys in 
> https://github.com/ChuanqiXu9/llvm-project/tree/MyP1689/P1689Examples/HelloWorld.
>  And it looks good that we can compile a hello world example now. So I'm not 
> sure if the gcc's module mapper is necessary here. Maybe we need more 
> discussing.

There is almost certainly more than one solution to this and different 
solutions will fit different scenarios - Cmake and build2 are examples, but 
actually it's quite nice with a small project to be able to use an in-process 
resolver for names and not require any external build system.  I realise a lot 
of discussion in SG15 is targeted at the larger scale problems, but we should 
also provide (as you suggest) for "hello world" - preferably without the user 
having to install a separate package to make it work.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-10 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3846153 , @iains wrote:

> @ChuanqiXu BTW, I think we (unfortunately) have some overlap in work in 
> progress here;
>
> I have implemented a similar scheme to GCC's where C/BMIs can be produced "on 
> demand" when the FE encounters the module keywords.  The idea was to make it 
> possible to have the same command lines on both compilers (basically to 
> remove the need for the user to identify that a c++ source is is modular - 
> with the sole exception of header units which the compiler cannot figure out 
> on its own).
>
> This scheme interfaces with the module-mapper approach (but how the mapping 
> is determined could just as easily be from command line flags, module map 
> files etc. etc. that is the beauty of that approach - the compiler has no 
> need to know about how the mapping is made - it can be a fixed in-process 
> scheme (like using the module cache path + some well-known mapping of names) 
> or a full-blown build system - e.g. attaching to build2 (I think Boris did 
> try out an earlier version of my implementation.)
>
> draft patches (need rebasing) here: D118896 
> , D118898 
> ,D118895 
> , D118899 
>  (these do not include the libCody stuff, 
> which would support the module mapper, but concentrate on the compiler side).
> edit: D118895  is probably no longer 
> relevant.
>
> Reworking/rebasing this scheme and posting the patches is next on my TODO 
> after (current work on) P1815  part 2.
>
> Hopefully, there is some way to combine these pieces of work to give the user 
> greatest flexibility.

@iains oh, I remembered just now that you said you'd like to take this in one 
of GitHub issues long ago. My bad. I forgot it when I started this. I think you 
can add me as subscriber such drafts next time to avoid similar things happen 
again.
In fact, I had some drafts about modules and they conflicted with yours 
previously. So I'm sure I understand this : (

And for this concrete job (generate pcm and object file in one compilation 
job), I think my implementation looks better. It is shorter and reuses the 
current behavior (generate `.pcm` files in `/tmp` directories and we just 
redirect it). So I think it may be better to pursue this patch for this task. 
(The option design could be changed of course).

---

(following off may not be related to this patch)
And it is definitely valuable to have command line interfaces compatibility 
between clang and gcc. But for the libcody things, we must need a RFC in the 
discourse first. Also for the module mapper things, I'm not sure if we should 
pursue it. (I'm not against it. I'm just hesitating). For example, I am trying 
to implement the proposal P1689  from kitware 
guys in 
https://github.com/ChuanqiXu9/llvm-project/tree/MyP1689/P1689Examples/HelloWorld.
 And it looks good that we can compile a hello world example now. So I'm not 
sure if the gcc's module mapper is necessary here. Maybe we need more 
discussing.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-10 Thread Iain Sandoe via Phabricator via cfe-commits
iains added a comment.

@ChuanqiXu BTW, I think we (unfortunately) have some overlap in work in 
progress here;

I have implemented a similar scheme to GCC's where C/BMIs can be produced "on 
demand" when the FE encounters the module keywords.  The idea was to make it 
possible to have the same command lines on both compilers (basically to remove 
the need for the user to identify that a c++ source is is modular - with the 
sole exception of header units which the compiler cannot figure out on its own).

This scheme interfaces with the module-mapper approach (but how the mapping is 
determined could just as easily be from command line flags, module map files 
etc. etc. that is the beauty of that approach - the compiler has no need to 
know about how the mapping is made - it can be a fixed in-process scheme (like 
using the module cache path + some well-known mapping of names) or a full-blown 
build system - e.g. attaching to build2 (I think Boris did try out an earlier 
version of my implementation.)

draft patches (need rebasing) here: D118896 , 
D118898 ,D118895 
, D118899  
(these do not include the libCody stuff, which would support the module mapper, 
but concentrate on the compiler side).

Reworking/rebasing this scheme and posting the patches is next on my TODO after 
(current work on) P1815  part 2.

Hopefully, there is some way to combine these pieces of work to give the user 
greatest flexibility.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-08 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a comment.

In D134267#3844605 , @ChuanqiXu wrote:

> In D134267#3815453 , @dblaikie 
> wrote:
>
>>> This also tries to fix the problem I raised a year ago: 
>>> https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144
>>
>> I think this thread is fairly different from what this patch is proposing.
>>
>> @rsmith for discussion of adding effectively implicit modules support for 
>> C++20 modules - this seems like a pretty significant design choice, but 
>> something we (you and I, SG15, etc) have discussed before in terms of "how 
>> do we support 'Hello, World.'" situations without requiring something that 
>> amounts to a build system even for the simplest C++20 modules programs. 
>> Maybe this is it? But it surprises me a bit, and I don't think the linked 
>> thread is sufficiently relevant to draw conclusions about this direction.
>
> I don't understand your point a lot. From my point, the intention of the 
> thread 
> (https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144)
>  are asking why the command line interfaces of clang are clearly more complex 
> than gcc. And @rsmith replies that:
>
>> Currently, Clang lacks two important features here:
>>
>> 1. Produce a .pcm file and a .o file from a single compilation action.
>
> in 
> https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144/12.
>  And I think this is what this patch tries to do: (generate .pcm file and .o 
> file in one single compilation action).
>
> In fact, this doesn't change the behavior a lot. Previously, when we tried to 
> compile a `*.cppm` directly to `*.o`, the compiler will generate `*.pcm` 
> automatically too! Although the `*.pcm` file is only generated in `/tmp/` 
> files by default. And what this patch did is only to change the default 
> location for `*.pcm` files from `/tmp` to `module cache path`. And it 
> wouldn't affect the original 2 phase compilation model.



In D134267#3845177 , @dblaikie wrote:

> In D134267#3844605 , @ChuanqiXu 
> wrote:
>
>> In D134267#3815453 , @dblaikie 
>> wrote:
>>
 This also tries to fix the problem I raised a year ago: 
 https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144
>>>
>>> I think this thread is fairly different from what this patch is proposing.
>>>
>>> @rsmith for discussion of adding effectively implicit modules support for 
>>> C++20 modules - this seems like a pretty significant design choice, but 
>>> something we (you and I, SG15, etc) have discussed before in terms of "how 
>>> do we support 'Hello, World.'" situations without requiring something that 
>>> amounts to a build system even for the simplest C++20 modules programs. 
>>> Maybe this is it? But it surprises me a bit, and I don't think the linked 
>>> thread is sufficiently relevant to draw conclusions about this direction.
>>
>> I don't understand your point a lot. From my point, the intention of the 
>> thread 
>> (https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144)
>>  are asking why the command line interfaces of clang are clearly more 
>> complex than gcc. And @rsmith replies that:
>>
>>> Currently, Clang lacks two important features here:
>>>
>>> 1. Produce a .pcm file and a .o file from a single compilation action.
>>
>> in 
>> https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144/12.
>>  And I think this is what this patch tries to do: (generate .pcm file and .o 
>> file in one single compilation action).
>>
>> In fact, this doesn't change the behavior a lot. Previously, when we tried 
>> to compile a `*.cppm` directly to `*.o`, the compiler will generate `*.pcm` 
>> automatically too! Although the `*.pcm` file is only generated in `/tmp/` 
>> files by default. And what this patch did is only to change the default 
>> location for `*.pcm` files from `/tmp` to `module cache path`. And it 
>> wouldn't affect the original 2 phase compilation model.
>>
>> BTW, this can simplify the use of modules significantly, see 
>> https://reviews.llvm.org/D135507#change-gpsoCkM1g61J for example.
>
> The use of a module cache path I think is a pretty major difference between 
> what was discussed on the discourse thread and what's being proposed here - a 
> module "cache" is what starts to touch near Clang's old implicit modules that 
> has real problems in terms of parallelism, etc. (granted what you're 
> proposing here doesn't go all the way towards that - it doesn't have implicit 
> actions to generate the pcm, at least)
>
> Is this what GCC Supports? I'd expect something more like Split DWARF, where 
> the .pcm (or .dwo file) is placed next to the .o file 

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-08 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a comment.

In D134267#3844605 , @ChuanqiXu wrote:

> In D134267#3815453 , @dblaikie 
> wrote:
>
>>> This also tries to fix the problem I raised a year ago: 
>>> https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144
>>
>> I think this thread is fairly different from what this patch is proposing.
>>
>> @rsmith for discussion of adding effectively implicit modules support for 
>> C++20 modules - this seems like a pretty significant design choice, but 
>> something we (you and I, SG15, etc) have discussed before in terms of "how 
>> do we support 'Hello, World.'" situations without requiring something that 
>> amounts to a build system even for the simplest C++20 modules programs. 
>> Maybe this is it? But it surprises me a bit, and I don't think the linked 
>> thread is sufficiently relevant to draw conclusions about this direction.
>
> I don't understand your point a lot. From my point, the intention of the 
> thread 
> (https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144)
>  are asking why the command line interfaces of clang are clearly more complex 
> than gcc. And @rsmith replies that:
>
>> Currently, Clang lacks two important features here:
>>
>> 1. Produce a .pcm file and a .o file from a single compilation action.
>
> in 
> https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144/12.
>  And I think this is what this patch tries to do: (generate .pcm file and .o 
> file in one single compilation action).
>
> In fact, this doesn't change the behavior a lot. Previously, when we tried to 
> compile a `*.cppm` directly to `*.o`, the compiler will generate `*.pcm` 
> automatically too! Although the `*.pcm` file is only generated in `/tmp/` 
> files by default. And what this patch did is only to change the default 
> location for `*.pcm` files from `/tmp` to `module cache path`. And it 
> wouldn't affect the original 2 phase compilation model.
>
> BTW, this can simplify the use of modules significantly, see 
> https://reviews.llvm.org/D135507#change-gpsoCkM1g61J for example.

The use of a module cache path I think is a pretty major difference between 
what was discussed on the discourse thread and what's being proposed here - a 
module "cache" is what starts to touch near Clang's old implicit modules that 
has real problems in terms of parallelism, etc. (granted what you're proposing 
here doesn't go all the way towards that - it doesn't have implicit actions to 
generate the pcm, at least)

Is this what GCC Supports? I'd expect something more like Split DWARF, where 
the .pcm (or .dwo file) is placed next to the .o file - not in some separate 
directory (what sort of naming scheme would be used to ensure that paths in 
that directory are unique?).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-10-07 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu added a reviewer: rsmith.
ChuanqiXu added a comment.

In D134267#3815453 , @dblaikie wrote:

>> This also tries to fix the problem I raised a year ago: 
>> https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144
>
> I think this thread is fairly different from what this patch is proposing.
>
> @rsmith for discussion of adding effectively implicit modules support for 
> C++20 modules - this seems like a pretty significant design choice, but 
> something we (you and I, SG15, etc) have discussed before in terms of "how do 
> we support 'Hello, World.'" situations without requiring something that 
> amounts to a build system even for the simplest C++20 modules programs. Maybe 
> this is it? But it surprises me a bit, and I don't think the linked thread is 
> sufficiently relevant to draw conclusions about this direction.

I don't understand your point a lot. From my point, the intention of the thread 
(https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144)
 are asking why the command line interfaces of clang are clearly more complex 
than gcc. And @rsmith replies that:

> Currently, Clang lacks two important features here:
>
> 1. Produce a .pcm file and a .o file from a single compilation action.

in 
https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144/12.
 And I think this is what this patch tries to do: (generate .pcm file and .o 
file in one single compilation action).

In fact, this doesn't change the behavior a lot. Previously, when we tried to 
compile a `*.cppm` directly to `*.o`, the compiler will generate `*.pcm` 
automatically too! Although the `*.pcm` file is only generated in `/tmp/` files 
by default. And what this patch did is only to change the default location for 
`*.pcm` files from `/tmp` to `module cache path`. And it wouldn't affect the 
original 2 phase compilation model.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-09-26 Thread David Blaikie via Phabricator via cfe-commits
dblaikie added a subscriber: rsmith.
dblaikie added a comment.

> This also tries to fix the problem I raised a year ago: 
> https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144

I think this thread is fairly different from what this patch is proposing.

@rsmith for discussion of adding effectively implicit modules support for C++20 
modules - this seems like a pretty significant design choice, but something we 
(you and I, SG15, etc) have discussed before in terms of "how do we support 
'Hello, World.'" situations without requiring something that amounts to a build 
system even for the simplest C++20 modules programs. Maybe this is it? But it 
surprises me a bit, and I don't think the linked thread is sufficiently 
relevant to draw conclusions about this direction.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-09-25 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu updated this revision to Diff 462700.
ChuanqiXu added a comment.

- Add ReleaseNotes.
- Refine the behavior about partitions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134267/new/

https://reviews.llvm.org/D134267

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/DiagnosticCommonKinds.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/create_module_cache.cpp
  clang/test/Modules/one-phase-compilation-named-modules.cppm

Index: clang/test/Modules/one-phase-compilation-named-modules.cppm
===
--- /dev/null
+++ clang/test/Modules/one-phase-compilation-named-modules.cppm
@@ -0,0 +1,29 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+//
+// RUN: %clang -std=c++20 %t/M.cppm -c -o - -fmodules-cache-path=%t/pcm.cache
+// RUN: %clang -std=c++20 %t/Use.cpp -fsyntax-only  -fprebuilt-module-path=%t/pcm.cache -Xclang -verify
+//
+// RUN: rm -f %t/M.pcm
+// Tests that the Sema will detect the incorrect module name.
+// RUN: %clang -std=c++20 %t/MismatchedName.cppm -o - -fmodules-cache-path=%t/pcm.cache \
+// RUN: -fmodule-name=WRONG_MODULE_NAME -fsyntax-only -Xclang -verify
+//
+// RUN: %clang -std=c++20 %t/MismatchedName.cppm -c -o - -fmodules-cache-path=%t/pcm.cache -fmodule-name=M
+// RUN: %clang -std=c++20 %t/Use.cpp -fsyntax-only  -fprebuilt-module-path=%t/pcm.cache -Xclang -verify
+
+//--- M.cppm
+export module M;
+export int getValue();
+
+//--- MismatchedName.cppm
+export module M;   // expected-error {{module name 'WRONG_MODULE_NAME' specified on command line does not match name of module}}
+export int getValue(); // expected-error {{export declaration can only be used within a module interface unit after the module declaration}}
+
+//--- Use.cpp
+// expected-no-diagnostics
+import M;
+int Use() {
+return getValue();
+}
Index: clang/test/Driver/create_module_cache.cpp
===
--- /dev/null
+++ clang/test/Driver/create_module_cache.cpp
@@ -0,0 +1,28 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+//
+// RUN: not %clang -std=c++20 %t/M.cppm -fmodules-cache-path= -c -o %t/M.tmp.o 2>&1 | FileCheck %t/M.cppm --check-prefix=ERROR
+// RUN: %clang -std=c++20 %t/M.cppm -fmodules-cache-path=%t/abc -c -o -
+// RUN: ls %t | FileCheck %t/M.cppm --check-prefix=CHECK-AVAILABLE
+//
+// RUN: %clang -std=c++20 %t/Use.cpp -fmodules-cache-path=abc -### 2>&1 | FileCheck %t/Use.cpp
+//
+// Check that the compiler will generate M-Part.pcm correctly.
+// RUN: %clang -std=c++20 %t/Part.cppm -fmodules-cache-path=%t/abc -fmodule-name=M:Part -c -o -
+// RUN: ls %t/abc | FileCheck %t/Part.cppm --check-prefix=CHECK-AVAILABLE
+
+//--- M.cppm
+export module M;
+
+// ERROR: unable to create default module cache path "": No such file or directory
+// CHECK-AVAILABLE: abc
+
+//--- Use.cpp
+import M;
+
+// CHECK: -fprebuilt-module-path=abc
+
+//--- Part.cppm
+export module M:Part;
+// CHECK-AVAILABLE: M-Part.pcm
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -3658,6 +3658,7 @@
  const ArgList , const InputInfo ,
  const InputInfo ,
  ArgStringList , bool ) {
+  bool HasCXXModules = HaveModules;
   // -fmodules enables the use of precompiled modules (off by default).
   // Users can pass -fno-cxx-modules to turn off modules support for
   // C++/Objective-C++ programs.
@@ -3699,33 +3700,34 @@
 options::OPT_fno_implicit_modules, HaveClangModules)) {
 if (HaveModules)
   CmdArgs.push_back("-fno-implicit-modules");
-  } else if (HaveModules) {
+  } else if (HaveModules)
 ImplicitModules = true;
-// -fmodule-cache-path specifies where our implicitly-built module files
-// should be written.
-SmallString<128> Path;
-if (Arg *A = Args.getLastArg(options::OPT_fmodules_cache_path))
-  Path = A->getValue();
 
-bool HasPath = true;
-if (C.isForDiagnostics()) {
-  // When generating crash reports, we want to emit the modules along with
-  // the reproduction sources, so we ignore any provided module path.
-  Path = Output.getFilename();
-  llvm::sys::path::replace_extension(Path, ".cache");
-  llvm::sys::path::append(Path, "modules");
-} else if (Path.empty()) {
-  // No module path was provided: use the default.
-  HasPath = Driver::getDefaultModuleCachePath(Path);
-}
-
-// `HasPath` will only be false if getDefaultModuleCachePath() fails.
-// That being said, that failure is unlikely and not caching is harmless.
-if (HasPath) {
-  const char Arg[] = "-fmodules-cache-path=";
-  Path.insert(Path.begin(), Arg, Arg + strlen(Arg));
-  

[PATCH] D134267: [C++] [Modules] Support one phase compilation model for named modules

2022-09-20 Thread Chuanqi Xu via Phabricator via cfe-commits
ChuanqiXu created this revision.
ChuanqiXu added reviewers: iains, dblaikie, Bigcheese, vsapsai.
ChuanqiXu added projects: clang, clang-modules.
Herald added a project: All.
ChuanqiXu requested review of this revision.
Herald added subscribers: cfe-commits, MaskRay.

This patch tries to make clang to generate the BMI implicitly in the module 
cache, which allow the user to compile a hello world example in one command 
line.

This also tries to fix the problem I raised a year ago: 
https://discourse.llvm.org/t/make-command-line-support-for-c-20-module-uniform-with-gcc/59144

More importantly, the patch ease the implementation for build system writers, 
which makes their initial support more easily. For example, in this branch 
https://github.com/ChuanqiXu9/llvm-project/tree/MyP1689, we can compile the 
HelloWorld examples by cmake with some little changes.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D134267

Files:
  clang/include/clang/Basic/DiagnosticCommonKinds.td
  clang/lib/Driver/Driver.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/Driver/create_module_cache.cpp
  clang/test/Modules/one-phase-compilation-named-modules.cppm

Index: clang/test/Modules/one-phase-compilation-named-modules.cppm
===
--- /dev/null
+++ clang/test/Modules/one-phase-compilation-named-modules.cppm
@@ -0,0 +1,25 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+//
+// RUN: %clang -std=c++20 %t/M.cppm -c -o - -fmodules-cache-path=%t/pcm.cache
+// RUN: %clang -std=c++20 %t/Use.cpp -fsyntax-only  -fprebuilt-module-path=%t/pcm.cache -Xclang -verify
+//
+// RUN: rm -f %t/M.pcm
+// RUN: %clang -std=c++20 %t/MismatchedName.cppm -c -o - -fmodules-cache-path=%t/pcm.cache -fmodule-name=M
+// RUN: %clang -std=c++20 %t/Use.cpp -fsyntax-only  -fprebuilt-module-path=%t/pcm.cache -Xclang -verify
+
+//--- M.cppm
+export module M;
+export int getValue();
+
+//--- MismatchedName.cppm
+export module M;
+export int getValue();
+
+//--- Use.cpp
+// expected-no-diagnostics
+import M;
+int Use() {
+return getValue();
+}
Index: clang/test/Driver/create_module_cache.cpp
===
--- /dev/null
+++ clang/test/Driver/create_module_cache.cpp
@@ -0,0 +1,20 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+// RUN: split-file %s %t
+//
+// RUN: not %clang -std=c++20 %t/M.cppm -fmodules-cache-path= -c -o %t/M.tmp.o 2>&1 | FileCheck %t/M.cppm --check-prefix=ERROR
+// RUN: %clang -std=c++20 %t/M.cppm -fmodules-cache-path=%t/abc -c -o -
+// RUN: ls %t | FileCheck %t/M.cppm --check-prefix=CHECK-AVAILABLE
+//
+// RUN: %clang -std=c++20 %t/Use.cpp -fmodules-cache-path=abc -### 2>&1 | FileCheck %t/Use.cpp
+
+//--- M.cppm
+export module M;
+
+// ERROR: unable to create default module cache path "": No such file or directory
+// CHECK-AVAILABLE: abc
+
+//--- Use.cpp
+import M;
+
+// CHECK: -fprebuilt-module-path=abc
Index: clang/lib/Driver/ToolChains/Clang.cpp
===
--- clang/lib/Driver/ToolChains/Clang.cpp
+++ clang/lib/Driver/ToolChains/Clang.cpp
@@ -3658,6 +3658,7 @@
  const ArgList , const InputInfo ,
  const InputInfo ,
  ArgStringList , bool ) {
+  bool HasCXXModules = HaveModules;
   // -fmodules enables the use of precompiled modules (off by default).
   // Users can pass -fno-cxx-modules to turn off modules support for
   // C++/Objective-C++ programs.
@@ -3699,33 +3700,34 @@
 options::OPT_fno_implicit_modules, HaveClangModules)) {
 if (HaveModules)
   CmdArgs.push_back("-fno-implicit-modules");
-  } else if (HaveModules) {
+  } else if (HaveModules)
 ImplicitModules = true;
-// -fmodule-cache-path specifies where our implicitly-built module files
-// should be written.
-SmallString<128> Path;
-if (Arg *A = Args.getLastArg(options::OPT_fmodules_cache_path))
-  Path = A->getValue();
 
-bool HasPath = true;
-if (C.isForDiagnostics()) {
-  // When generating crash reports, we want to emit the modules along with
-  // the reproduction sources, so we ignore any provided module path.
-  Path = Output.getFilename();
-  llvm::sys::path::replace_extension(Path, ".cache");
-  llvm::sys::path::append(Path, "modules");
-} else if (Path.empty()) {
-  // No module path was provided: use the default.
-  HasPath = Driver::getDefaultModuleCachePath(Path);
-}
-
-// `HasPath` will only be false if getDefaultModuleCachePath() fails.
-// That being said, that failure is unlikely and not caching is harmless.
-if (HasPath) {
-  const char Arg[] = "-fmodules-cache-path=";
-  Path.insert(Path.begin(), Arg, Arg + strlen(Arg));
-  CmdArgs.push_back(Args.MakeArgString(Path));
-}
+  // -fmodule-cache-path specifies where our implicitly-built module