[PATCH] D138073: [clang-doc] Move file layout to the generators.

Paul Kirth via Phabricator via cfe-commits Thu, 17 Nov 2022 13:05:53 -0800

paulkirth added inline comments.


================
Comment at: clang-tools-extra/clang-doc/HTMLGenerator.cpp:890-892
+    // TODO If there are multiple Infos for this file name (for example,
+    // template specializations), this will generate multiple complete web 
pages
+    // (with <DOCTYPE> and <title>, etc.) concatenated together. This generator
----------------
brettw wrote:
> paulkirth wrote:
> > So does this patch make HTML unusable? Currently, if there is a conflict, 
> > the file gets replaced, right? 
> > 
> > I read this as we're going from a situation where we lost data, but still 
> > had valid HTML pages to one where we have complete, but incorrect HTML, is 
> > that correct?
> > 
> > Also, why does that happen under template specialization? USRs are SHA1 
> > hashes of the symbol. I would expect that hashing the symbol would be 
> > unambiguous and unique, given C++'s ODR. 
> > 
> > That last point made me wonder if an alternate way to handle this without 
> > using the hash would be to use the mangled names?
> I think this makes HTML strictly better (though not ideal).
> 
> Previously if there was a collision, they got written over the top of each 
> other without truncation (not sure why). So if the longer page was written 
> second, you got lucky and you get a complete valid page and just lost the 
> shorter struct's definition. If you got unlucky the shorter page was written 
> second and you get a corrupted file with the shorter content, followed by the 
> end of the longer content peeking out from the end.
> 
> With this change, you get both content concatenated without corruption, you 
> just have duplicate doctype and title tags in between them. Browsers should 
> generally handle this so the page should be usable.
> 
> Fixing this requires a refactor of the HTML generator such that the concept 
> of writing a struct is separate from writing a page. Even if I was going to 
> work on the HTML generator, I would probably want to do this in a separate 
> patch anyway.
> 
> The markdown generator does the same, but markdown doesn't have all of the 
> headers stuff that shouldn't be in the middle of the file, you just get a new 
> `# ...` for the second section and it should be reasonable.
> 
> The HTML and MD generators name files according to the `Name` of the 
> structure. USR is not involved. So any time to things in the same namespace 
> have the same name, you'll get a collision. This happens most commonly with 
> template specialization because the names are identical (the name doesn't 
> account for the template parameters).
> 
> All of this complexity is why I changed the YAML output to use USR which 
> avoids the problem. I didn't want to make this change for the HTML and MD 
> generators because it will make ugly file names that the user will see, and 
> you probably want the template specializations to be in the same file anyway.
> I think this makes HTML strictly better (though not ideal).
> 
> Previously if there was a collision, they got written over the top of each 
> other without truncation (not sure why). So if the longer page was written 
> second, you got lucky and you get a complete valid page and just lost the 
> shorter struct's definition. If you got unlucky the shorter page was written 
> second and you get a corrupted file with the shorter content, followed by the 
> end of the longer content peeking out from the end.
> 
> With this change, you get both content concatenated without corruption, you 
> just have duplicate doctype and title tags in between them. Browsers should 
> generally handle this so the page should be usable.
> 

Thanks for the explanation. My concern was that this change made HTML unusable, 
if it is still valid (if not ideal) then I don't think there is an issue.

> Fixing this requires a refactor of the HTML generator such that the concept 
> of writing a struct is separate from writing a page. Even if I was going to 
> work on the HTML generator, I would probably want to do this in a separate 
> patch anyway.
> 

No arguments re splitting up work. Can you file an issue to track this on 
github and reference it in the TODO? That should help make sure this gets 
addressed.

> The markdown generator does the same, but markdown doesn't have all of the 
> headers stuff that shouldn't be in the middle of the file, you just get a new 
> `# ...` for the second section and it should be reasonable.
> 
> The HTML and MD generators name files according to the `Name` of the 
> structure. USR is not involved. So any time to things in the same namespace 
> have the same name, you'll get a collision. This happens most commonly with 
> template specialization because the names are identical (the name doesn't 
> account for the template parameters).
> 

I think there should be a way to obtain the mangled name from whatever you're 
looking at. Those should be available, even from the AST, so if you've found a 
template specialization, it //should// be possible to get the mangled name 
(even for partial specializations).  This shouldn't necessarily block you, but 
it seems like that would solve the issue nicely for all three generators.


================
Comment at: clang-tools-extra/test/clang-doc/single-file-public.cpp:6
 // RUN: clang-doc --doxygen --public --executor=standalone -p %t %t/test.cpp 
-output=%t/docs
-// RUN: cat %t/docs/GlobalNamespace/Record.yaml | FileCheck %s 
--check-prefix=CHECK
+// RUN: cat %t/docs/`ls %t/docs | grep -v index.yaml` | FileCheck %s 
--check-prefix=CHECK
 // RUN: rm -rf %t
----------------
brettw wrote:
> paulkirth wrote:
> > Its OK to use `ls` and `grep` in the runline, but you cannot use the 
> > backtick syntax. It will cause problems on other platforms and I can't find 
> > any other usage of that in our codebase. TBH I'm a bit surprised this 
> > works, since `lit` has some interesting behavior around RUN. 
> > 
> > I think you can just use `cat < ls %t/docs | grep -v index.yaml` instead. I 
> > haven't tried it, but that is certainly a more typical method of doing this 
> > in a lit test. Using `find` may also work, but I only found one use in 
> > `llvm/test/ExecutionEngine/MCJIT/load-object-a.ll`.
> Your example doesn't work (it's trying to load a file named "ls") but I found 
> another solution using xargs (which is hopefully supported on all of the 
> platforms).
I don't think we require `xargs`, so I'm not sure you can rely on that. We do 
require `find`, so that may be a better choice, and you can even have it invoke 
`cat` directly. 

See: https://llvm.org/docs/GettingStarted.html#software


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138073/new/

https://reviews.llvm.org/D138073

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D138073: [clang-doc] Move file layout to the generators.

Reply via email to