[PATCH] D138073: [clang-doc] Move file layout to the generators.

Brett Wilson via Phabricator via cfe-commits Fri, 18 Nov 2022 15:10:38 -0800

brettw added inline comments.


================
Comment at: clang-tools-extra/clang-doc/HTMLGenerator.cpp:890-892
+    // TODO If there are multiple Infos for this file name (for example,
+    // template specializations), this will generate multiple complete web 
pages
+    // (with <DOCTYPE> and <title>, etc.) concatenated together. This generator
----------------
paulkirth wrote:
> brettw wrote:
> > paulkirth wrote:
> > > So does this patch make HTML unusable? Currently, if there is a conflict, 
> > > the file gets replaced, right? 
> > > 
> > > I read this as we're going from a situation where we lost data, but still 
> > > had valid HTML pages to one where we have complete, but incorrect HTML, 
> > > is that correct?
> > > 
> > > Also, why does that happen under template specialization? USRs are SHA1 
> > > hashes of the symbol. I would expect that hashing the symbol would be 
> > > unambiguous and unique, given C++'s ODR. 
> > > 
> > > That last point made me wonder if an alternate way to handle this without 
> > > using the hash would be to use the mangled names?
> > I think this makes HTML strictly better (though not ideal).
> > 
> > Previously if there was a collision, they got written over the top of each 
> > other without truncation (not sure why). So if the longer page was written 
> > second, you got lucky and you get a complete valid page and just lost the 
> > shorter struct's definition. If you got unlucky the shorter page was 
> > written second and you get a corrupted file with the shorter content, 
> > followed by the end of the longer content peeking out from the end.
> > 
> > With this change, you get both content concatenated without corruption, you 
> > just have duplicate doctype and title tags in between them. Browsers should 
> > generally handle this so the page should be usable.
> > 
> > Fixing this requires a refactor of the HTML generator such that the concept 
> > of writing a struct is separate from writing a page. Even if I was going to 
> > work on the HTML generator, I would probably want to do this in a separate 
> > patch anyway.
> > 
> > The markdown generator does the same, but markdown doesn't have all of the 
> > headers stuff that shouldn't be in the middle of the file, you just get a 
> > new `# ...` for the second section and it should be reasonable.
> > 
> > The HTML and MD generators name files according to the `Name` of the 
> > structure. USR is not involved. So any time to things in the same namespace 
> > have the same name, you'll get a collision. This happens most commonly with 
> > template specialization because the names are identical (the name doesn't 
> > account for the template parameters).
> > 
> > All of this complexity is why I changed the YAML output to use USR which 
> > avoids the problem. I didn't want to make this change for the HTML and MD 
> > generators because it will make ugly file names that the user will see, and 
> > you probably want the template specializations to be in the same file 
> > anyway.
> > I think this makes HTML strictly better (though not ideal).
> > 
> > Previously if there was a collision, they got written over the top of each 
> > other without truncation (not sure why). So if the longer page was written 
> > second, you got lucky and you get a complete valid page and just lost the 
> > shorter struct's definition. If you got unlucky the shorter page was 
> > written second and you get a corrupted file with the shorter content, 
> > followed by the end of the longer content peeking out from the end.
> > 
> > With this change, you get both content concatenated without corruption, you 
> > just have duplicate doctype and title tags in between them. Browsers should 
> > generally handle this so the page should be usable.
> > 
> 
> Thanks for the explanation. My concern was that this change made HTML 
> unusable, if it is still valid (if not ideal) then I don't think there is an 
> issue.
> 
> > Fixing this requires a refactor of the HTML generator such that the concept 
> > of writing a struct is separate from writing a page. Even if I was going to 
> > work on the HTML generator, I would probably want to do this in a separate 
> > patch anyway.
> > 
> 
> No arguments re splitting up work. Can you file an issue to track this on 
> github and reference it in the TODO? That should help make sure this gets 
> addressed.
> 
> > The markdown generator does the same, but markdown doesn't have all of the 
> > headers stuff that shouldn't be in the middle of the file, you just get a 
> > new `# ...` for the second section and it should be reasonable.
> > 
> > The HTML and MD generators name files according to the `Name` of the 
> > structure. USR is not involved. So any time to things in the same namespace 
> > have the same name, you'll get a collision. This happens most commonly with 
> > template specialization because the names are identical (the name doesn't 
> > account for the template parameters).
> > 
> 
> I think there should be a way to obtain the mangled name from whatever you're 
> looking at. Those should be available, even from the AST, so if you've found 
> a template specialization, it //should// be possible to get the mangled name 
> (even for partial specializations).  This shouldn't necessarily block you, 
> but it seems like that would solve the issue nicely for all three generators.
TODO added.

I don't think the mangled names helps both because I think you want the file 
names to be readable (the mangled names are barely better than the USR) and 
because I think you also want the specializations to all end up in the same 
file for end-user documentation. I discussed this in the bug.


================
Comment at: clang-tools-extra/test/clang-doc/single-file-public.cpp:6
 // RUN: clang-doc --doxygen --public --executor=standalone -p %t %t/test.cpp 
-output=%t/docs
-// RUN: cat %t/docs/GlobalNamespace/Record.yaml | FileCheck %s 
--check-prefix=CHECK
+// RUN: cat %t/docs/`ls %t/docs | grep -v index.yaml` | FileCheck %s 
--check-prefix=CHECK
 // RUN: rm -rf %t
----------------
paulkirth wrote:
> brettw wrote:
> > paulkirth wrote:
> > > Its OK to use `ls` and `grep` in the runline, but you cannot use the 
> > > backtick syntax. It will cause problems on other platforms and I can't 
> > > find any other usage of that in our codebase. TBH I'm a bit surprised 
> > > this works, since `lit` has some interesting behavior around RUN. 
> > > 
> > > I think you can just use `cat < ls %t/docs | grep -v index.yaml` instead. 
> > > I haven't tried it, but that is certainly a more typical method of doing 
> > > this in a lit test. Using `find` may also work, but I only found one use 
> > > in `llvm/test/ExecutionEngine/MCJIT/load-object-a.ll`.
> > Your example doesn't work (it's trying to load a file named "ls") but I 
> > found another solution using xargs (which is hopefully supported on all of 
> > the platforms).
> I don't think we require `xargs`, so I'm not sure you can rely on that. We do 
> require `find`, so that may be a better choice, and you can even have it 
> invoke `cat` directly. 
> 
> See: https://llvm.org/docs/GettingStarted.html#software
I found a `find` invocation that worked. I couldn't get it to recognize the 
"repeats 40 times" regex syntax so I copy-pasted [0-9A-Z] 40 times like in the 
body of the test.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138073/new/

https://reviews.llvm.org/D138073

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D138073: [clang-doc] Move file layout to the generators.

Reply via email to