lebedev.ri added a comment.

Next, i suggest to look into code self-debugging, see comments.
Also, i have added a few questions, it would be great to know that my 
understanding is correct?

I'm sorry that it seems like we are going over and over and over over the same 
code again, 
this is the very base of the tool, i think it is important to get it as close 
to great as possible.
I *think* these review comments move it in that direction, not in the opposite 
direction?



================
Comment at: clang-doc/BitcodeWriter.cpp:47
+      BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed,
+                      BitCodeConstants::LineNumFixedSize));  // Line number
+  Abbrev->Add(
----------------
So in other words this is making an assumption that no file with more than 
65535 lines will be analyzed, correct?
Can you add that as comment please?


================
Comment at: clang-doc/BitcodeWriter.cpp:56
+  StringRef Name;
+  AbbrevDsc Abbrev;
+
----------------
```
  AbbrevDsc Abbrev = nullptr;
```


================
Comment at: clang-doc/BitcodeWriter.cpp:57
+  AbbrevDsc Abbrev;
+
+  RecordIdDsc() = default;
----------------
```
// Is this 'description' valid?
operator bool() const {
  return Abbrev != nullptr && Name.data() != nullptr && !Name.empty();
}
```


================
Comment at: clang-doc/BitcodeWriter.cpp:137
+          {FUNCTION_LOCATION, {"Location", &LocationAbbrev}},
+          {FUNCTION_PARENT, {"Parent", &StringAbbrev}},
+          {FUNCTION_ACCESS, {"Access", &IntAbbrev}}};
----------------
So `FUNCTION_MANGLED_NAME` is phased out, and is thus missing, as far as i 
understand?


================
Comment at: clang-doc/BitcodeWriter.cpp:148
+void ClangDocBitcodeWriter::AbbreviationMap::add(RecordId RID,
+                                                 unsigned AbbrevID) {
+  assert(Abbrevs.find(RID) == Abbrevs.end() && "Abbreviation already added.");
----------------
+`assert(RecordIdNameMap[ID] && "Unknown Abbreviation");`


================
Comment at: clang-doc/BitcodeWriter.cpp:153
+
+unsigned ClangDocBitcodeWriter::AbbreviationMap::get(RecordId RID) const {
+  assert(Abbrevs.find(RID) != Abbrevs.end() && "Unknown abbreviation.");
----------------
+`assert(RecordIdNameMap[ID] && "Unknown Abbreviation");`


================
Comment at: clang-doc/BitcodeWriter.cpp:158
+
+void ClangDocBitcodeWriter::AbbreviationMap::clear() { Abbrevs.clear(); }
+
----------------
Called only once, and that call does nothing.
I'd drop it.


================
Comment at: clang-doc/BitcodeWriter.cpp:175
+/// \brief Emits a block ID and the block name to the BLOCKINFO block.
+void ClangDocBitcodeWriter::emitBlockID(BlockId ID) {
+  Record.clear();
----------------
```
/// \brief Emits a block ID and the block name to the BLOCKINFO block.
void ClangDocBitcodeWriter::emitBlockID(BlockId ID) {
  const auto& BlockIdName = BlockIdNameMap[ID];
  assert(BlockIdName.data() && BlockIdName.size() && "Unknown BlockId!");

  Record.clear();
  Record.push_back(ID);
  Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_SETBID, Record);

  Record.clear();
  for (const char C : BlockIdName) Record.push_back(C);
  Stream.EmitRecord(llvm::bitc::BLOCKINFO_CODE_BLOCKNAME, Record);
}
```


================
Comment at: clang-doc/BitcodeWriter.cpp:187
+void ClangDocBitcodeWriter::emitRecordID(RecordId ID) {
+  prepRecordData(ID);
+  for (const char C : RecordIdNameMap[ID].Name) Record.push_back(C);
----------------
```
/// \brief Emits a record name to the BLOCKINFO block.
void ClangDocBitcodeWriter::emitRecordID(RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  prepRecordData(ID);
```
(Yes, `prepRecordData()` will have the same code. It should get optimized away.)


================
Comment at: clang-doc/BitcodeWriter.cpp:194
+
+void ClangDocBitcodeWriter::emitAbbrev(RecordId ID, BlockId Block) {
+  auto Abbrev = std::make_shared<BitCodeAbbrev>();
----------------
```
void ClangDocBitcodeWriter::emitAbbrev(RecordId ID, BlockId Block) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  auto Abbrev = std::make_shared<BitCodeAbbrev>();
```


================
Comment at: clang-doc/BitcodeWriter.cpp:204
+void ClangDocBitcodeWriter::emitRecord(StringRef Str, RecordId ID) {
+  if (!prepRecordData(ID, !Str.empty())) return;
+  Record.push_back(Str.size());
----------------
So remember that in a previous iteration, seemingly useless `AbbrevDsc` stuff 
was added to the `RecordIdNameMap`?
It is going to pay-off now:
```
void ClangDocBitcodeWriter::emitRecord(StringRef Str, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &StringAbbrev && "Abbrev type mismatch");
  if (!prepRecordData(ID, !Str.empty())) return;
...
```
And if we did not add an `RecordIdNameMap` entry for this `RecordId`, then i 
believe that will also be detected because `Abbrev` will be a `nullptr`.


================
Comment at: clang-doc/BitcodeWriter.cpp:205
+  if (!prepRecordData(ID, !Str.empty())) return;
+  Record.push_back(Str.size());
+  Stream.EmitRecordWithBlob(Abbrevs.get(ID), Record, Str);
----------------
```
assert(Str.size() < (1U << BitCodeConstants::StringLengthSize));
Record.push_back(Str.size());
```


================
Comment at: clang-doc/BitcodeWriter.cpp:210
+void ClangDocBitcodeWriter::emitRecord(const Location &Loc, RecordId ID) {
+  if (!prepRecordData(ID, !OmitFilenames)) return;
+  Record.push_back(Loc.LineNumber);
----------------
```
void ClangDocBitcodeWriter::emitRecord(const Location &Loc, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &LocationAbbrev && "Abbrev type 
mismatch");
  if (!prepRecordData(ID, !OmitFilenames)) return;
...
```


================
Comment at: clang-doc/BitcodeWriter.cpp:211
+  if (!prepRecordData(ID, !OmitFilenames)) return;
+  Record.push_back(Loc.LineNumber);
+  Record.push_back(Loc.Filename.size());
----------------
Call me paranoid, but:
```
assert(Loc.LineNumber < (1U << BitCodeConstants::LineNumberSize));
Record.push_back(Loc.LineNumber);
assert(Loc.Filename.size()) < (1U << BitCodeConstants::StringLengthSize));
Record.push_back(Loc.Filename.size());
```


================
Comment at: clang-doc/BitcodeWriter.cpp:217
+void ClangDocBitcodeWriter::emitRecord(int Val, RecordId ID) {
+  if (!prepRecordData(ID, Val)) return;
+  Record.push_back(Val);
----------------
```
void ClangDocBitcodeWriter::emitRecord(int Val, RecordId ID) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  assert(RecordIdNameMap[ID].Abbrev == &IntAbbrev && "Abbrev type mismatch");
  if (!prepRecordData(ID, Val)) return;
```


================
Comment at: clang-doc/BitcodeWriter.cpp:218
+  if (!prepRecordData(ID, Val)) return;
+  Record.push_back(Val);
+  Stream.EmitRecordWithAbbrev(Abbrevs.get(ID), Record);
----------------
```
assert(Val < (1U << BitCodeConstants::IntSize));
Record.push_back(Val);
```


================
Comment at: clang-doc/BitcodeWriter.cpp:222
+
+bool ClangDocBitcodeWriter::prepRecordData(RecordId ID, bool ShouldEmit) {
+  if (!ShouldEmit) return false;
----------------
```
bool ClangDocBitcodeWriter::prepRecordData(RecordId ID, bool ShouldEmit) {
  assert(RecordIdNameMap[ID] && "Unknown Abbreviation");
  if (!ShouldEmit) return false;
```


================
Comment at: clang-doc/BitcodeWriter.cpp:232
+void ClangDocBitcodeWriter::emitBlockInfoBlock() {
+  Abbrevs.clear();
+  emitHeader();
----------------
Since `ClangDocBitcodeWriter` is not re-used, but re-constructed* each time, 
`Abbrevs.clear();` does nothing.

* Hmm, i wonder if that will be a bad thing. Benchmarking will tell i guess :/


================
Comment at: clang-doc/BitcodeWriter.cpp:236
+
+  std::initializer_list<std::pair<BlockId, std::initializer_list<RecordId>>>
+      TheBlocks{// Version Block
----------------
https://godbolt.org/g/rD6BWK also suggests it should be `static const`


================
Comment at: clang-doc/BitcodeWriter.cpp:276
+void ClangDocBitcodeWriter::emitBlockInfo(BlockId BID,
+                                          const std::vector<RecordId> RIDs) {
+  emitBlockID(BID);
----------------
Uhm, do you plan on calling `emitBlockInfo()` from anywhere else other than 
`emitBlockInfoBlock()`?
Since it takes `const std::vector<RecordId>` instead of a `const 
std::initializer_list<RecordId>&`, a memory copy will happen...
https://godbolt.org/g/rD6BWK


================
Comment at: clang-doc/BitcodeWriter.h:35
+
+struct BitCodeConstants {
+  static constexpr int SubblockIDSize = 5;
----------------
`LineNumFixedSize` is used for a different things. Given such a specific name, 
i think it may be confusing?
Also, looking at 
http://llvm.org/doxygen/classllvm_1_1BitstreamWriter.html#ae6a40b4a5ea89bb8b5076c26e0d0b638
 i guess these all should be `unsigned`.

I think this would be better, albeit more verbose:
```
struct BitCodeConstants {
  static constexpr unsigned SignatureBitSize = 8U;
  static constexpr unsigned SubblockIDSize = 5U;
  static constexpr unsigned IntSize = 16U;
  static constexpr unsigned StringLengthSize = 16U;
  static constexpr unsigned LineNumberSize = 16U;
};
```


================
Comment at: clang-doc/BitcodeWriter.h:53
+  BI_LAST = BI_COMMENT_BLOCK_ID
+};
+
----------------
So what *exactly* does `BitCodeConstants::SubblockIDSize` mean?
```
static_assert(BI_LAST < (1U << BitCodeConstants::SubblockIDSize), "Too many 
block id's!");
```
?


================
Comment at: clang-doc/BitcodeWriter.h:94
+  FUNCTION_LOCATION,
+  FUNCTION_MANGLED_NAME,
+  FUNCTION_PARENT,
----------------
So i have a question: if something (`FUNCTION_MANGLED_NAME` in this case) is 
phased out, does it have to stay in this enum?
That will introduce holes in `RecordIdNameMap`.
Are the actual numerical id's of enumerators stored in the bitcode, or the 
string (abbrev, `RecordIdNameMap[].Name`)?

Looking at tests, i guess these enums are internal detail, and they can be 
changed freely, including removing enumerators.
Am i wrong?

I think that should be explained in a comment before this `enum`.


================
Comment at: clang-doc/BitcodeWriter.h:100
+};
+
+#undef INFORECORDS
----------------
**If** `AbbreviationMap` comment makes sense, i guess that common code should 
be moved here, i.e.
```
static constexpr unsigned RecordIdCount = RI_LAST - RI_FIRST + 1;
```
and use this new variable in those two places.


================
Comment at: clang-doc/BitcodeWriter.h:163
+   public:
+    AbbreviationMap() {}
+    void add(RecordId RID, unsigned AbbrevID);
----------------
We know we will have at most `RI_LAST - RI_FIRST + 1` abbreviations.
Right now that results in just ~40 abbreviations.
Would it make sense to
```
AbbreviationMap() : Abbrevs(RI_LAST - RI_FIRST + 1) {}
```
?
(or `llvm::DenseMap<unsigned, unsigned> Abbrevs = llvm::DenseMap<unsigned, 
unsigned>(RI_LAST - RI_FIRST + 1);` but that looks uglier to me..)



https://reviews.llvm.org/D41102



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to