labath added inline comments.

================
Comment at: source/Breakpoint/BreakpointResolverFileLine.cpp:130-131
+    // we must leave the slash character though.
+    while (relative_path.consume_front("."))
+      /* Do nothing. */;
+  }
----------------
What about paths like `.foo/bar.c` and `.../bar.c`. These don't contain `.` or 
`..` components. so you'll want to avoid the consuming here. (You can use 
`*llvm::sys::path::begin(path)` to get the first path component and check that).

Also, I'm not sure what is the behavior we want for paths like `../../foo.cpp`. 
The present behavior of matching the path as it was `/../foo.cpp` does not seem 
entirely useful.


================
Comment at: source/Utility/FileSpec.cpp:107
+  for (auto i = path.find('/'); i != llvm::StringRef::npos;
+       i = path.find('/', i + 1)) {
+    const auto nextChar = safeCharAtIndex(path, i+1);
----------------
aprantl wrote:
> clayborg wrote:
> > clayborg wrote:
> > > aprantl wrote:
> > > > clayborg wrote:
> > > > > I am fine switching to using the llvm functions for removing, this is 
> > > > > only detecting if any normalization needs to happen. If there is an 
> > > > > equivalent LLVM function that will only run through the string one 
> > > > > time to detect all needs for normalization (no multiple passes 
> > > > > looking for "..", then for "." etc like we used to have), I would be 
> > > > > happy to use it, but there doesn't seem to be. We are trying to avoid 
> > > > > having to create a "SmallVectorImpl< char >" for all paths if they 
> > > > > don't need fixing here. This function is just iterating through an 
> > > > > llvm::StringRef just to see if normalization needs to happen. If it 
> > > > > does, then we make a SmallVectorImpl< char > and we fix the path. We 
> > > > > can easily use llvm functions for the fixing part. 
> > > > I see, you want to avoid copying the string when it isn't necessary to 
> > > > do so. IMHO the best way to do this is to add a `bool hasDots(const 
> > > > Twine &)` function to llvm::sys::path and add a `bool remove_dot_dot` 
> > > > parameter to `removeDots()`. Since you have already written the 
> > > > unittests, adding the function to LLVM shouldn't be any extra work 
> > > > (I'll help reviewing), and we'll have 100 LoC less to maintain.
> > > > 
> > > > This way we can avoid having two functions that perform path 
> > > > normalization that superficially look like they might do the same 
> > > > thing, but a year from now nobody can really tell whether they behave 
> > > > exactly the same, and whether a bug found in one implementation should 
> > > > also be fixed in the other one, etc... .
> > > I want to know if there are redundant slashes as well. I want something 
> > > like "bool llvm::sys::has(const Twine &, bool dots, bool dot_dots, bool 
> > > redundant_slashes)". But I don't see that getting accepted? I just want a 
> > > "bool llvm:sys::path::contains_redundant_stuff(const Twine &)".  Not sure 
> > > on the name though. needs_normalization? can_be_shortened? Everything in 
> > > LLVM right now assumes that dots, dot dots, slashes and realpath stuff 
> > > happen in the same function. Not sure how to break that up without 
> > > ruining the performance of the one and only loop that should be 
> > > happening. I like the current approach since it doesn't require chopping 
> > > up the string into an array and iterating through the array.
> > Also not sure if the LLVM stuff will try to substitute in the current 
> > working directory for "."?
> I just read the source code of remove_dots() 
> (http://llvm.org/doxygen/Path_8cpp_source.html#l00699) and if I'm not 
> mistaken, it actually also removes double-separators as a side-effect, since 
> it iterates over the path components of the string as it constructs the copy. 
> It also seems to avoid the copy operation if it isn't necessary.
> 
> Could you take another look? Perhaps I'm missing something, (or perhaps we 
> can just turn this into a small addition to remove_dots).
> 
> > Everything in LLVM right now assumes that dots, dot dots, slashes and 
> > realpath stuff happen in the same function.
> I'm not sure I understand.
FWIW, this could be implemented in a much simpler way, while still making sure 
we run through the string only once. I'm thinking of something like:
```
bool first = true;
for(it = llvm::sys::path::begin(path, style), end = llvm::sys::path::end(path); 
it != end; ++it) {
  if (!first && (*it == "." || *it == "..")) return true;
  first = false;
}
return false;
```


================
Comment at: source/Utility/FileSpec.cpp:184-187
+    if (component == ".") {
+      if (!normalized.empty())
+        continue; // Skip '.' unless it is the first thing
+    }
----------------
This looks wrong, because then for a path like `./../whatever`, you will end up 
with both `.` and `..` in the "normalized" path.


================
Comment at: source/Utility/FileSpec.cpp:401-406
+    // Make sure we don't end up with "." in both the directory and filename.
+    // If we do, clear the directory. 
+    m_filename.SetString(".");
+    if (m_filename == m_directory)
+      m_directory.Clear();
+  }
----------------
This is an interesting edge case, but I'm not sure if we actually want to be 
doing this. I think we should make an effort to formally define what do we 
expect from the normalized path. After the lldb-dev discussion on `./foo.cpp` I 
have formed a tentative definition in my head, and this does not seem 
consistent with that.

I'll try to propose one definition:

Def: Two `FileSpec`s are **equivalent** iff:
- they resolve to the same file in a virtual filesystem, where all referenced 
path components exist and are **directories**
- the current working directory is deep enough so that each `..` component 
resolves to a different directory than its predecessor.
- both of them have at least two path components or both of them have just one 
path component.

Def: The **normalized** form of a `FileSpec` is the **equivalent** `FileSpec` 
with the least number path components. In case of ties, we choose the one with 
the least number of `..` components.


Some explanations about how I arrived at this:
a) the "all paths exist and are directories" part is there so we can consider 
`foo/../bar.cpp` and `./bar.cpp` equivalent. (Because, in general, this will 
not be true if `foo` does not exists, or is a regular file, or a symlink, etc.)
b) the "cwd is deep enough" part is there to avoid collapsing excessive leading 
".." components, because `/` and `/..` are the same file
c) the "at least two path components" is there to avoid collapsing `./foo.cpp` 
into `foo.cpp` as we want to preserve the information that the file contained 
some kind of directory specification.
d) in the equivalence definition, the "least number of components" is there to 
make sure we remove all components that are removable and (hopefully) arrive at 
a single canonical representation.
e) the "tie" case is necessary to disambiguate between `foo/..` and `./.` as 
otherwise these two would be incomparable.


Now the (c) part would imply that `foo/..` should normalize to `./.` and not 
`.`, which is a bit odd, but it is consistent with our stated intention of 
preserving directory information. If we do not want to have `./.` here, then we 
need to come up with a different definition of what it means to be "normalized".


================
Comment at: source/Utility/FileSpec.cpp:857
+        !IsPathSeparator(components[i].back(), syntax) &&
+        (result.empty() || !IsPathSeparator(result.back(), syntax)))
       result += GetPreferredPathSeparator(syntax);
----------------
Why is this necessary? `result` can't be empty because we have just appended 
something to it (and we've checked that `components[i]` is non-empty).


================
Comment at: unittests/Utility/FileSpecTest.cpp:150-153
-  Compare(forward, backward, full_match, remove_backup_dots, match);
-  Compare(forward, backward, full_match, !remove_backup_dots, match);
-  Compare(forward, backward, !full_match, remove_backup_dots, match);
-  Compare(forward, backward, !full_match, !remove_backup_dots, match);
----------------
You've removed the `remove_backup_dots` argument from the `FileSpec::Equal`, 
which removes two of the combinations we need to test here. However, I think 
the other other two combination still deserve some testing. As it stands now, 
this completely removes the test coverage of the `Equal` function.


https://reviews.llvm.org/D45977



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to