paulirwin commented on PR #1060: URL: https://github.com/apache/lucenenet/pull/1060#issuecomment-3263420002
> I started a review, but got into the weeds a bit because I realized that the current port doesn't match the behavior of Java (at least according to ChatGPT - unconfirmed). The new string overload behavior should match Java, which is to pass the relative path down to the OS where it is resolved relative to the current application (the place where the jar lives). I have analyzed this and determined that it largely behaves as you specify here: relative paths just work when passed down to the filesystem. Almost all .NET BCL methods that we use have predictable, correct behavior when relative paths are used. There is one exception, noted below. | Function | Works with relative path? | | --- | --- | | Directory.Exists | Yes | | FileStream constructor | Yes | | Directory.GetParent | Yes | | Directory.EnumerateFiles | Yes | | File.Exists | Yes | | FileInfo constructor | Yes | | DirectoryInfo constructor | Yes | | Path.GetDirectoryName | Somewhat* | | Path.GetFileName | Yes | | Directory.Delete | Yes | | File.Delete | Yes | | File.WriteAllText | Yes | | Directory.CreateDirectory | Yes | `Path.GetDirectoryName` (used here in a few places like Kuromoji, Benchmarks, and the CLI) does not itself error on a relative path, and works arguably correctly, but it returns a value that might not work correctly depending on how it was used previously. The function returns the directory component of a string containing a path, and does not resolve the directory of the path first. This means that for input values like `"."` and `".."`, it returns empty string (because those could be interpreted as the "file" component of a relative path, even though they have a special meaning). Then, for inputs like `"./foo.bar"`, it returns `"."`, because that is the directory component of the relative path. Methods like `WriteEnwikiLineDocTask.CategoriesLineFile` are fine with this behavior, because it's expected that you're passing in a full file name (and not a directory name). The only time when this is a problem is if you pass in those special relative directory names without a file name , which could happen in CompoundFileExtractor. So I've fixed that by reverting that change and leaving a comment. Note that only the Path class methods can have a problem, because the others actually hit the filesystem where the paths are resolved. Path just operates on strings without caring whether the path exists on disk. `Path.GetFileName` is not a problem because even if it's a completely slash-less path like `"foo.bar"` or any form of relative path to a file, the file name will extract correctly. Note also that `~` for a home directory shorthand does not work correctly in most BCL types, including FileInfo and DirectoryInfo anyways, so I do not think we need to worry about that. > On the other hand, with `FileInfo` and `DirectoryInfo`, relative paths are resolved eagerly and not passed down to the OS. So, when we call `FullName` we don't necessarily have the same behavior. As noted above, this is fine in the cases in this PR. > Unfortunately, `FileInfo` and `DirectoryInfo` have no way to retrieve the original string that was passed into the constructor. So, there doesn't seem to be a reasonable fix,. Well, there is a protected `OriginalString` property that we could get to using the [`UnsafeAccessorAttribute`](https://learn.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.unsafeaccessorattribute) on `net8.0`+, but we are probably better off accepting the differences of calling `FullName` in most cases. Yeah, I don't think we need to worry about this, since relative paths work for I/O. > Also, I suspect but haven't yet confirmed that the relative path behavior of our command line tools is broken. AFAIK, the expected behavior in a .NET console app is for a relative path to be resolved to the **environment's** current directory, but the behavior we have is relative to the directory of the executable. This behavior might be altered by wrapping these commands into a dotnet tool or by our (3rd party) wrapper project, though. I'm not sure we need to worry about this unless someone reports it. Seems like worst case you'd get a file/directory not found error, fix it, and move on. > One other thing I noticed is that the `FileInfo.GetCanonicalPath()` method was not included here, probably because you didn't think to look for the common base class of `FileInfo` and `DirectoryInfo`: [`FileSystemInfo`](https://learn.microsoft.com/en-us/dotnet/api/system.io.filesysteminfo). This is the only place outside of the test framework and one test where we use it, though. I did see this, but I don't think it's in scope (at least not for having a string-based overload). The method docs say that it returns the path "with all references resolved" and the easiest way to do that is to allocate a FileInfo/DirectoryInfo and calling `FullName` like it does anyways. I'm not 100% convinced that this method truly needs to exist, but I am not going to worry about that at this time. (For example, using the lowercase path `"c:\\windows"` works fine in .NET operations with the lowercase drive letter.) Now, there might be some possible analysis that could be done on whether or not code that _calls_ this method could avoid canonicalization and thus avoid FileInfo/DirectoryInfo... but that would be beyond the scope of just reducing allocations, as it possibly would result in non-canonical paths being used (assuming that is even important). I am not aware of a way to resolve a path without hitting the disk via I/O, and if you're doing that, the allocation of a FileInfo/DirectoryInfo is pretty negligible. The goal of this PR is only to help reduce allocations for cases where resolving the path is not necessary. Also, it should be noted, _the original overloads are still there_. So if someone has a particular reason for using FileInfo/DirectoryInfo, they still can. This is not a breaking-change PR. So any new code that tries to use the string versions will need to confirm that it works for them with relative paths (it probably will), or just use an absolute path. That said, I can add more unit tests to confirm this beyond my analysis and testing of the BCL. So I'll do that for what I can. Converting back to draft until I get that done. Feel free to review, though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
