I committed this as r187359. Testing this a bit more shows that we still don't implement this exactly like native tools. The behavior is *really* strange. The meaning of "C:foo.c" seems to be
* C:\foo.c if the current drive *is not* C * .\foo.c if the current drive *is* C I guess we can implement this in parent_path by having it return "." for "C:foo.c" if the current drive is C and "C:\" if it is not. On 26 July 2013 12:59, Rafael Espíndola <[email protected]> wrote: > On 23 July 2013 17:56, Yunzhong Gao <[email protected]> wrote: >> Hi Aaron and Rafael, >> >> Many thanks for reviewing my patch. I did a little more investigation on >> how >> the file I/O works on LLVM/Clang. >> >> When I tested on Windows 7, ::open("C:test.c", O_RDONLY) and >> ::stat("C:test.c", &stat_buffer) return success. On the other hand, >> ::stat("C:") returns failure. With a trailing back slash, ::stat("C:\\") >> returns success. >> >> Rafael asked whether this is a clang-specifc problem or a llvm problem. >> It looks like clang uses FileManager::getFile() to open a file, which will >> first check for existence of the enclosing directory before trying to open >> the file. This first check is where the trailing back slash makes the >> difference. It is done by calling FileSystemStatCache::get() on the >> directory >> name, which then calls library function ::stat(). Eventually it uses >> MemoryBuffer::getFile() to open the file. >> >> I checked the other llvm tools and they do not checking for the enclosing >> directory as clang does: >> llvm-as uses llvm::ParseAssemblyFile() and then MemoryBuffer::getFile() >> to open a file. It does not check for existence of the enclosing >> directory, so the difference between stat("C:\\") and stat("C:") does >> not affect llvm-as here. >> bugpoint, llc, lli, llvm-diff, llvm-link, llvm-jitlistener and opt use >> llvm::ParseIRFile() and then MemoryBuffer::getFile() to open the input >> file. Again they do not check for existence of the enclosing directory. >> llvm-bcanalyzer, llvm-cov, llvm-mc, llvm-mcmarkup, llvm-nm, >> llvm-dwarfdump, >> llvm-prof, llvm-rtdyld, obj2yaml, yaml2obj, utils/FileCheck, >> utils/FileUpdate, llvm-tblgen, clang-tblgen also use >> MemoryBuffer->getFileOrSTDIN() and then MemoryBuffer->getFile() to open >> the input file without checking for existence of the enclosing >> directory. >> llvm-objdump, llvm-readobj, llvm-size and macho-dump use >> llvm::object::createBinary() and then MemoryBuffer::getFile() to open >> the >> input file without checking for existence of the enclosing file. >> llvm-ar uses Memory::getFile() to open the archive file. It does >> not check for existence of the enclosing file. >> llvm-config needs to compare current directory to development tree for >> equality, and when doing so, always gets absolute paths first. >> llvm-dis uses DataStreamer::OpenFile() and then >> sys::fs::openFileForRead(), >> which calls open() on Linux and CreateFileW() on Windows. >> llvm-extract uses llvm::getLazyIRFileModule() and then >> MemoryBuffer::getFile() to open the input file. >> llvm-stress takes no input file; it uses raw_fd_ostream::raw_fd_ostream() >> to open a file for write, which calls sys::fs::openFileForWrite(). >> llvm-symbolizer uses LLVMSymbolizer::getOrCreateModuleInfo(), then >> getOrCreateBinary() => object::createBinary() => >> MemoryBuffer::getFile() >> to open the input file. >> utils/count reads from stdin. >> Because none of these llvm tools check for existence of the enclosing >> directory before opening their input files, they do not have the same >> problem >> as clang. So to answer Rafael's question, this is indeed specific to clang. >> >> My proposed patch is in FileManager::getDirectory() because I see some >> codes >> in the beginning of this function already trying to patch the directory >> name. >> Alternatively we could also try to fix up the directory name in llvm::sys:: >> path::parent_path() or FileManager::getDirectoryFromFile(). Doing it in >> getDirectoryFromFile() could take the same approach as I did in >> getDirectory(). >> Doing it in parent_path() might be a bit more tricky because to grow the >> string, some memory allocator is needed: I would like some advice on what >> is >> the best way to handle it. >> >> To answer Aaron's question, I used path::root_name() instead of >> path::root_path() so that the expression "DirName == root_name(DirName)" >> returns different results for "C:" and "C:\", where the fix is really only >> needed for the "C:" case. >> >> "C:test.c" is not directly passed to path::root_name(). Before entering >> FileManager::getDirectory(), where my patch is intended, >> path::parent_path() >> will be run on the file path to extract the directory name first before >> passing the directory name to path::root_name(), which returns "C:" for >> both C:\ and C:. Does this address your question, Aaron? >> >> I wrote a unit test thanks to Aaron's suggestion. I am adding it to >> clang/unittests/Basic/FileManagerTest.cpp. Could you take a look? > > Thanks for the detailed explanation. I think this fix (with the test) > is good for now. Aaron, any objections? > > Ideas for future improvement: > * We should probably change FileManager::getDirectory to not play > with the directory name and move the responsibility to parent_path. > The only valid driver names are the letters from a to z? If so we > would be able to keep the interface and just keep some global string > constants: > > static const char * DriveRoots[] = {"a:\", "b:\"....} > > and return one of them when given something like "c:foo". > > * getUniqueID should probably return a pair of 64 bit numbers. On unix > it should include the st_dev. On windows we could drop the xor trick. > > * FileManager::UniqueDirContainer::getDirectory could then stop using > the filename on windows and have a common unix and windows > implementation. > >> Many thanks, >> - Gao. > > Cheers, > Rafael _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
