[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 Kevin Funk changed: What|Removed |Added Version Fixed In||5.1.2 -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 Sven Brauch changed: What|Removed |Added Latest Commit|54c2740ae97f4e3ee7675ff06ec |https://commits.kde.org/kde |5a6dde43c6af4 |velop/3fde9551d5870229d42cc ||1a61f152fc8d0876b71 Status|CONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #33 from Sven Brauch --- Git commit 3fde9551d5870229d42cc1a61f152fc8d0876b71 by Sven Brauch, on behalf of Axel Kellermann. Committed on 27/07/2017 at 22:28. Pushed by brauch into branch '5.1'. cmListFileLexer: use memcpy instead of strcpy to copy buffer contents to tokens This fixes a crash when the input is garbage (e.g. binary file which is not cmake code at all). Differential Revision: https://phabricator.kde.org/D6924 M +7-3projectmanagers/cmake/parser/cmListFileLexer.c https://commits.kde.org/kdevelop/3fde9551d5870229d42cc1a61f152fc8d0876b71 -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #32 from Axel Kellermann --- (In reply to Sven Brauch from comment #31) > Ah yes, if you trick the mimetype detection, you will still crash. You are > also right about strncpy. > > phabricator should be simple enough to use, just log in (with > identity.kde.org), click differential, then click new diff and paste your > diff. OK, I added the diff to phabricator: https://phabricator.kde.org/D6924 -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #31 from Sven Brauch --- Ah yes, if you trick the mimetype detection, you will still crash. You are also right about strncpy. phabricator should be simple enough to use, just log in (with identity.kde.org), click differential, then click new diff and paste your diff. -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #30 from Axel Kellermann --- (In reply to Sven Brauch from comment #28) > Patch still looks sensible to me -- maybe use strncpy instead, and put it on > phabricator.kde.org so somebody familiar with the code can have a look? strncpy()/strndup() won't work either. Both copy up to n bytes, but also end processing on the occurence of the first '\0' (at least to my understanding). I'll have a look at phabricator. Never used it before... -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #29 from Axel Kellermann --- (In reply to Sven Brauch from comment #27) > Does that still happen with KDevelop 5.1+? I thought I fixed it. See comment #22. Random .txt files aren't scanned anymore, but .cmake files can still trigger the faulty behaviour. I conducted my tests with cmListFileLexer.c from master (last updated beginning of the week). But I extracted it into a small reproducer application to ease debugging, so if you fixed it somewhere in the cmake parser outside of cmListFileLexer.c, maybe we don't have to do anything. -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #28 from Sven Brauch --- Patch still looks sensible to me -- maybe use strncpy instead, and put it on phabricator.kde.org so somebody familiar with the code can have a look? -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #27 from Sven Brauch --- Does that still happen with KDevelop 5.1+? I thought I fixed it. -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #26 from Axel Kellermann --- I had another look at the lexer and I think I pinned down the problem with UTF-16 files in cmListFileLexer.c. Scanning the files works fine, but copying the scanned content into the token structure in cmListFileLexerSetToken() is where things go wrong. The code that copies the content uses the functions strcpy() and strdup() that are meant to be used only with zero-terminated character arrays. As we handle two-byte characters, where the most significant byte can be zero (see e.g. letter 'h' with 0x0068), it's possible that the text to be copied to token->text contains zero-bytes mid-string. In that case strdup() doesn't do what it's intended to do. It interprets the buffer as zero-terminated string and only duplicates it up to the first occurence of '\0'. At the same time the original buffer size is stored in token->length. This leads to out-of-bounds memory accesses later on. I attached a simple UTF-16 file that reliably triggers the problem for me (363269_repro.txt) and a proposal for a fix that replaces the string functions with mallocs/memcpys (363269_proposal.patch). Maybe someone with more experience with the cmake parser/lexer could have a look at it. Related questions: The patch fixes the crashes/ASAN aborts for me, but is the cmake parser really handling UTF-16 files correctly? Functions like cmListFileLexer_BOM() imply that it can handle all kinds of UTF formats, but at the same time the whole code seems to imply that it only works on zero terminated char arrays. Do we possibly need to update to a newer version of the lexer (which is generated from external sources, right?)? -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #25 from Axel Kellermann --- Created attachment 106866 --> https://bugs.kde.org/attachment.cgi?id=106866&action=edit Proposal for patch -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #24 from Axel Kellermann --- Created attachment 106865 --> https://bugs.kde.org/attachment.cgi?id=106865&action=edit Minimal UTF-16 file to trigger behavior -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 Kevin Funk changed: What|Removed |Added CC||asiasuppenes...@gmx.de --- Comment #23 from Kevin Funk --- *** Bug 372575 has been marked as a duplicate of this bug. *** -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken, CMakeListsParser::readCMakeFile]
https://bugs.kde.org/show_bug.cgi?id=363269 Kevin Funk changed: What|Removed |Added Summary|Crash when projects |Crash when projects |contains *.txt file that is |contains *.txt file that is |actually a binary file |actually a binary file |[cmListFileLexerSetToken] |[cmListFileLexerSetToken, ||CMakeListsParser::readCMake ||File] -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #22 from Sven Brauch --- Note that while above commit should fix the crash in this case, this bug is still open, since it would be back if you would rename the file to "CMakeLists.txt" ... I think. -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #21 from Sven Brauch --- Git commit 77b83054f943b6c9bce0da178732f7992f7ada3b by Sven Brauch. Committed on 13/10/2016 at 18:52. Pushed by brauch into branch '5.0'. Remove mime type <-> extension cache The idea that because one file with extension X has mime type A, determined by its contents, hence other files with extension X will have the same mime type is just wrong. One common example where this breaks in a spectacular way is CMakeLists.txt and the .txt extension. I found the claim that looking into each file will make the application unresponsive to be unfounded. QMimeType will only read the first 16kB to guess the mime type, which takes less than a millisecond for each file. A test project with three hundred 3 MB binary blobs still loads instantly. If, in comparison, we parse one of the files as CMake erraneously, we take multiple seconds. Differential Revision: https://phabricator.kde.org/D3042 M +1-35 shell/languagecontroller.cpp http://commits.kde.org/kdevplatform/77b83054f943b6c9bce0da178732f7992f7ada3b -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]
https://bugs.kde.org/show_bug.cgi?id=363269 Sven Brauch changed: What|Removed |Added CC||m...@svenbrauch.de --- Comment #20 from Sven Brauch --- Thanks for looking. You are indeed right -- .txt files are treated as cmake in some cases. I also poked around a bit, when it does not find a language for a file it looks at the contents, guesses the mime type, and caches that. So when you open CMakeLists.txt, after that all .txt files will be treated as CMake. That is a bug, and needs to be fixed. The relevant code is, I think, in languagecontroller.cpp:310ff. -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #19 from Axel Kellermann --- I had some time to kill today, so I took another look at why random .txt files get processed by the background parser. If I understand the code flow correctly, it happens like this: - on startup the project folder is recursively scanned for all contained files - every found file is wrapped in a ProjectFileItem (abstractfilemanagerplugin.cpp:234) and added to the projects file set - the BackgroundParser then creates a ParseJob for every ProjectFileItem (in BackgroundParser::parseDocumentsInternal()) - in BackgroundParser::createParseJob(), the language to use for the current ProjectFileItem is queried by a call to m_languageController->languagesForUrl(qUrl) - LanguageController::languagesForUrl() in turn checks an internal mimeTypeCache and fileExtensionCache for the language to use for the given file - in my case, the fileExtensionCache contains an entry that resolves the extension txt to language CMake - that means in backgroundparser.cpp:357 I get back a CMakeManager, that is then used to create a cmake parse job Is that expected behaviour or some side effect that has to be fixed? I'm asking this because there are multiple cmake specific spots in the code that explicitly filter for "CMakeLists.txt" and "*.cmake", and that are circumvented by this implicit behaviour. -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]
https://bugs.kde.org/show_bug.cgi?id=363269 Night Nord changed: What|Removed |Added CC||nightn...@gmail.com --- Comment #18 from Night Nord --- The crash still happens on todays (09.09.2016) 5.0 branch with UTF-16 txt files. ASAN trace is the same as the one already attached. Without asan it will crash 100% times either during parsing or when saving the cache, so it pretty much blocks any work with such projects. Strangely enough, I don't see why it's trying to parse random *.txt files (in my case it was 'utf-16.txt' file from reviewboard testdata) - a quick glance at the code says it should only try to parse 'CMakeFiles.txt' or '*.cmake'. -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]
https://bugs.kde.org/show_bug.cgi?id=363269 --- Comment #17 from Kevin Funk --- Some comments from the dup: """ I found the problem, and it's actually similar, but not the same as bug 363269. The code base being parsed contains a subfolder with example text in a couple of different asian languages, all stored as .txt files in UTF16LE. The cmake parser seems to not just read CMakeLists.txt and *.cmake, but also generally all files with the extension .txt. Apparently the encoding of the example files is handled correctly, but then the wide characters are given to the lexer that expects chars, hence the one byte buffer overflow reported (just my guess, didn't have a look at all the code involved). If I delete all the UTF16 files, the whole code base is processed successfully. """ -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]
https://bugs.kde.org/show_bug.cgi?id=363269 Kevin Funk changed: What|Removed |Added CC||axel.kellerm...@gmx.de --- Comment #16 from Kevin Funk --- *** Bug 367841 has been marked as a duplicate of this bug. *** -- You are receiving this mail because: You are watching all bug changes.
[kdevelop] [Bug 363269] Crash when projects contains *.txt file that is actually a binary file [cmListFileLexerSetToken]
https://bugs.kde.org/show_bug.cgi?id=363269 Kevin Funk changed: What|Removed |Added Summary|Crash when projects |Crash when projects |contains *.txt file that is |contains *.txt file that is |actually a binary file |actually a binary file ||[cmListFileLexerSetToken] -- You are receiving this mail because: You are watching all bug changes.