Re: suggestion for checking unicode characters against "trojan source attacks"
On Tue, Nov 9, 2021 at 21:01, Jérémy Lal wrote:
> On Tue, Nov 9, 2021 at 20:55, Felix Lechner wrote:
>> Hi Jérémy,
>>
>> On Tue, Nov 9, 2021 at 11:48 AM Jérémy Lal wrote:
>> > Ok, but the potential targets are source code files, like *.c, *.cpp,
>> > *.js, *.py, *.rb, etc.
>>
>> It was only a stopgap measure. We held a release due to the large
>> number of false positives.

Actually, only source code files need to be tested. Others, like

- .po, .pod
- .xml, .html, .xhtml, .svg, .md, .txt
- copyright, documentation, plain text

can be ignored. I suppose *.ini, *.desktop and *.toml could also be ignored, but I'm not sure. Maybe, for a start, only high-level scripts (py, js, rb) should be tested.

>> Please just let me know what you would like to see, and I will change
>> it again. Have you heard from the security team?
>
> No, but as far as I can understand, this CVE is difficult to evaluate.
> It's a potential threat against source code... that's about it...
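Jérémy's list of file types that can be ignored translates naturally into a skip rule. The following Python sketch is only an illustration, not Lintian's implementation: `should_skip`, `IGNORED_EXTENSIONS` and `IGNORED_NAMES` are hypothetical names, and *.ini, *.desktop and *.toml are deliberately left out because he was unsure about them.

```python
import os

# Hypothetical skip rule built from the file types listed above.
IGNORED_EXTENSIONS = {".po", ".pod", ".xml", ".html", ".xhtml", ".svg", ".md", ".txt"}
# Debian's copyright file has no extension, so match it by name.
IGNORED_NAMES = {"copyright"}


def should_skip(path):
    """Return True for translation, markup, documentation and copyright files."""
    name = os.path.basename(path)
    if name in IGNORED_NAMES:
        return True
    _root, ext = os.path.splitext(name)
    # Compare case-insensitively so that index.HTML is skipped too.
    return ext.lower() in IGNORED_EXTENSIONS
```

A skip list like this keeps scanning unknown file types by default, so it errs on the side of more findings rather than fewer.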
Re: suggestion for checking unicode characters against "trojan source attacks"
On Tue, Nov 9, 2021 at 20:55, Felix Lechner wrote:
> Hi Jérémy,
>
> On Tue, Nov 9, 2021 at 11:48 AM Jérémy Lal wrote:
> > Ok, but the potential targets are source code files, like *.c, *.cpp,
> > *.js, *.py, *.rb, etc.
>
> It was only a stopgap measure. We held a release due to the large
> number of false positives.
>
> Please just let me know what you would like to see, and I will change
> it again. Have you heard from the security team?

No, but as far as I can understand, this CVE is difficult to evaluate.
It's a potential threat against source code... that's about it...
Re: suggestion for checking unicode characters against "trojan source attacks"
Hi Jérémy,

On Tue, Nov 9, 2021 at 11:48 AM Jérémy Lal wrote:
> Ok, but the potential targets are source code files, like *.c, *.cpp,
> *.js, *.py, *.rb, etc.

It was only a stopgap measure. We held a release due to the large
number of false positives.

Please just let me know what you would like to see, and I will change
it again. Have you heard from the security team?

Kind regards
Felix Lechner
Re: suggestion for checking unicode characters against "trojan source attacks"
On Tue, Nov 9, 2021 at 20:31, Felix Lechner wrote:
> Hi Jérémy,
>
> On Tue, Nov 9, 2021 at 11:07 AM Jérémy Lal wrote:
> > the test needs to account for that fact.
>
> Yeah, I adjusted it to check scripts only. [1] That's any text file
> with a hashbang (#!) at the beginning. They do not currently have to
> be executable.

OK, but the potential targets are source code files, like *.c, *.cpp, *.js, *.py, *.rb, etc.

> You should see updated results on our website over the next two or three
> days.
>
> > "\u202D39497\u202C"
>
> That is a LEFT-TO-RIGHT OVERRIDE. It makes sure the string "39497" is
> always printed from left to right, even in Hebrew or Arabic, which are
> written in the opposite direction.

Ack.

> Thanks for having some fun with us!
>
> Kind regards
> Felix Lechner
>
> P.S. In Lintian, tests are in the test suite. The scanning parts we
> call "checks".
>
> [1] https://salsa.debian.org/lintian/lintian/-/commit/f0c91bdaf1ffbd6fd4ce1ba1fa78e2a7b2469cc2
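Jérémy's alternative is the opposite approach: scan only files whose extension marks them as source code. A minimal Python sketch, assuming an allow-list made of exactly the extensions he names (a real check would need many more, such as C headers); `is_source_file` and `SOURCE_EXTENSIONS` are hypothetical names.

```python
import os

# Hypothetical allow-list: only the extensions named in the message above.
SOURCE_EXTENSIONS = {".c", ".cpp", ".js", ".py", ".rb"}


def is_source_file(path):
    """Return True if `path` ends in one of the source-code extensions."""
    _root, ext = os.path.splitext(path)
    # Lower-case the extension so that Foo.PY is treated like foo.py.
    return ext.lower() in SOURCE_EXTENSIONS
```

Compared with a skip list, an allow-list produces fewer false positives but silently ignores any language that is not listed.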
Re: suggestion for checking unicode characters against "trojan source attacks"
Hi Jérémy,

On Tue, Nov 9, 2021 at 11:07 AM Jérémy Lal wrote:
> the test needs to account for that fact.

Yeah, I adjusted it to check scripts only. [1] That's any text file
with a hashbang (#!) at the beginning. They do not currently have to
be executable.

You should see updated results on our website over the next two or three
days.

> "\u202D39497\u202C"

That is a LEFT-TO-RIGHT OVERRIDE. It makes sure the string "39497" is
always printed from left to right, even in Hebrew or Arabic, which are
written in the opposite direction.

Thanks for having some fun with us!

Kind regards
Felix Lechner

P.S. In Lintian, tests are in the test suite. The scanning parts we
call "checks".

[1] https://salsa.debian.org/lintian/lintian/-/commit/f0c91bdaf1ffbd6fd4ce1ba1fa78e2a7b2469cc2
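Felix's rule for what counts as a script, any text file that starts with a hashbang regardless of its executable bit, can be sketched in Python as below. This is an illustration, not the Perl code from the linked commit; `has_hashbang` is a hypothetical name, and the sketch does not separately verify that the file is text.

```python
def has_hashbang(path):
    """Return True if the file starts with the two bytes '#!'.

    The executable bit is deliberately not checked, matching the rule
    described above."""
    with open(path, "rb") as handle:
        # An empty or one-byte file simply compares unequal.
        return handle.read(2) == b"#!"
```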
Re: suggestion for checking unicode characters against "trojan source attacks"
On Fri, Nov 5, 2021 at 15:00, Felix Lechner wrote:
> Dear Jérémy,
>
> > grep -r $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'
>
> Here are the results from the archive. [1] It's about half-way done.
>
> Lintian shows which character was encountered, but there are lots of
> false positives (all in file contents). So far there are no hits on file
> names.
>
> Please help to identify classes of false positives. Otherwise, I have
> to turn the tag into a classification (or disable it), which means we
> won't see the results on the website. Thanks!

Awesome! This is really cool. I've started fishing for exploits.

Most files indeed just declare Unicode characters among others, so I suppose the test needs to account for that fact.

As an example of an odd case, I don't understand why in
https://salsa.debian.org/multimedia-team/intel-media-driver/-/blob/master/media_driver/agnostic/common/os/mos_utilities.cpp#4351
we have these two characters, U+202D and U+202C:

    MOS_DECLARE_UF_KEY_DBGONLY(__MEDIA_USER_FEATURE_VALUE_MOCKADAPTOR_DEVICE_ID,
        "MockAdaptor Device ID",
        __MEDIA_USER_FEATURE_SUBKEY_INTERNAL,
        __MEDIA_USER_FEATURE_SUBKEY_REPORT,
        "MOS",
        MOS_USER_FEATURE_TYPE_USER,
        MOS_USER_FEATURE_VALUE_TYPE_INT32,
        "\u202D39497\u202C",
        "Device ID of mock device, default is 0x9A49"),

Any suggestion is welcome.

> Kind regards
> Felix Lechner
>
> [1] https://lintian.debian.org/tags/unicode-trojan
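One way to account for files that merely declare such characters is to flag only lines where an embedding, override or isolate is still open at the end of the line, since a pair that is closed on the same line, like the U+202D ... U+202C around "39497", does not affect the text after it. The Python sketch below is a simplified reading of the pairing rules in Unicode UAX #9 and is similar in spirit to GCC 12's `-Wbidi-chars=unpaired`; `unpaired_bidi` is a hypothetical name, and the marks U+061C, U+200E and U+200F are ignored because they are not paired.

```python
EMBEDDINGS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATES = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING closes an embedding/override
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE closes an isolate


def unpaired_bidi(line):
    """Return True if `line` leaves an embedding, override or isolate open
    at its end (a simplified version of the UAX #9 pairing rules)."""
    stack = []
    for ch in line:
        if ch in EMBEDDINGS or ch in ISOLATES:
            stack.append(ch)
        elif ch == PDF:
            # PDF only closes an embedding/override opened after the last
            # isolate; a stray PDF is ignored.
            if stack and stack[-1] in EMBEDDINGS:
                stack.pop()
        elif ch == PDI:
            # PDI closes the innermost open isolate and everything opened
            # inside it; a stray PDI is ignored.
            if any(c in ISOLATES for c in stack):
                while stack.pop() not in ISOLATES:
                    pass
    return bool(stack)
```

This heuristic would not flag the intel-media-driver line, whose override is closed, while still catching the kind of unterminated controls used in the Trojan Source examples. It does not catch a closed pair whose content is itself misleading.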
Re: suggestion for checking unicode characters against "trojan source attacks"
Dear Jérémy,

> > grep -r $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'

Here are the results from the archive. [1] It's about half-way done.

Lintian shows which character was encountered, but there are lots of
false positives (all in file contents). So far there are no hits on file
names.

Please help to identify classes of false positives. Otherwise, I have
to turn the tag into a classification (or disable it), which means we
won't see the results on the website. Thanks!

Kind regards
Felix Lechner

[1] https://lintian.debian.org/tags/unicode-trojan
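Reporting which character was found, and where, makes triaging false positives easier. A small Python sketch of such a report, assuming the file has already been decoded to text; `describe_hits` is a hypothetical helper and its output format is not Lintian's.

```python
import unicodedata

# The twelve code points from the grep character class.
BIDI_CONTROLS = frozenset(
    "\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069"
)


def describe_hits(text):
    """Return (line, column, 'U+XXXX NAME') for each bidi control in `text`.

    Lines and columns are 1-based; columns count code points, not bytes.
    Splitting on '\\n' only keeps line numbers in step with grep."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, f"U+{ord(ch):04X} {unicodedata.name(ch)}"))
    return hits
```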
Re: suggestion for checking unicode characters against "trojan source attacks"
Dear Jérémy,

> grep -r $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'

I implemented this in Perl both for file names, which I think may be
more important, and for the contents of all files that identify as text
via file(1). The tag is called 'unicode-trojan'. [1] You should see the
results on the website in about a day.

For good measure, we check patched source files as well as files
shipped in installation packages. I also added test cases to our test
suite.

Thanks for the suggestion!

Kind regards
Felix Lechner

[1] https://salsa.debian.org/lintian/lintian/-/commit/d96d2930f17669f0a9509d1a1d319525d8064072
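Checking file names is cheap compared with reading contents. A minimal Python sketch of the idea, not the Perl code in the linked commit (`suspicious_names` is an illustrative name), walks a directory tree and reports every file or directory name that contains one of the twelve characters, together with the Unicode names of the characters found.

```python
import os
import unicodedata

# The twelve code points from the grep character class.
BIDI_CONTROLS = frozenset(
    "\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069"
)


def suspicious_names(root):
    """Yield (path, [character names]) for each file or directory below
    `root` whose own name contains a bidi control character."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            found = [unicodedata.name(ch) for ch in name if ch in BIDI_CONTROLS]
            if found:
                yield os.path.join(dirpath, name), found
```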
Re: suggestion for checking unicode characters against "trojan source attacks"
Hi Jérémy,

On Tue, Nov 2, 2021 at 2:47 AM Jérémy Lal wrote:
> using grep looks somewhat simpler.

I have not implemented it yet because of concerns about performance,
but am not opposed to trying.

You may wish to open a bug against Lintian (and perhaps copy all the
messages and the valuable hyperlinks from here to there).

Kind regards
Felix Lechner
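One way to keep the cost down, sketched here in Python under the assumption that the scanned files are UTF-8 encoded, is to search the raw bytes for the UTF-8 encodings of the twelve characters instead of decoding every file. `has_bidi_bytes` and `BIDI_UTF8` are hypothetical names, not part of Lintian.

```python
import re

# UTF-8 encodings of the twelve bidi controls:
#   U+061C          -> D8 9C
#   U+200E, U+200F  -> E2 80 8E, E2 80 8F
#   U+202A..U+202E  -> E2 80 AA .. E2 80 AE
#   U+2066..U+2069  -> E2 81 A6 .. E2 81 A9
BIDI_UTF8 = re.compile(rb"\xd8\x9c|\xe2\x80[\x8e\x8f\xaa-\xae]|\xe2\x81[\xa6-\xa9]")


def has_bidi_bytes(path, chunk_size=1 << 20):
    """Return True if the file contains a UTF-8 encoded bidi control."""
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            # Prepend up to two bytes from the previous read so that a
            # three-byte sequence split across reads is still found.
            data = tail + chunk
            if BIDI_UTF8.search(data):
                return True
            tail = data[-2:]
```

Files in UTF-16 or other encodings would slip through this byte pattern, so it trades some coverage for speed.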
Re: suggestion for checking unicode characters against "trojan source attacks"
On Mon, Nov 1, 2021 at 22:51, Jérémy Lal wrote:
> On Mon, Nov 1, 2021 at 22:29, Felix Lechner wrote:
>> Hi,
>>
>> On Mon, Nov 1, 2021 at 2:21 PM Jérémy Lal wrote:
>> > grep -r $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'
>>
>> Does that cover both conditions?
>
> It seems from the paper at
> https://trojansource.codes/trojan-source.pdf
> and the list given also at
> https://www.unicode.org/reports/tr9/tr9-42.html
> that those nine characters are the ones that should be checked.
>
>> There is a risk that it will be slow, by the way, but I generally favor
>> doing things right, so no problem here.
>
> Maybe the Debian security team already has something in mind, or has a
> better understanding of these issues (CVE-2021-42574 and CVE-2021-42694).

Update: the Python script I linked at the start of the conversation is
now available at https://github.com/siddhesh/find-unicode-control

I'm not sure it's worth packaging; using grep looks somewhat simpler.

Jérémy
Re: suggestion for checking unicode characters against "trojan source attacks"
Hi,

On Mon, Nov 1, 2021 at 2:21 PM Jérémy Lal wrote:
> grep -r $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'

Does that cover both conditions?

There is a risk that it will be slow, by the way, but I generally favor
doing things right, so no problem here.

Kind regards
Felix
Re: suggestion for checking unicode characters against "trojan source attacks"
On Mon, Nov 1, 2021 at 21:38, Felix Lechner wrote:
> Dear Jérémy,
>
> On Mon, Nov 1, 2021 at 1:14 PM Jérémy Lal wrote:
> > it seems this python tool does the job:
>
> Looks great. If you can package it, your check may be only a few lines
> long, or less. I can help with processing the output in Perl.

I think we can skip the Python script, since Lintian wraps the same kind of behavior.

BTW, it comes from
https://access.redhat.com/security/vulnerabilities/RHSB-2021-007#diagnostic-tools
and there we can read this very simple check:

    grep -r $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'
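For readers who want to see what that one-liner does outside the shell, here is a rough Python equivalent. It is a sketch, not the Red Hat tool or Lintian's Perl check, and `grep_bidi` is a made-up name. Note that the character class holds twelve code points: the nine embedding, override and isolate controls discussed in the Trojan Source paper (U+202A to U+202E and U+2066 to U+2069) plus the three marks U+061C, U+200E and U+200F.

```python
import os
import re

# Same character class as the grep command: U+061C ARABIC LETTER MARK,
# U+200E/U+200F LRM/RLM, U+202A..U+202E embeddings, overrides and PDF,
# U+2066..U+2069 isolates and PDI.
BIDI_CONTROLS = re.compile("[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]")


def grep_bidi(root):
    """Yield (path, line_number, line) for lines containing a bidi control,
    roughly like `grep -rn` with the character class above."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if BIDI_CONTROLS.search(line):
                            yield path, number, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                # Give up on unreadable or non-UTF-8 files; grep would
                # still scan their bytes.
                continue
```

Unlike `grep -r`, this sketch abandons a file as soon as it meets bytes that are not valid UTF-8.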
Re: suggestion for checking unicode characters against "trojan source attacks"
Dear Jérémy,

On Mon, Nov 1, 2021 at 1:14 PM Jérémy Lal wrote:
> it seems this python tool does the job:

Looks great. If you can package it, your check may be only a few lines
long, or less. I can help with processing the output in Perl.

Kind regards
Felix Lechner
Re: suggestion for checking unicode characters against "trojan source attacks"
On Mon, Nov 1, 2021 at 20:59, Felix Lechner wrote:
> Hi Jérémy,
>
> On Mon, Nov 1, 2021 at 11:22 AM Jérémy Lal wrote:
> > the topic is about CVE-2021-42574 and CVE-2021-42694.
>
> Lintian does not currently look for either condition. I do not have
> time to read up in detail on either condition, but would happily help
> you write a Lintian check.
>
> Due to the complexity, it might help to have a third-party tool. The
> first condition, about bidirectional characters, seems reasonably
> straightforward for sources, in which authors do not usually mix two
> languages with opposing directions. The second condition, about
> homoglyphs, seems more complex, unless source code instructions are
> restricted to ASCII (except for data strings, which may be shown to
> users).
>
> Either way, I am happy to help. Writing checks has never been easier!

Hi,

It seems this Python tool does the job:
https://access.redhat.com/sites/default/files/find_unicode_control2--2021-11-01-1136.zip

Jérémy
Re: suggestion for checking unicode characters against "trojan source attacks"
Hi Jérémy,

On Mon, Nov 1, 2021 at 11:22 AM Jérémy Lal wrote:
> the topic is about CVE-2021-42574 and CVE-2021-42694.

Lintian does not currently look for either condition. I do not have
time to read up in detail on either condition, but would happily help
you write a Lintian check.

Due to the complexity, it might help to have a third-party tool. The
first condition, about bidirectional characters, seems reasonably
straightforward for sources, in which authors do not usually mix two
languages with opposing directions. The second condition, about
homoglyphs, seems more complex, unless source code instructions are
restricted to ASCII (except for data strings, which may be shown to
users).

Either way, I am happy to help. Writing checks has never been easier!

Kind regards
Felix Lechner
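For the second condition, the idea of restricting code to ASCII outside data strings could be prototyped as below. This Python sketch is deliberately naive and entirely hypothetical (`non_ascii_outside_strings` is a made-up name): it strips single-line quoted literals with a regular expression and reports whatever non-ASCII characters remain, so comments, raw strings, multi-line strings and languages with other quoting rules are not handled.

```python
import re

# Naive single-line "..." and '...' literals, honoring backslash escapes.
DOUBLE_QUOTED = r'"(?:\\.|[^"\\])*"'
SINGLE_QUOTED = r"'(?:\\.|[^'\\])*'"
STRING_LITERAL = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def non_ascii_outside_strings(line):
    """Return the non-ASCII characters of `line` that are not inside a
    string literal, e.g. a Cyrillic look-alike in an identifier."""
    code = STRING_LITERAL.sub("", line)
    return [ch for ch in code if ord(ch) > 127]
```

Legitimate non-ASCII identifiers, which several languages allow, would also be reported, so a real check would need per-language exceptions.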