Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-09 Thread Jérémy Lal
On Tue, Nov 9, 2021 at 21:01, Jérémy Lal wrote:

> On Tue, Nov 9, 2021 at 20:55, Felix Lechner wrote:
>
>> Hi Jérémy,
>>
>> On Tue, Nov 9, 2021 at 11:48 AM Jérémy Lal  wrote:
>> >
>> > Ok, but the potential targets are source code files, like *.c *.cpp,
>> > *.js, *.py, *.rb etc...
>>
>> It was only a stopgap measure. We held a release due to the large
>> number of false positives.
>>
Actually, only source code files need to be tested.
Others, like
- .po, .pod
- .xml, .html, .xhtml, .svg, .md, .txt,
- copyright, documentation, plain text
can be ignored.
I suppose that *.ini, *.desktop and *.toml could also be ignored, but I'm not
sure.
Maybe, for a start, only high-level scripts (py, js, rb) should be tested.
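
To illustrate the filtering idea above, here is a minimal sketch in Python.
It is not Lintian code; the function name classify() and the three result
labels are made up for the example, and the suffix lists simply restate the
ones given in this message.

```python
from pathlib import PurePosixPath

# Suffixes named in the thread as likely targets (illustrative, not exhaustive).
SOURCE_SUFFIXES = {".c", ".cpp", ".js", ".py", ".rb"}
# Suffixes and file names suggested above as safe to ignore.
IGNORED_SUFFIXES = {".po", ".pod", ".xml", ".html", ".xhtml", ".svg", ".md", ".txt"}
IGNORED_NAMES = {"copyright"}


def classify(path):
    """Return 'scan', 'ignore' or 'undecided' for a file path (hypothetical helper)."""
    p = PurePosixPath(path)
    suffix = p.suffix.lower()
    if suffix in SOURCE_SUFFIXES:
        return "scan"
    if suffix in IGNORED_SUFFIXES or p.name in IGNORED_NAMES:
        return "ignore"
    # Covers the cases left open above, such as .ini, .desktop and .toml.
    return "undecided"
```

Keeping a separate "undecided" result leaves room for the formats whose
status is still unclear.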



>> Please just let me know what you would like to see, and I will change
>> it again. Have you heard from the security team?
>
> No, but as far as I can tell, this CVE is difficult to evaluate.
> It's a potential threat against source code... that's about it...
>


Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-09 Thread Jérémy Lal
On Tue, Nov 9, 2021 at 20:55, Felix Lechner wrote:

> Hi Jérémy,
>
> On Tue, Nov 9, 2021 at 11:48 AM Jérémy Lal  wrote:
> >
> > Ok, but the potential targets are source code files, like *.c *.cpp,
> > *.js, *.py, *.rb etc...
>
> It was only a stopgap measure. We held a release due to the large
> number of false positives.
>
> Please just let me know what you would like to see, and I will change
> it again. Have you heard from the security team?
>

No, but as far as I can tell, this CVE is difficult to evaluate.
It's a potential threat against source code... that's about it...


Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-09 Thread Felix Lechner
Hi Jérémy,

On Tue, Nov 9, 2021 at 11:48 AM Jérémy Lal  wrote:
>
> Ok, but the potential targets are source code files, like *.c *.cpp, *.js, 
> *.py, *.rb etc...

It was only a stopgap measure. We held a release due to the large
number of false positives.

Please just let me know what you would like to see, and I will change
it again. Have you heard from the security team?

Kind regards
Felix Lechner



Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-09 Thread Jérémy Lal
On Tue, Nov 9, 2021 at 20:31, Felix Lechner wrote:

> Hi Jérémy,
>
> On Tue, Nov 9, 2021 at 11:07 AM Jérémy Lal  wrote:
> >
> > the test needs to account for that fact.
>
> Yeah, I adjusted it to check scripts only. [1] That's any text file
> with a hashbang (#!) at the beginning. They do not currently have to
> be executable.
>

Ok, but the potential targets are source code files, like *.c *.cpp, *.js,
*.py, *.rb etc...

> You should see updated results on our website over the next two or three
> days.
>
> >  "\u202D‭39497\u202C‬"
>
> That is a LEFT-TO-RIGHT-OVERRIDE. It makes sure the string "39497" is
> always printed from left to right—even in Hebrew or Arabic, which are
> written in the opposite direction.


ack

> Thanks for having some fun with us!
>
> Kind regards
> Felix Lechner
>
> P.S. In Lintian, tests are in the test suite. The scanning parts we
> call "checks".
>
> [1]
> https://salsa.debian.org/lintian/lintian/-/commit/f0c91bdaf1ffbd6fd4ce1ba1fa78e2a7b2469cc2
>


Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-09 Thread Felix Lechner
Hi Jérémy,

On Tue, Nov 9, 2021 at 11:07 AM Jérémy Lal  wrote:
>
> the test needs to account for that fact.

Yeah, I adjusted it to check scripts only. [1] That's any text file
with a hashbang (#!) at the beginning. They do not currently have to
be executable.
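
A rough sketch of that "scripts only" rule, in Python rather than Lintian's
Perl. The function name looks_like_script() is hypothetical, and treating
"decodes as UTF-8" as the test for a text file is an assumption made for the
example; it is not how Lintian itself identifies text.

```python
def looks_like_script(data: bytes) -> bool:
    """True if data starts with a hashbang and decodes as UTF-8 text.

    The executable bit is deliberately not consulted, matching the
    description above.
    """
    if not data.startswith(b"#!"):
        return False
    try:
        # Assumption: valid UTF-8 is a good enough proxy for "text file".
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```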

You should see updated results on our website over the next two or three days.

>  "\u202D‭39497\u202C‬"

That is a LEFT-TO-RIGHT-OVERRIDE. It makes sure the string "39497" is
always printed from left to right—even in Hebrew or Arabic, which are
written in the opposite direction.
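
For anyone who wants to check which characters a string like this contains,
a small Python sketch using the standard unicodedata module follows. The
helper name describe_format_chars() is made up for the example; the sample
string is the one quoted above, written with escapes.

```python
import unicodedata

# The string from mos_utilities.cpp, with the controls written as escapes.
sample = "\u202D39497\u202C"


def describe_format_chars(text):
    """List (code point, Unicode name) for every format character (category Cf)."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if unicodedata.category(ch) == "Cf"
    ]


for code_point, name in describe_format_chars(sample):
    print(code_point, name)
```

The second character, U+202C, is POP DIRECTIONAL FORMATTING, which ends the
override started by U+202D.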

Thanks for having some fun with us!

Kind regards
Felix Lechner

P.S. In Lintian, tests are in the test suite. The scanning parts we
call "checks".

[1] 
https://salsa.debian.org/lintian/lintian/-/commit/f0c91bdaf1ffbd6fd4ce1ba1fa78e2a7b2469cc2



Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-09 Thread Jérémy Lal
On Fri, Nov 5, 2021 at 15:00, Felix Lechner wrote:

> Dear Jérémy,
>
> > > grep -r
> > > $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'
>
> Here are the results from the archive. [1] It's about half-way done.
>
> Lintian shows which character was encountered, but there are lots of
> false positives (all on contents). So far there are no hits on file
> names.
>
> Please help to identify classes of false positives. Otherwise, I have
> to turn the tag into a classification (or disable it) which means we
> won't see the results on the website. Thanks!
>

Awesome! This is really cool. I've started fishing for exploits.
Most files are indeed just declaring Unicode characters among others,
so I suppose the test needs to account for that fact.

As an example of an odd case, I don't understand why
https://salsa.debian.org/multimedia-team/intel-media-driver/-/blob/master/media_driver/agnostic/common/os/mos_utilities.cpp#4351
has these two characters, U+202D and U+202C:


MOS_DECLARE_UF_KEY_DBGONLY(__MEDIA_USER_FEATURE_VALUE_MOCKADAPTOR_DEVICE_ID,
"MockAdaptor Device ID",
__MEDIA_USER_FEATURE_SUBKEY_INTERNAL,
__MEDIA_USER_FEATURE_SUBKEY_REPORT,
"MOS",
MOS_USER_FEATURE_TYPE_USER,
MOS_USER_FEATURE_VALUE_TYPE_INT32,
"\u202D‭39497\u202C‬",
"Device ID of mock device, default is 0x9A49"),

Any suggestion is welcome


> Kind regards
> Felix Lechner
>
> [1] https://lintian.debian.org/tags/unicode-trojan


Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-05 Thread Felix Lechner
Dear Jérémy,

> > grep -r 
> > $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'

Here are the results from the archive. [1] It's about half-way done.

Lintian shows which character was encountered, but there are lots of
false positives (all on contents). So far there are no hits on file
names.

Please help to identify classes of false positives. Otherwise, I have
to turn the tag into a classification (or disable it) which means we
won't see the results on the website. Thanks!

Kind regards
Felix Lechner

[1] https://lintian.debian.org/tags/unicode-trojan



Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-04 Thread Felix Lechner
Dear Jérémy,

> grep -r 
> $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'

I implemented this in Perl both for file names, which I think may be
more important, and the contents of all files that identify as text
via file(1). The tag is called 'unicode-trojan'. [1] You should see
the results on the website in about a day.

For good measure, we check patched source files as well as files
shipped in installation packages.

I also added test cases to our test suite.

Thanks for the suggestion!

Kind regards
Felix Lechner

[1] 
https://salsa.debian.org/lintian/lintian/-/commit/d96d2930f17669f0a9509d1a1d319525d8064072



Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-03 Thread Felix Lechner
Hi Jérémy

On Tue, Nov 2, 2021 at 2:47 AM Jérémy Lal  wrote:
>
> using grep looks somewhat simpler.

I have not implemented it yet because of concerns about performance,
but am not opposed to trying.

You may wish to open a bug with Lintian (and perhaps cut & paste all
the messages and the valuable hyperlinks from here to there).

Kind regards
Felix Lechner



Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-02 Thread Jérémy Lal
On Mon, Nov 1, 2021 at 22:51, Jérémy Lal wrote:

>
>
> On Mon, Nov 1, 2021 at 22:29, Felix Lechner wrote:
>
>> Hi,
>>
>> On Mon, Nov 1, 2021 at 2:21 PM Jérémy Lal  wrote:
>> >
>> > grep -r
>> > $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'
>>
>> Does that cover both conditions?
>>
>
> It seems from the paper at
> https://trojansource.codes/trojan-source.pdf
> and the list given also at
> https://www.unicode.org/reports/tr9/tr9-42.html
> that those nine characters are the ones that should be checked.
>
> There is a risk that it will be slow, by the way—but I generally favor
>> doing things right, so no problem here.
>>
>
> Maybe the Debian security team already has something in mind, or has a better
> understanding of the CVE-2021-42574 and CVE-2021-42694 issues.
>

Update: the Python script I linked at the start of the conversation is now
available at
https://github.com/siddhesh/find-unicode-control
I'm not sure it's worth packaging it; using grep looks somewhat simpler.

Jérémy



Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-01 Thread Felix Lechner
Hi,

On Mon, Nov 1, 2021 at 2:21 PM Jérémy Lal  wrote:
>
> grep -r 
> $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'

Does that cover both conditions?

There is a risk that it will be slow, by the way—but I generally favor
doing things right, so no problem here.

Kind regards
Felix



Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-01 Thread Jérémy Lal
On Mon, Nov 1, 2021 at 21:38, Felix Lechner wrote:

> Dear Jérémy,
>
> On Mon, Nov 1, 2021 at 1:14 PM Jérémy Lal  wrote:
> >
> > it seems this python tool does the job:
>
> Looks great. If you can package it, your check may be only a few lines
> long, or less. I can help with processing the output in Perl.
>

I think we can skip the Python script, since Lintian wraps the same kind of
behavior.
BTW, it comes from
https://access.redhat.com/security/vulnerabilities/RHSB-2021-007#diagnostic-tools
where we can read this very simple check:
grep -r
$'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]'
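
For comparison, a minimal Python sketch of the same character class, applied
to a string instead of a directory tree. The function name
find_bidi_controls() is invented for the example; recursing over files, as
grep -r does, is left out.

```python
import re

# The twelve code points from the grep character class above.
BIDI_CONTROLS = (
    "\u061C\u200E\u200F\u202A\u202B\u202C"
    "\u202D\u202E\u2066\u2067\u2068\u2069"
)
BIDI_RE = re.compile("[" + BIDI_CONTROLS + "]")


def find_bidi_controls(text):
    """Return (line, column, 'U+XXXX') for each match; both counts start at 1."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in BIDI_RE.finditer(line):
            code_point = "U+%04X" % ord(match.group())
            hits.append((lineno, match.start() + 1, code_point))
    return hits
```

Unlike plain grep output, this reports which character was found and where,
which may help when sorting out false positives.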


Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-01 Thread Felix Lechner
Dear Jérémy,

On Mon, Nov 1, 2021 at 1:14 PM Jérémy Lal  wrote:
>
> it seems this python tool does the job:

Looks great. If you can package it, your check may be only a few lines
long, or less. I can help with processing the output in Perl.

Kind regards
Felix Lechner



Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-01 Thread Jérémy Lal
On Mon, Nov 1, 2021 at 20:59, Felix Lechner wrote:

> Hi Jérémy,
>
> On Mon, Nov 1, 2021 at 11:22 AM Jérémy Lal  wrote:
> >
> > the topic is about CVE-2021-42574 and CVE-2021-42694.
>
> Lintian does not currently look for either condition. I do not have
> time to read up in detail on either condition, but would happily help
> you write a Lintian check.
>
> Due to the complexity, it might help to have a third party tool. The
> first condition about bidirectional characters seems reasonably
> straightforward for sources, in which authors do not usually mix two
> languages with opposing directions. The second condition about
> homoglyph seems more complex, unless source code instructions are
> restricted to ASCII (except for data strings, which may be shown to
> users).
>
> Either way, I am happy to help. Writing checks has never been easier!
>

Hi,
It seems this Python tool does the job:
https://access.redhat.com/sites/default/files/find_unicode_control2--2021-11-01-1136.zip

Jérémy


Re: suggestion for checking unicode characters against "trojan source attacks"

2021-11-01 Thread Felix Lechner
Hi Jérémy,

On Mon, Nov 1, 2021 at 11:22 AM Jérémy Lal  wrote:
>
> the topic is about CVE-2021-42574 and CVE-2021-42694.

Lintian does not currently look for either condition. I do not have
time to read up in detail on either condition, but would happily help
you write a Lintian check.

Due to the complexity, it might help to have a third-party tool. The
first condition about bidirectional characters seems reasonably
straightforward for sources, in which authors do not usually mix two
languages with opposing directions. The second condition about
homoglyphs seems more complex, unless source code instructions are
restricted to ASCII (except for data strings, which may be shown to
users).
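
As a rough illustration of the ASCII-only idea for the second condition,
here is a Python sketch that flags identifier-like tokens containing
non-ASCII characters. It is an assumption-laden simplification: the name
non_ascii_identifiers() is made up, and the sketch does not skip string
literals or comments, so data strings would still be reported.

```python
import re

# A word that starts with a letter or underscore, followed by word characters.
IDENTIFIER_RE = re.compile(r"[^\W\d]\w*")


def non_ascii_identifiers(source):
    """Return identifier-like tokens that contain any non-ASCII character."""
    return [tok for tok in IDENTIFIER_RE.findall(source) if not tok.isascii()]


# "s\u0441ore" uses CYRILLIC SMALL LETTER ES in place of the Latin "c".
suspicious = non_ascii_identifiers("int s\u0441ore = 1;")
```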

Either way, I am happy to help. Writing checks has never been easier!

Kind regards
Felix Lechner