[ https://issues.apache.org/jira/browse/THRIFT-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Fleischman updated THRIFT-5733:
--------------------------------------
    Description: 
If I try to build the following Thrift code, the compiler segfaults almost immediately:

*setup:*
{code:java}
$ cat foo.thrift
include "bar.thrift"
$ cat bar.thrift
include "foo.thrift"
{code}
*build:*
{code:java}
$ thrift --allow-64bit-consts --gen py:slots foo.thrift
[2] 210654 segmentation fault (core dumped) thrift --allow-64bit-consts --gen py:slots foo.thrift
{code}
That's not the most user-friendly error message I've ever received ;), but it's 
pretty much just a cosmetic issue (there may be a buffer overflow somewhere, and 
therefore a potential security exploit to worry about if you compile untrusted 
Thrift code, but I personally never do that, so it doesn't stress me out).
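Until the compiler detects this itself, one way to fail fast is a pre-build check that walks the {{include}} graph and stops on a cycle. The following is a minimal Python sketch, not part of Thrift; the helper name {{find_include_cycle}} is hypothetical, and it assumes includes are written as {{include "path"}} with double quotes and resolved relative to the including file (it ignores {{-I}} search paths).

{code:python}
from __future__ import annotations

import re
import sys
from pathlib import Path

# Matches lines like: include "bar.thrift"
# Assumption: only double-quoted includes, relative to the including file.
INCLUDE_RE = re.compile(r'^\s*include\s+"([^"]+)"', re.MULTILINE)


def find_include_cycle(root: Path) -> list[Path] | None:
    """Return the first include cycle reachable from root, or None."""
    stack: list[Path] = []       # files on the current DFS path
    on_stack: set[Path] = set()  # same files, for O(1) membership checks
    done: set[Path] = set()      # files fully explored without finding a cycle

    def visit(path: Path) -> list[Path] | None:
        path = path.resolve()
        if path in on_stack:
            # Back edge: slice the current path from the repeated file onward.
            return stack[stack.index(path):] + [path]
        if path in done:
            return None
        stack.append(path)
        on_stack.add(path)
        for target in INCLUDE_RE.findall(path.read_text()):
            cycle = visit(path.parent / target)
            if cycle:
                return cycle
        stack.pop()
        on_stack.discard(path)
        done.add(path)
        return None

    return visit(root)


# Only run as a CLI when exactly one .thrift path is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    cycle = find_include_cycle(Path(sys.argv[1]))
    if cycle:
        print("circular include: " + " -> ".join(p.name for p in cycle))
        sys.exit(1)
{code}

Run against {{foo.thrift}} from the first example, this should print {{circular include: foo.thrift -> bar.thrift -> foo.thrift}} and exit with status 1, so a CI job fails in milliseconds instead of exhausting memory.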

However, if you add a third file to the mix, things can get {_}really weird{_}. 
If I try to build the following code, the compiler sucks up all 32 GiB of RAM on 
my machine and renders my computer completely unusable. If you reduce the number 
of entries in {{LargeEnum}}, you can get the Thrift compiler to use a ton of RAM 
before it finally segfaults as in the first example. I've attached a screenshot 
showing how RAM and CPU get used on my machine while attempting to build the 
code below.

*problematic code:*
{code:java}
$ cat foo.thrift
include "bar.thrift"
$ cat bar.thrift
include "large-enum.thrift"
include "foo.thrift"
$ cat large-enum.thrift
enum LargeEnum {
    FOO0 = 0,
    FOO1 = 1,
    ... [FOO2 through FOO1998] ...
    FOO1999 = 1999,
}
{code}
I've also put together a simple repro at 
[https://github.com/jfly/2023-09-01-thrift-circular-import]
 which can autogenerate the three files described above. (Just be careful when 
running it: kill it before it soaks up all of your RAM!)
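For anyone who wants to reproduce this without cloning the repo, here is a minimal Python sketch that writes the same three files. It is not the script from the repository above; the function name {{write_repro}} and the {{enum_size}} parameter are hypothetical, and the default of 2000 matches {{FOO0}} through {{FOO1999}}.

{code:python}
from pathlib import Path


def write_repro(directory: Path, enum_size: int = 2000) -> None:
    """Write foo.thrift, bar.thrift and large-enum.thrift with a circular include."""
    directory.mkdir(parents=True, exist_ok=True)
    # foo.thrift -> bar.thrift -> foo.thrift is the cycle.
    (directory / "foo.thrift").write_text('include "bar.thrift"\n')
    (directory / "bar.thrift").write_text(
        'include "large-enum.thrift"\ninclude "foo.thrift"\n'
    )
    # large-enum.thrift holds FOO0 = 0 ... FOO{enum_size-1}, one entry per line.
    entries = "".join(f"    FOO{i} = {i},\n" for i in range(enum_size))
    (directory / "large-enum.thrift").write_text(f"enum LargeEnum {{\n{entries}}}\n")
{code}

Calling {{write_repro(Path("repro"))}} produces the files shown above. A smaller {{enum_size}} should give the variant where the compiler uses a lot of RAM and then segfaults; with the default, be ready to kill {{thrift}} or run it under a memory limit.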

Yesterday, this explosive use of RAM brought our company's build server (with 
128 GiB of RAM!) to its knees. We spent a lot of time flailing around before we 
finally tracked it down to one problematic PR that introduced a circular 
include.


> Building code with circular `include`s can result in tons of memory usage and 
> eventual segfault
> -----------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-5733
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5733
>             Project: Thrift
>          Issue Type: Bug
>          Components: Compiler (General)
>    Affects Versions: 0.18.1
>         Environment: I'm on Linux, but this also happens to my coworkers on 
> macOS.
>            Reporter: Jeremy Fleischman
>            Assignee: Jens Geyer
>            Priority: Major
>         Attachments: 2023-09-01_16-30-26_pattern.png, testcase.zip
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
