[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-02-02 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813740#comment-17813740
 ] 

Piotr Nowojski commented on FLINK-33819:


Thanks [~mayuehappy].

{quote}
merged 4f7725aa into master which is not blocked by the performance result.

I also think it's worthy discussing and we could find a better default 
behavious in the next version.
{quote}

+1


Yes

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Assignee: Yue Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-01-24 Thread Hangxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810682#comment-17810682
 ] 

Hangxiang Yu commented on FLINK-33819:
--

merged 4f7725aa into master which is not blocked by the performance result.

[~mayuehappy] Thanks for the test!

What's the state size and memory configuration in your test ? Could we also 
test when state is larger than L2/L3 ?

I also think it's worthy discussing and we could find a better default 
behavious in the next version.

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Assignee: Yue Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-01-22 Thread Yue Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809419#comment-17809419
 ] 

Yue Ma commented on FLINK-33819:


[~pnowojski] [~masteryhx] 

I conducted benchmark tests on Snappy and No_Compression. In the Value State 
test, I increased the number of setupKeyCounts to 1000 because the default 
value state test is too small, most of which may be in the memtable. The 
conclusion is as follows:

Point LookUp operations can improve performance by over *130%+*

Scan type operations can improve performance by *60%*

The performance of write operations has no impact with before

The size of valueState has increased from *78M to 96M* after disabling 
compression

The size of mapState has increased from *58m to 86m* (Perhaps this data is for 
reference only, as the actual compression ratio depends on the characteristics 
of the business data)


|Benchmark|(backendType)|Mode|Score 
(Snappy)|Score(NoCompression)|Units|performance benfit|
|MapStateBenchmark.mapAdd|ROCKSDB|thrpt|654.362|679.276|ops/ms|{color:#FF}3.80737267750878%{color}|
|MapStateBenchmark.mapContains|ROCKSDB|thrpt|104.57|297.213|ops/ms|{color:#FF}184.223964808262%{color}|
|MapStateBenchmark.mapEntries|ROCKSDB|thrpt|573.153|933.967|ops/ms|{color:#FF}62.9524751680616%{color}|
|MapStateBenchmark.mapGet|ROCKSDB|thrpt|106.288|330.821|ops/ms|{color:#FF}211.249623664007%{color}|
|MapStateBenchmark.mapIsEmpty|ROCKSDB|thrpt|88.642|207.76|ops/ms|{color:#FF}134.380993208637%{color}|
|MapStateBenchmark.mapIterator|ROCKSDB|thrpt|572.848|912.097|ops/ms|{color:#FF}59.2214688713236%{color}|
|MapStateBenchmark.mapKeys|ROCKSDB|thrpt|580.244|949.094|ops/ms|{color:#FF}63.568085150385%{color}|
|MapStateBenchmark.mapPutAll|ROCKSDB|thrpt|129.965|130.054|ops/ms|{color:#FF}0.0684799753779853%{color}|
|MapStateBenchmark.mapRemove|ROCKSDB|thrpt|723.835|785.637|ops/ms|{color:#FF}8.53813369068916%{color}|
|MapStateBenchmark.mapUpdate|ROCKSDB|thrpt|697.409|652.893|ops/ms|{color:#FF}-6.38305499355471%{color}|
|MapStateBenchmark.mapValues|ROCKSDB|thrpt|579.399|935.651|ops/ms|{color:#FF}61.4864713263226%{color}|
|ValueStateBenchmark.valueAdd|ROCKSDB|thrpt|645.081|636.098|ops/ms|{color:#FF}-1.39253830139162%{color}|
|ValueStateBenchmark.valueGet|ROCKSDB|thrpt|103.393|297.646|ops/ms|{color:#FF}187.87828963276%{color}|
|ValueStateBenchmark.valueUpdate|ROCKSDB|thrpt|560.153|621.502|ops/ms|{color:#FF}10.9521862776777%{color}|
| | | | | | | |
| | | | | | | |
| |DBSize (SnappyCompression)|DBSize (NoCompression)| | | | |
|ValueStateBenchMark|78M|96M| | | | |
|MapStateBenchMark|58M|86M| | | | |

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Assignee: Yue Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-01-15 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806804#comment-17806804
 ] 

Piotr Nowojski commented on FLINK-33819:


+1 for making this option configurable. 

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Assignee: Yue Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-01-10 Thread Yue Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805354#comment-17805354
 ] 

Yue Ma commented on FLINK-33819:


[~masteryhx]  thanks , I would like to take this and  I'll draft the pr soon

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Priority: Major
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-01-10 Thread Hangxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805154#comment-17805154
 ] 

Hangxiang Yu commented on FLINK-33819:
--

{quote}It is obvious that closing Compression is not only about benefits, but 
also brings some space amplification. What I want to express is that we may 
need to provide such a configuration for users to balance how to exchange space 
for time
{quote}
Yeah, that's also what we found in the production environment which could 
improve the performance when we use NoCompression for L0 and L1 at the expense 
of space.

So I'm +1 to introduce such a configuration. Of course, we should remain the 
default behavious at the first version.
{quote}After switching the Compression Type, the newly generated file will be 
compressed using the new Compress Type, and the existing file can still be read 
and written with old Compress Type. 
{quote}
Yes, you're right. I have verified this. RocksDB will handle this situation.

 

[~mayuehappy] Would you like to take this ?

 

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Priority: Major
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-01-09 Thread Yue Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17804999#comment-17804999
 ] 

Yue Ma commented on FLINK-33819:


[~masteryhx]  [~pnowojski] [~srichter]  could you please take a look at this 
ticket ? 

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Priority: Major
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2023-12-18 Thread Yue Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798083#comment-17798083
 ] 

Yue Ma commented on FLINK-33819:


[~masteryhx] Thanks for replying.
{quote}Linked FLINK-20684 as it has been discussed before.
{quote}
Sorry for missing the previous ticket and creating the duplicated one
{quote}Linked FLINK-11313 which talks about the LZ4 Compression which should be 
more usable than Snappy.
{quote}
We did not use LZ4 in the production environment, and for small state jobs, we 
directly turn off the compression. And for job with large states,  we adopted a 
compression algorithm based on Snappy optimization.
{quote}Do you have some test results on it ?
{quote}
Yes. we did some benchmark test for the *SnappyCompression* and *NoCompression*
And result show that and Read Performance. After turning off Compression, State 
Benchmark read performance can be improved by *80% to 100%*

We also conducted end-to-end online job testing, and after turning off 
Compression, {*}the CPU usage of the job decreased by 16%{*}, while the 
Checkpoint Total Size increased by *4-5 times.*

It is obvious that closing Compression is not only about benefits, but also 
brings some space amplification. What I want to express is that we may need to 
provide such a configuration for users to balance how to exchange space for time
 
{quote}BTW, If we'd like to introduce such a option, it's  better to guarantee 
the compalibility.
{quote}
Sorry, I didn't understand the compatibility issue here. I understand that it 
is compatible here. After switching the Compression Type, the newly generated 
file will be compressed using the new Compress Type, and the existing file can 
still be read and written with old Compress Type. 

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Priority: Major
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2023-12-14 Thread Hangxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796992#comment-17796992
 ] 

Hangxiang Yu commented on FLINK-33819:
--

Linked FLINK-20684 as it has been discussed before.

Linked FLINK-11313 which talks about the LZ4 Compression which should be more 
usable than Snappy.

IMO, it should improve performane if we disbale the compression of L0/L1 in 
some scenes.

[~mayuehappy] Do you have some test results on it ?

BTW, If we'd like to introduce such a option, it's  better to guarantee the 
compalibility.

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Priority: Major
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2023-12-14 Thread xiangyu feng (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796977#comment-17796977
 ] 

xiangyu feng commented on FLINK-33819:
--

+1 for this. This will save lots of cpu usage for jobs with less state space 
usage but high state access frequency.

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Priority: Major
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of Rocksdb.
>   !image-2023-12-14-11-35-22-306.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)