[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791536#comment-17791536 ]

Stefan Richter commented on FLINK-32444:

[~pnowojski] if there really is an issue with heap backend, then we also need to be careful about what type of caching we can build for RocksDB in the future.

> Enable object reuse for Flink SQL jobs by default
>
> Key: FLINK-32444
> URL: https://issues.apache.org/jira/browse/FLINK-32444
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / API
> Reporter: Jark Wu
> Priority: Major
> Fix For: 1.19.0
>
> Currently, object reuse is not enabled by default for Flink Streaming Jobs,
> but is enabled by default for Flink Batch jobs. That is not consistent for
> stream-batch unification. Besides, SQL operators are safe to enable object
> reuse and this is a great performance improvement for SQL jobs.
> We should also be careful with the Table-DataStream conversion case
> (StreamTableEnvironment) which is not safe to enable object reuse by default.
> Maybe we can just enable it for SQL Client/Gateway and TableEnvironment.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789031#comment-17789031 ]

Piotr Nowojski commented on FLINK-32444:

Instead of checking for the configured state backend, I would add a getter to the StateBackend interface, like:
{code:java}
boolean StateBackend#storesObjectReferences(); // false for RocksDB, true for HashMap
{code}
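The getter proposed above could be used roughly as in the following minimal sketch. All type names here (`StateBackendSketch`, `HashMapBackendSketch`, `RocksDbBackendSketch`, `ObjectReuseDefaults`) are hypothetical stand-ins invented for illustration; they are not Flink's actual `StateBackend` API, and only the method name `storesObjectReferences` and its two return values come from the comment. The rule "default to reuse only when the backend does not store references" is an assumption about how the getter might be consumed.

```java
// Hypothetical sketch: these types only mirror the names used in the comment;
// they are NOT Flink's real StateBackend interface or implementations.
interface StateBackendSketch {
    /** True if the backend keeps live object references instead of serialized bytes. */
    boolean storesObjectReferences();
}

/** Stand-in for a heap (HashMap) backend: state holds references to the objects it is given. */
final class HashMapBackendSketch implements StateBackendSketch {
    @Override
    public boolean storesObjectReferences() {
        return true;
    }
}

/** Stand-in for a RocksDB backend: state is serialized on every write. */
final class RocksDbBackendSketch implements StateBackendSketch {
    @Override
    public boolean storesObjectReferences() {
        return false;
    }
}

final class ObjectReuseDefaults {
    private ObjectReuseDefaults() {}

    /**
     * Assumed decision rule: enable object reuse by default only when
     * stored state cannot alias records that are still in flight.
     */
    static boolean defaultObjectReuse(StateBackendSketch backend) {
        return !backend.storesObjectReferences();
    }
}
```

Asking the backend about its storage behavior, rather than matching on the configured backend type, means a custom or future backend only has to answer one question instead of being added to a hard-coded list.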
[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783921#comment-17783921 ]

Timo Walther commented on FLINK-32444:

The easiest solution could be to check for the configured state backend, if heap still causes issues?
[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783919#comment-17783919 ]

Timo Walther commented on FLINK-32444:

I was about to simply open a PR for this change, but then I found this comment here:

https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/planner/runtime/utils/StreamingWithStateTestBase.scala#L52

{code}
enableObjectReuse = state match {
  case HEAP_BACKEND => false // TODO heap statebackend not support obj reuse now.
  case ROCKSDB_BACKEND => true
}
{code}

This also matches my memory of why we didn't enable it by default. Does anyone know whether something has changed in the meantime?
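One plausible reading of the TODO above is the general aliasing hazard of combining a reused, mutable record with a store that keeps object references. The sketch below only illustrates that hazard with plain Java collections. It does not use or reproduce Flink code; `Row`, `heapStyle`, and `serializedStyle` are hypothetical names, and whether this is the exact problem behind the TODO is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class ObjectReuseAliasingDemo {
    private ObjectReuseAliasingDemo() {}

    /** Mutable record standing in for a row object that an upstream operator reuses. */
    static final class Row {
        String value;

        Row(String value) {
            this.value = value;
        }
    }

    /** Reference-storing state: the map keeps the very object it was handed. */
    static String heapStyle() {
        Map<String, Row> state = new HashMap<>();
        Row reused = new Row("first");
        state.put("k", reused);        // state now aliases the in-flight object
        reused.value = "second";       // upstream overwrites the object for the next record
        return state.get("k").value;   // the stored state changed along with it
    }

    /** Serializing state: the map keeps a byte copy taken at write time. */
    static String serializedStyle() {
        Map<String, byte[]> state = new HashMap<>();
        Row reused = new Row("first");
        state.put("k", reused.value.getBytes(StandardCharsets.UTF_8));
        reused.value = "second";       // later mutation cannot reach the stored bytes
        return new String(state.get("k"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("reference-storing state: " + heapStyle());
        System.out.println("serializing state:       " + serializedStyle());
    }
}
```

In this sketch the reference-storing variant ends up holding the second record's value, while the serializing variant still holds the first. A cache of deserialized objects placed in front of a serializing store would behave like the first variant again.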
[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782658#comment-17782658 ]

Piotr Nowojski commented on FLINK-32444:

{quote}
Does it give us performance benefits?
{quote}
Yes. On one job that I've looked into recently, a subtask reading from Kafka, filtering/projecting records and doing local windowed aggregation, with object reuse disabled, spends between 25% and 50% of its time inside {{CopyingChainingOutput}}.

If there are no correctness issues with built-in operators/functions in Flink SQL, I would also give a big +1 for enabling reuse by default.
[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782654#comment-17782654 ]

Timo Walther commented on FLINK-32444:

[~jark] is there a reason why you haven't implemented this issue yet? Are there known issues? I guess this would be very low-hanging fruit for performance if it causes no issues.
[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737739#comment-17737739 ]

Benchao Li commented on FLINK-32444:

Big +1 on this, we've enabled it for all production jobs, and got a very good performance improvement.

> Enable object reuse for Flink SQL jobs by default
>
> Key: FLINK-32444
> URL: https://issues.apache.org/jira/browse/FLINK-32444
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / API
> Reporter: Jark Wu
> Priority: Major
> Fix For: 1.18.0
>
> Currently, object reuse is not enabled by default for Flink Streaming Jobs,
> but is enabled by default for Flink Batch jobs. That is not consistent for
> stream-batch unification. Besides, SQL operators are safe to enable object
> reuse and this is a great performance improvement for SQL jobs.
> We should also be careful with the Table-DataStream conversion case
> (StreamTableEnvironment) which is not safe to enable object reuse by default.
> Maybe we can just enable it for SQL Client/Gateway and TableEnvironment.
[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737699#comment-17737699 ]

lincoln lee commented on FLINK-32444:

[~jark] Cool! This would be beneficial for SQL users.
[jira] [Commented] (FLINK-32444) Enable object reuse for Flink SQL jobs by default
[ https://issues.apache.org/jira/browse/FLINK-32444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737470#comment-17737470 ]

Jark Wu commented on FLINK-32444:

cc [~lincoln.86xy], [~lsy], [~twalthr] what do you think?