[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613553#comment-16613553 ] ASF GitHub Bot commented on FLINK-10275: TisonKun commented on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-421020528 Although I don't think remain a stall pull request does harm, since this pull request should be updated nearly as a rework, I would close it. Thanks for your reminding @kl0u! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613554#comment-16613554 ] ASF GitHub Bot commented on FLINK-10275: TisonKun closed pull request #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/docs/dev/execution_configuration.md b/docs/dev/execution_configuration.md index f0103b0f39f..3991ab59ac5 100644 --- a/docs/dev/execution_configuration.md +++ b/docs/dev/execution_configuration.md @@ -59,7 +59,7 @@ With the closure cleaner disabled, it might happen that an anonymous user functi - `enableForceAvro()` / **`disableForceAvro()`**. Avro is not forced by default. Forces the Flink AvroTypeInformation to use the Avro serializer instead of Kryo for serializing Avro POJOs. -- `enableObjectReuse()` / **`disableObjectReuse()`** By default, objects are not reused in Flink. Enabling the object reuse mode will instruct the runtime to reuse user objects for better performance. Keep in mind that this can lead to bugs when the user-code function of an operation is not aware of this behavior. +- `enableObjectReuse()` / **`disableObjectReuse()`** By default, objects are not reused in Flink. Enabling the object reuse mode will instruct the runtime to reuse user objects for better performance. Keep in mind that this can lead to bugs when the user-defined function of an operation is not aware of this behavior. - **`enableSysoutLogging()`** / `disableSysoutLogging()` JobManager status updates are printed to `System.out` by default. This setting allows to disable this behavior. diff --git a/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java b/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java index 59fa803791a..d36fd296562 100644 --- a/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java +++ b/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java @@ -601,8 +601,8 @@ public boolean isForceAvroEnabled() { /** * Enables reusing objects that Flink internally uses for deserialization and passing -* data to user-code functions. Keep in mind that this can lead to bugs when the -* user-code function of an operation is not aware of this behaviour. +* data to user-defined functions. Keep in mind that this can lead to bugs when the +* user-defined function of an operation is not aware of this behaviour. */ public ExecutionConfig enableObjectReuse() { objectReuse = true; @@ -611,7 +611,7 @@ public ExecutionConfig enableObjectReuse() { /** * Disables reusing objects that Flink internally uses for deserialization and passing -* data to user-code functions. @see #enableObjectReuse() +* data to user-defined functions. @see #enableObjectReuse() */ public ExecutionConfig disableObjectReuse() { objectReuse = false; diff --git a/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/batch/table/TableSinkITCase.scala b/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/batch/table/TableSinkITCase.scala index d8ba29ae478..f4185f4c129 100644 --- a/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/batch/table/TableSinkITCase.scala +++ b/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/batch/table/TableSinkITCase.scala @@ -51,7 +51,7 @@ class TableSinkITCase( val input = CollectionDataSets.get3TupleDataSet(env) .map(x => x).setParallelism(4) // increase DOP to 4 -val results = input.toTable(tEnv, 'a, 'b, 'c) +input.toTable(tEnv, 'a, 'b, 'c) .where('a < 5 || 'a > 17) .select('c, 'b) .writeToSink(new CsvTableSink(path, fieldDelim = "|")) diff --git a/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/stream/table/TableSinkITCase.scala b/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/stream/table/TableSinkITCase.scala index 70e59f3d24d..95cb1df1b04 100644 --- a/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/stream/table/TableSinkITCase.scala +++ b/flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/stream/table/TableSinkITCase.scala @@ -21,10 +21,12 @@ package org.apache.flink.table.runtime.stream.table import java.io.File import java.lang.{Boolean => JBool} +import org.apache.flink.api.common.ExecutionConfig import org.apache.flink.api.common.functions.MapFunction im
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613544#comment-16613544 ] ASF GitHub Bot commented on FLINK-10275: kl0u commented on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-421019224 Hi @TisonKun, In the above discussion, there is consensus that until the related FLIP is agreed upon, there is not going to be any activity on this PR. Could you close this PR so that we keep a clean backlog of PR's? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613150#comment-16613150 ] ASF GitHub Bot commented on FLINK-10275: TisonKun edited a comment on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-420644895 @StephanEwen you're right. After raising the exception mentioned above I communicate with our batch team, who pointing me to FLIP-21. So this thread stalls until the corresponding patch from batch side given out. Still under discussion and attempts. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612681#comment-16612681 ] ASF GitHub Bot commented on FLINK-10275: StephanEwen commented on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-420782868 Yes, let's resolve FLIP-21. Will try to revive it in the next days... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612121#comment-16612121 ] ASF GitHub Bot commented on FLINK-10275: TisonKun commented on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-420644895 @StephanEwen you're right. After raising the exception mentioned above I communicate with our batch team, who pointing me to FLIP-21. So this thread stalls until the corresponding patch from batch size given out. Still under discussion and attempts. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611931#comment-16611931 ] ASF GitHub Bot commented on FLINK-10275: StephanEwen commented on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-420611044 I think we need to find consensus on FLIP-21 before going for this. Currently, object reuse in streaming means to have "no copy on chaining", which is very different from the semantics in the batch API. Simply reusing objects now will break many setups. Before we can merge this, we need to extend the semantics to differentiate between actual object reuse and avoiding copies on handover. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610679#comment-16610679 ] ASF GitHub Bot commented on FLINK-10275: kl0u commented on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-420289620 Sounds good. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610674#comment-16610674 ] ASF GitHub Bot commented on FLINK-10275: TisonKun commented on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-420289282 @kl0u Thanks for comments! I would clean this PR and emphases the main changes later. Recently working on other threads, ping here once done. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602588#comment-16602588 ] ASF GitHub Bot commented on FLINK-10275: TisonKun commented on issue #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643#issuecomment-418236948 This pull request would affect reusing of object from network, according to [FLIP-21](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71012982), it is another reuse aspect from current `ExecutionConfig#isObjectReuseEnable` configuration. Travis fails on `TableSinkITCase#testAppendSinkOnAppendTableForInnerJoin` which is because of mutating inputValue. We need a extra config to switch network aspect object reuse. Still digging... Suggestions are welcome :-) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10275) StreamTask support object reuse
[ https://issues.apache.org/jira/browse/FLINK-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599068#comment-16599068 ] ASF GitHub Bot commented on FLINK-10275: TisonKun opened a new pull request #6643: [FLINK-10275] StreamTask support object reuse URL: https://github.com/apache/flink/pull/6643 ## What is the purpose of the change StreamTask support efficient object reuse. The purpose behind this is to reduce pressure on the garbage collector. All objects are reused, without backup copies. The operators and UDFs must be careful to not keep any objects as state or not to modify the objects. ## Brief change log - With `ExecutionConfig#isObjectReuseEnable` on, reuse `StreamRecord` associated to `StreamTask`. - Also clean code as glancing over. ## Verifying this change Add case to unit test `OneInputStreamTaskTest.java` and `TwoInputStreamTaskTest.java` ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StreamTask support object reuse > --- > > Key: FLINK-10275 > URL: https://issues.apache.org/jira/browse/FLINK-10275 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.7.0 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > StreamTask support efficient object reuse. The purpose behind this is to > reduce pressure on the garbage collector. > All objects are reused, without backup copies. The operators and UDFs must be > careful to not keep any objects as state or not to modify the objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)