[jira] [Commented] (FLINK-7251) Merge the flink-java8 project into flink-core
[ https://issues.apache.org/jira/browse/FLINK-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548808#comment-16548808 ] ASF GitHub Bot commented on FLINK-7251:
---

Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6120#discussion_r203605524

    --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/KeyedStream.java ---
    @@ -545,15 +543,15 @@ public IntervalJoined(
     			TypeInformation resultType = TypeExtractor.getBinaryOperatorReturnType(
     				cleanedUdf,
    -				ProcessJoinFunction.class,    // ProcessJoinFunction
    -				0,                            // 012
    +				ProcessJoinFunction.class,
    +				0,
     				1,
     				2,
    -				TypeExtractor.NO_INDEX,       // output arg indices
    -				left.getType(),               // input 1 type information
    -				right.getType(),              // input 2 type information
    -				INTERVAL_JOIN_FUNC_NAME,
    -				false
    +				TypeExtractor.NO_INDEX,
    +				left.getType(),
    +				right.getType(),
    +				Utils.getCallLocationName(),
    +				true
    --- End diff --

    Just saw that this should be false. Will correct that.

> Merge the flink-java8 project into flink-core
> ---------------------------------------------
>
>                 Key: FLINK-7251
>                 URL: https://issues.apache.org/jira/browse/FLINK-7251
>             Project: Flink
>          Issue Type: Improvement
>          Components: Build System
>            Reporter: Stephan Ewen
>            Assignee: Timo Walther
>            Priority: Major
>              Labels: pull-request-available
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] flink pull request #6120: [FLINK-7251] [types] Remove the flink-java8 module...
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6120#discussion_r203605524

    --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/KeyedStream.java ---
    @@ -545,15 +543,15 @@ public IntervalJoined(
     			TypeInformation resultType = TypeExtractor.getBinaryOperatorReturnType(
     				cleanedUdf,
    -				ProcessJoinFunction.class,    // ProcessJoinFunction
    -				0,                            // 012
    +				ProcessJoinFunction.class,
    +				0,
     				1,
     				2,
    -				TypeExtractor.NO_INDEX,       // output arg indices
    -				left.getType(),               // input 1 type information
    -				right.getType(),              // input 2 type information
    -				INTERVAL_JOIN_FUNC_NAME,
    -				false
    +				TypeExtractor.NO_INDEX,
    +				left.getType(),
    +				right.getType(),
    +				Utils.getCallLocationName(),
    +				true
    --- End diff --

    Just saw that this should be false. Will correct that.

---
[jira] [Commented] (FLINK-9875) Add concurrent creation of execution job vertex
[ https://issues.apache.org/jira/browse/FLINK-9875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548797#comment-16548797 ] ASF GitHub Bot commented on FLINK-9875:
---

Github user tison1 commented on the issue:

    https://github.com/apache/flink/pull/6353

    Fixed the unstable test case. The problem is the code below, which may assign `constraints` to a long array and then to a shorter one, causing an out-of-bounds exception. To solve it we could initialize `constraints` in the constructor.
    https://github.com/apache/flink/blob/056486a1b81e9648a6d3dc795e7e2c6976f8388c/flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java#L87-L91

> Add concurrent creation of execution job vertex
> -----------------------------------------------
>
>                 Key: FLINK-9875
>                 URL: https://issues.apache.org/jira/browse/FLINK-9875
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>    Affects Versions: 1.5.1
>            Reporter: 陈梓立
>            Assignee: 陈梓立
>            Priority: Major
>              Labels: pull-request-available
>
> In some cases, such as input-format vertices, creating an execution job vertex is time-consuming; this PR adds concurrent creation of execution job vertices to accelerate it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] flink issue #6353: [FLINK-9875] Add concurrent creation of execution job ver...
Github user tison1 commented on the issue:

    https://github.com/apache/flink/pull/6353

    Fixed the unstable test case. The problem is the code below, which may assign `constraints` to a long array and then to a shorter one, causing an out-of-bounds exception. To solve it we could initialize `constraints` in the constructor.
    https://github.com/apache/flink/blob/056486a1b81e9648a6d3dc795e7e2c6976f8388c/flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/scheduler/CoLocationGroup.java#L87-L91

---
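The resizing hazard described in this thread can be sketched as follows. This is a minimal, hypothetical reproduction, not Flink's actual `CoLocationGroup` code: the idea is that a shared collection of constraints must only ever grow, and be initialized once in the constructor, so that indices handed out for a larger size never point past the end of a freshly created smaller array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the constraint list is created once (in the
// constructor, via the field initializer) and ensureConstraints only
// grows it, never replaces it with a shorter array.
class CoLocationConstraintsSketch {
    private final List<Object> constraints = new ArrayList<>();

    /** Grow the list up to {@code num} entries; a smaller request is a no-op. */
    void ensureConstraints(int num) {
        while (constraints.size() < num) {
            constraints.add(new Object());
        }
    }

    Object getConstraint(int index) {
        // safe for any index below the largest size ever requested
        return constraints.get(index);
    }

    int size() {
        return constraints.size();
    }
}
```

With the grow-only variant, a caller that obtained index 3 after `ensureConstraints(4)` still resolves it even if a later, concurrent caller asks for only 2 constraints.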
[jira] [Commented] (FLINK-4534) Lack of synchronization in BucketingSink#restoreState()
[ https://issues.apache.org/jira/browse/FLINK-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548759#comment-16548759 ] zhangminglei commented on FLINK-4534:
---

Hi, [~gjy] Can we use a lightweight synchronization mechanism to solve this? For example, use {{volatile}} to avoid this issue.

> Lack of synchronization in BucketingSink#restoreState()
> -------------------------------------------------------
>
>                 Key: FLINK-4534
>                 URL: https://issues.apache.org/jira/browse/FLINK-4534
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>            Reporter: Ted Yu
>            Assignee: zhangminglei
>            Priority: Major
>
> Iteration over state.bucketStates is protected by synchronization in other
> methods, except for the following in restoreState():
> {code}
> for (BucketState bucketState : state.bucketStates.values()) {
> {code}
> and the following in close():
> {code}
> for (Map.Entry entry : state.bucketStates.entrySet()) {
>   closeCurrentPartFile(entry.getValue());
> {code}
> w.r.t. bucketState.pendingFilesPerCheckpoint, there is a similar issue
> starting at line 752:
> {code}
> Set pastCheckpointIds = bucketState.pendingFilesPerCheckpoint.keySet();
> LOG.debug("Moving pending files to final location on restore.");
> for (Long pastCheckpointId : pastCheckpointIds) {
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
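The pattern at issue here can be sketched as below. This is a hypothetical, simplified stand-in for the sink's state, not the actual `BucketingSink` code: iteration over a shared map must hold the same lock as all mutators. Note that `volatile` alone would not make the iteration safe, since it only guarantees visibility of the reference, not atomicity of the traversal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every access to the shared bucket state, including
// iteration, goes through the same monitor, so traversal sees a
// consistent view and never races with concurrent put() calls.
class BucketStateHolderSketch {
    private final Object lock = new Object();
    private final Map<String, Integer> bucketStates = new HashMap<>();

    void put(String bucket, int pendingFiles) {
        synchronized (lock) {
            bucketStates.put(bucket, pendingFiles);
        }
    }

    int totalPending() {
        synchronized (lock) { // same lock as the mutators guards the iteration
            int sum = 0;
            for (int v : bucketStates.values()) {
                sum += v;
            }
            return sum;
        }
    }
}
```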
[jira] [Commented] (FLINK-9675) Avoid FileInputStream/FileOutputStream
[ https://issues.apache.org/jira/browse/FLINK-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548748#comment-16548748 ] ASF GitHub Bot commented on FLINK-9675: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/6335 Hi, @hequn8128 I will create another JIRA for the ```new InputStreamReader``` refactor issue. What do you think? > Avoid FileInputStream/FileOutputStream > -- > > Key: FLINK-9675 > URL: https://issues.apache.org/jira/browse/FLINK-9675 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Minor > Labels: filesystem, pull-request-available > > They rely on finalizers (before Java 11), which create unnecessary GC load. > The alternatives, Files.newInputStream, are as easy to use and don't have > this issue. > And here is a benchmark > https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6335: [FLINK-9675] [fs] Avoid FileInputStream/FileOutputStream
Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/6335 Hi, @hequn8128 I will create another JIRA for the ```new InputStreamReader``` refactor issue. What do you think? ---
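The replacement discussed in this thread looks roughly like the sketch below: `java.nio.file.Files.newInputStream` / `Files.newOutputStream` instead of `FileInputStream` / `FileOutputStream`, which (before Java 11) carry finalizers and add GC pressure. The `copyFile` helper is illustrative, not Flink code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: NIO-based streams have the same read/write API as
// FileInputStream/FileOutputStream but no finalizer, so the swap is
// usually a one-line change at each construction site.
class StreamCopySketch {
    static void copyFile(Path from, Path to) throws IOException {
        try (InputStream in = Files.newInputStream(from);       // instead of new FileInputStream(...)
             OutputStream out = Files.newOutputStream(to)) {    // instead of new FileOutputStream(...)
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```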
[jira] [Commented] (FLINK-9236) Use Apache Parent POM 19
[ https://issues.apache.org/jira/browse/FLINK-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548735#comment-16548735 ] Stephen Jason commented on FLINK-9236: -- Yes, but I can't find [~jiayichao] in the name list. [~fhueske], could you please add him to the contributor list? > Use Apache Parent POM 19 > > > Key: FLINK-9236 > URL: https://issues.apache.org/jira/browse/FLINK-9236 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu >Assignee: Stephen Jason >Priority: Major > > Flink is still using Apache Parent POM 18. Apache Parent POM 19 is out. > This will also fix Javadoc generation with JDK 10+ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-9236) Use Apache Parent POM 19
[ https://issues.apache.org/jira/browse/FLINK-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Jason reassigned FLINK-9236: Assignee: Stephen Jason > Use Apache Parent POM 19 > > > Key: FLINK-9236 > URL: https://issues.apache.org/jira/browse/FLINK-9236 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu >Assignee: Stephen Jason >Priority: Major > > Flink is still using Apache Parent POM 18. Apache Parent POM 19 is out. > This will also fix Javadoc generation with JDK 10+ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #6367: [FLINK-9850] Add a string to the print method to i...
Github user hequn8128 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6367#discussion_r203588835

    --- Diff: flink-streaming-scala/src/main/scala/org/apache/flink/streaming/api/scala/DataStream.scala ---
    @@ -959,6 +959,29 @@ class DataStream[T](stream: JavaStream[T]) {
       @PublicEvolving
       def printToErr() = stream.printToErr()
     
    +  /**
    +    * Writes a DataStream to the standard output stream (stdout). For each
    +    * element of the DataStream the result of .toString is
    --- End diff --

    .toString => [[AnyRef.toString()]]

---
[jira] [Assigned] (FLINK-9236) Use Apache Parent POM 19
[ https://issues.apache.org/jira/browse/FLINK-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Jason reassigned FLINK-9236: Assignee: (was: Stephen Jason) > Use Apache Parent POM 19 > > > Key: FLINK-9236 > URL: https://issues.apache.org/jira/browse/FLINK-9236 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu >Priority: Major > > Flink is still using Apache Parent POM 18. Apache Parent POM 19 is out. > This will also fix Javadoc generation with JDK 10+ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9850) Add a string to the print method to identify output for DataStream
[ https://issues.apache.org/jira/browse/FLINK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548718#comment-16548718 ] ASF GitHub Bot commented on FLINK-9850:
---

Github user hequn8128 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6367#discussion_r203588835

    --- Diff: flink-streaming-scala/src/main/scala/org/apache/flink/streaming/api/scala/DataStream.scala ---
    @@ -959,6 +959,29 @@ class DataStream[T](stream: JavaStream[T]) {
       @PublicEvolving
       def printToErr() = stream.printToErr()
     
    +  /**
    +    * Writes a DataStream to the standard output stream (stdout). For each
    +    * element of the DataStream the result of .toString is
    --- End diff --

    .toString => [[AnyRef.toString()]]

> Add a string to the print method to identify output for DataStream
> ------------------------------------------------------------------
>
>                 Key: FLINK-9850
>                 URL: https://issues.apache.org/jira/browse/FLINK-9850
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataStream API
>            Reporter: Hequn Cheng
>            Assignee: vinoyang
>            Priority: Major
>              Labels: pull-request-available
>
> The print method of {{DataSet}} allows the user to supply a String to
> identify the output (see
> [FLINK-1486|https://issues.apache.org/jira/browse/FLINK-1486]), but
> {{DataStream}} doesn't support this yet. It would be valuable to add this
> feature for {{DataStream}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
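The feature under discussion can be sketched as below. This is a hypothetical illustration of what tagging printed output with a caller-supplied identifier looks like, not the actual Flink sink implementation; class and method names are invented for the sketch.

```java
// Hypothetical sketch: a printer that prefixes each element's toString()
// with a sink identifier, so output from several print() sinks running in
// the same job can be told apart on stdout.
class TaggedPrinterSketch<T> {
    private final String sinkIdentifier;

    TaggedPrinterSketch(String sinkIdentifier) {
        this.sinkIdentifier = sinkIdentifier;
    }

    /** Renders one element, tagged with the identifier supplied at construction. */
    String format(T element) {
        return sinkIdentifier + "> " + element.toString();
    }
}
```

With two streams printed as `print("left")` and `print("right")`, each line of output would then carry its origin.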
[jira] [Commented] (FLINK-9236) Use Apache Parent POM 19
[ https://issues.apache.org/jira/browse/FLINK-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548715#comment-16548715 ] jiayichao commented on FLINK-9236: -- Hi [~Stephen Jason], could you assign this issue to me? > Use Apache Parent POM 19 > > > Key: FLINK-9236 > URL: https://issues.apache.org/jira/browse/FLINK-9236 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu >Assignee: Stephen Jason >Priority: Major > > Flink is still using Apache Parent POM 18. Apache Parent POM 19 is out. > This will also fix Javadoc generation with JDK 10+ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9675) Avoid FileInputStream/FileOutputStream
[ https://issues.apache.org/jira/browse/FLINK-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangminglei updated FLINK-9675: Description: They rely on finalizers (before Java 11), which create unnecessary GC load. The alternatives, Files.newInputStream, are as easy to use and don't have this issue. And here is a benchmark https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html was: They rely on finalizers (before Java 11), which create unnecessary GC load. The alternatives, Files.newInputStream, are as easy to use and don't have this issue. > Avoid FileInputStream/FileOutputStream > -- > > Key: FLINK-9675 > URL: https://issues.apache.org/jira/browse/FLINK-9675 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Minor > Labels: filesystem, pull-request-available > > They rely on finalizers (before Java 11), which create unnecessary GC load. > The alternatives, Files.newInputStream, are as easy to use and don't have > this issue. > And here is a benchmark > https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9675) Avoid FileInputStream/FileOutputStream
[ https://issues.apache.org/jira/browse/FLINK-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548677#comment-16548677 ] ASF GitHub Bot commented on FLINK-9675: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/6335 Thank you very much! @hequn8128 ! I will take a look. > Avoid FileInputStream/FileOutputStream > -- > > Key: FLINK-9675 > URL: https://issues.apache.org/jira/browse/FLINK-9675 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Minor > Labels: filesystem, pull-request-available > > They rely on finalizers (before Java 11), which create unnecessary GC load. > The alternatives, Files.newInputStream, are as easy to use and don't have > this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6335: [FLINK-9675] [fs] Avoid FileInputStream/FileOutputStream
Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/6335 Thank you very much! @hequn8128 ! I will take a look. ---
[jira] [Commented] (FLINK-9735) Potential resource leak in RocksDBStateBackend#getDbOptions
[ https://issues.apache.org/jira/browse/FLINK-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548670#comment-16548670 ] vinoyang commented on FLINK-9735:
---

Will process this issue soon.

> Potential resource leak in RocksDBStateBackend#getDbOptions
> -----------------------------------------------------------
>
>                 Key: FLINK-9735
>                 URL: https://issues.apache.org/jira/browse/FLINK-9735
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: vinoyang
>            Priority: Minor
>
> Here is the related code:
> {code}
> if (optionsFactory != null) {
>   opt = optionsFactory.createDBOptions(opt);
> }
> {code}
> opt, a DBOptions instance, should be closed before being rewritten.
> getColumnOptions has a similar issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
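The leak pattern and one possible fix can be sketched as below. This is a hypothetical, self-contained illustration: a plain `AutoCloseable` stands in for RocksDB's `DBOptions`, and the factory interface is invented for the sketch; the real fix in `RocksDBStateBackend` may look different.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: if the factory hands back a *different* options
// object, the original must be closed before its reference is dropped,
// otherwise the native-resource-like handle leaks.
class OptionsLeakSketch {
    static final AtomicInteger OPEN = new AtomicInteger(); // counts live instances

    static class Options implements AutoCloseable {
        Options() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    interface OptionsFactory {
        Options createOptions(Options current);
    }

    static Options getOptions(OptionsFactory factory) {
        Options opt = new Options();
        if (factory != null) {
            Options created = factory.createOptions(opt);
            if (created != opt) {
                opt.close(); // release the original before overwriting the reference
            }
            opt = created;
        }
        return opt;
    }
}
```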
[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector
[ https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548654#comment-16548654 ] ASF GitHub Bot commented on FLINK-9849: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/6365 Thanks @yanghua, you are right! I just want to wait for the Travis build to finish, then I will post the old and new versions' dependency trees. > Upgrade hbase version to 2.0.1 for hbase connector > -- > > Key: FLINK-9849 > URL: https://issues.apache.org/jira/browse/FLINK-9849 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > Labels: pull-request-available > > Currently hbase 1.4.3 is used for hbase connector. > We should upgrade to 2.0.1 which was recently released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6365: [FLINK-9849] [hbase] Hbase upgrade to 2.0.1
Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/6365 Thanks @yanghua, you are right! I just want to wait for the Travis build to finish, then I will post the old and new versions' dependency trees. ---
[jira] [Commented] (FLINK-9061) add entropy to s3 path for better scalability
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548629#comment-16548629 ] ASF GitHub Bot commented on FLINK-9061: --- Github user mdaxini commented on the issue: https://github.com/apache/flink/pull/6302 @indrc In addition to removing the additional dependency, I think there should be a test validating the randomness of the chosen algorithm, to make sure there are no conflicts. For a Flink job with a large number of TaskManagers and state in the terabytes, resulting in many files, there could be conflicts with the current random string generation method. fyi @StefanRRichter @StephanEwen > add entropy to s3 path for better scalability > - > > Key: FLINK-9061 > URL: https://issues.apache.org/jira/browse/FLINK-9061 > Project: Flink > Issue Type: Bug > Components: FileSystem, State Backends, Checkpointing >Affects Versions: 1.5.0, 1.4.2 >Reporter: Jamie Grier >Assignee: Indrajit Roychoudhury >Priority: Critical > Labels: pull-request-available > > I think we need to modify the way we write checkpoints to S3 for high-scale > jobs (those with many total tasks). The issue is that we are writing all the > checkpoint data under a common key prefix. This is the worst case scenario > for S3 performance since the key is used as a partition key. > > In the worst case checkpoints fail with a 500 status code coming back from S3 > and an internal error type of TooBusyException. > > One possible solution would be to add a hook in the Flink filesystem code > that allows me to "rewrite" paths. For example say I have the checkpoint > directory set to: > > s3://bucket/flink/checkpoints > > I would hook that and rewrite that path to: > > s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original > path > > This would distribute the checkpoint write load around the S3 cluster evenly. 
> > For reference: > https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/ > > Any other people hit this issue? Any other ideas for solutions? This is a > pretty serious problem for people trying to checkpoint to S3. > > -Jamie > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
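The path-rewriting idea described in the issue can be sketched as below. This is a hypothetical illustration (class and method names are invented, and real Flink code would hook into the filesystem layer rather than rewrite strings): a short, deterministic hash of the original path is inserted right after the bucket so that checkpoint keys spread across S3's partition key space.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: rewrite s3://bucket/rest to s3://bucket/<hash>/rest,
// where <hash> is derived from the original path, so the prefix is stable
// per job but spreads different paths across S3 partitions.
class EntropyPathSketch {
    static String addEntropy(String path) throws NoSuchAlgorithmException {
        int schemeEnd = path.indexOf("://") + 3;
        int bucketEnd = path.indexOf('/', schemeEnd);
        String bucketPart = path.substring(0, bucketEnd);   // "s3://bucket"
        String keyPart = path.substring(bucketEnd + 1);     // "flink/checkpoints"

        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(path.getBytes(StandardCharsets.UTF_8));
        // 4 bytes of the digest rendered as 8 hex chars gives ample spread
        StringBuilder hash = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            hash.append(String.format("%02x", digest[i] & 0xff));
        }
        return bucketPart + "/" + hash + "/" + keyPart;
    }
}
```

Because the hash is a function of the path rather than a random string, the same checkpoint directory always maps to the same rewritten location, which sidesteps the collision concern raised in the comment above.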
[GitHub] flink issue #6302: [FLINK-9061][checkpointing] add entropy to s3 path for be...
Github user mdaxini commented on the issue: https://github.com/apache/flink/pull/6302 @indrc In addition to removing the additional dependency, I think there should be a test validating the randomness of the chosen algorithm, to make sure there are no conflicts. For a Flink job with a large number of TaskManagers and state in the terabytes, resulting in many files, there could be conflicts with the current random string generation method. fyi @StefanRRichter @StephanEwen ---
[jira] [Updated] (FLINK-9236) Use Apache Parent POM 19
[ https://issues.apache.org/jira/browse/FLINK-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated FLINK-9236: -- Description: Flink is still using Apache Parent POM 18. Apache Parent POM 19 is out. This will also fix Javadoc generation with JDK 10+ was: Flink is still using Apache Parent POM 18. Apache Parent POM 19 is out. This will also fix Javadoc generation with JDK 10+ > Use Apache Parent POM 19 > > > Key: FLINK-9236 > URL: https://issues.apache.org/jira/browse/FLINK-9236 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu >Assignee: Stephen Jason >Priority: Major > > Flink is still using Apache Parent POM 18. Apache Parent POM 19 is out. > This will also fix Javadoc generation with JDK 10+ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-9735) Potential resource leak in RocksDBStateBackend#getDbOptions
[ https://issues.apache.org/jira/browse/FLINK-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540754#comment-16540754 ] Ted Yu edited comment on FLINK-9735 at 7/19/18 12:37 AM: - Short term, we should fix the leaked DBOptions instance by releasing it. was (Author: yuzhih...@gmail.com): Short term, we should fix the leaked DBOptions instance. > Potential resource leak in RocksDBStateBackend#getDbOptions > --- > > Key: FLINK-9735 > URL: https://issues.apache.org/jira/browse/FLINK-9735 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: vinoyang >Priority: Minor > > Here is the related code: > {code} > if (optionsFactory != null) { > opt = optionsFactory.createDBOptions(opt); > } > {code} > opt, a DBOptions instance, should be closed before being rewritten. > getColumnOptions has a similar issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6259: [FLINK-9679] Implement AvroSerializationSchema
Github user medcv commented on the issue: https://github.com/apache/flink/pull/6259 @dawidwys in the last commit, I extended `SchemaCoder` to have `getSchemaId` as you suggested. ---
[jira] [Commented] (FLINK-9679) Implement AvroSerializationSchema
[ https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548595#comment-16548595 ] ASF GitHub Bot commented on FLINK-9679: --- Github user medcv commented on the issue: https://github.com/apache/flink/pull/6259 @dawidwys in the last commit, I extended `SchemaCoder` to have `getSchemaId` as you suggested. > Implement AvroSerializationSchema > - > > Key: FLINK-9679 > URL: https://issues.apache.org/jira/browse/FLINK-9679 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.6.0 >Reporter: Yazdan Shirvany >Assignee: Yazdan Shirvany >Priority: Major > Labels: pull-request-available > > Implement AvroSerializationSchema using Confluent Schema Registry -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6364: [hotfix] typo for SqlExecutionException msg
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6364 Thanks! LGTM cc @zentol ---
[GitHub] flink issue #6360: [FLINK-9884] [runtime] fix slot request may not be remove...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6360 @shuai-xu It makes sense. The message that the TM has successfully allocated the slot might be lost in transit. When the slot manager receives a slot status report which says a slot has an allocation id unrelated to this offer, the slot is allocated to another slot request. It looks like this PR prevents the runtime from a potential resource leak, doesn't it? ---
[jira] [Commented] (FLINK-9884) Slot request may not be removed when it has already be assigned in slot manager
[ https://issues.apache.org/jira/browse/FLINK-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548559#comment-16548559 ] ASF GitHub Bot commented on FLINK-9884: --- Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6360 @shuai-xu It makes sense. The message that the TM has successfully allocated the slot might be lost in transit. When the slot manager receives a slot status report which says a slot has an allocation id unrelated to this offer, the slot is allocated to another slot request. It looks like this PR prevents the runtime from a potential resource leak, doesn't it? > Slot request may not be removed when it has already be assigned in slot > manager > --- > > Key: FLINK-9884 > URL: https://issues.apache.org/jira/browse/FLINK-9884 > Project: Flink > Issue Type: Bug > Components: Cluster Management >Reporter: shuai.xu >Assignee: shuai.xu >Priority: Major > Labels: pull-request-available > > When a task executor reports a slotA with allocationId1, it may happen that the slot > manager records slotA as assigned to allocationId2, while the slot request with > allocationId1 is not assigned. Then the slot manager will update itself with > slotA assigned to allocationId1, but it does not clear the slot request with > allocationId1. > For example: > # There is one free slot in the slot manager. > # Now come two slot requests with allocationId1 and allocationId2. > # The slot is assigned to allocationId1, but the requestSlot call times out. > # SlotManager assigns the slot to allocationId2 and inserts a slot request > with allocationId1. > # The second requestSlot call to the task executor returns SlotOccupiedException. > # SlotManager updates the slot to allocationId1, but the slot request is left. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9679) Implement AvroSerializationSchema
[ https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548479#comment-16548479 ] ASF GitHub Bot commented on FLINK-9679: --- Github user medcv commented on the issue: https://github.com/apache/flink/pull/6259 @dawidwys For the second issue, I am looking at other schema registries and trying to extend `SchemaCoder` > Implement AvroSerializationSchema > - > > Key: FLINK-9679 > URL: https://issues.apache.org/jira/browse/FLINK-9679 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.6.0 >Reporter: Yazdan Shirvany >Assignee: Yazdan Shirvany >Priority: Major > Labels: pull-request-available > > Implement AvroSerializationSchema using Confluent Schema Registry -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6259: [FLINK-9679] Implement AvroSerializationSchema
Github user medcv commented on the issue: https://github.com/apache/flink/pull/6259 @dawidwys For the second issue, I am looking at other schema registries and trying to extend `SchemaCoder` ---
[jira] [Commented] (FLINK-9679) Implement AvroSerializationSchema
[ https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548425#comment-16548425 ] ASF GitHub Bot commented on FLINK-9679: --- Github user medcv commented on the issue: https://github.com/apache/flink/pull/6259 @dawidwys Thanks! As far as I dug into the Confluent code, their API needs a `subject` to retrieve the schema id and version, and it should be provided by the consumer. https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/client/SchemaRegistryClient.java#L30 The purpose of the new commit is to address your first comment by removing the `topic` name from the serialization constructor and replacing it with `subject`. This way the serializer doesn't need to know about the `topic` name. If you still see issues with this approach, I would appreciate it if you could help me find a better solution. > Implement AvroSerializationSchema > - > > Key: FLINK-9679 > URL: https://issues.apache.org/jira/browse/FLINK-9679 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.6.0 >Reporter: Yazdan Shirvany >Assignee: Yazdan Shirvany >Priority: Major > Labels: pull-request-available > > Implement AvroSerializationSchema using Confluent Schema Registry -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6259: [FLINK-9679] Implement AvroSerializationSchema
Github user medcv commented on the issue: https://github.com/apache/flink/pull/6259 @dawidwys Thanks! As far as I dug into the Confluent code, their API needs a `subject` to retrieve the schema id and version, and it should be provided by the consumer. https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/client/SchemaRegistryClient.java#L30 The purpose of the new commit is to address your first comment by removing the `topic` name from the serialization constructor and replacing it with `subject`. This way the serializer doesn't need to know about the `topic` name. If you still see issues with this approach, I would appreciate it if you could help me find a better solution. ---
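For context on the `subject` versus `topic` distinction discussed here: with Confluent's default subject naming strategy, the subject for a topic's record values is `<topic>-value` (and `<topic>-key` for keys). The sketch below (helper names are invented for illustration) shows the derivation that passing the subject explicitly moves out of the serializer.

```java
// Hypothetical sketch of Confluent's default (topic-name) subject naming:
// callers that know the topic can derive the subject, while the serializer
// itself only ever sees the subject string.
class SubjectNamesSketch {
    static String valueSubject(String topic) {
        return topic + "-value";
    }

    static String keySubject(String topic) {
        return topic + "-key";
    }
}
```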
[jira] [Assigned] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped
[ https://issues.apache.org/jira/browse/FLINK-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyi Chen reassigned FLINK-9891: - Assignee: Shuyi Chen > Flink cluster is not shutdown in YARN mode when Flink client is stopped > --- > > Key: FLINK-9891 > URL: https://issues.apache.org/jira/browse/FLINK-9891 > Project: Flink > Issue Type: Bug >Affects Versions: 1.5.0, 1.5.1 >Reporter: Sergey Krasovskiy >Assignee: Shuyi Chen >Priority: Blocker > > We are not using session mode and detached mode. The command to run Flink job > on YARN is: > {code:java} > /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm > 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount > {code} > Flink CLI logs: > {code:java} > Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set. > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > 2018-07-18 12:47:03,747 INFO > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service > address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/ > 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - > No path for the flink jar passed. Using the location of class > org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - > No path for the flink jar passed. 
Using the location of class > org.apache.flink.yarn.YarnClusterDescriptor to locate the jar > 2018-07-18 12:47:04,248 WARN > org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the > HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink > YARN Client needs one of these to be set to properly load the Hadoop > configuration for accessing YARN. > 2018-07-18 12:47:04,409 INFO > org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: > ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, > numberTaskManagers=1, slotsPerTaskManager=1} > 2018-07-18 12:47:04,783 WARN > org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit > local reads feature cannot be used because libhadoop cannot be loaded. > 2018-07-18 12:47:04,788 WARN > org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration > directory > ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf') > contains both LOG4J and Logback configuration files. Please delete or rename > one of them. > 2018-07-18 12:47:07,846 INFO > org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application > master application_1531474158783_10814 > 2018-07-18 12:47:08,073 INFO > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application > application_1531474158783_10814 > 2018-07-18 12:47:08,074 INFO > org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster > to be allocated > 2018-07-18 12:47:08,076 INFO > org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, > current state ACCEPTED > 2018-07-18 12:47:12,864 INFO > org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has > been deployed successfully. 
> {code} > Job Manager logs: > {code:java} > 2018-07-18 12:47:09,913 INFO > org.apache.flink.runtime.entrypoint.ClusterEntrypoint - > > 2018-07-18 12:47:09,915 INFO > org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting > YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ > 11:51:27 GMT) > ... > {code} > Issues: > # Flink job is running as a Flink session > # Ctrl+C or 'stop' doesn't stop a job and YARN cluster > # Cancel job via Job Maanager web ui doesn't stop Flink cluster. To kill the > cluster we need to run: yarn application -kill > We also tried to run a flink job with 'mode: legacy' and we have the same > issues: > # Add property 'mode: legacy' to ./conf/flink-conf.yaml > # Execute the following command: > {code:java} > /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm > 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount > {code} > Flink CLI logs: > {code:java} > Setting HADOOP_CONF_DIR=/etc/hadoop/conf because
[jira] [Commented] (FLINK-9679) Implement AvroSerializationSchema
[ https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548207#comment-16548207 ] ASF GitHub Bot commented on FLINK-9679: --- Github user dawidwys commented on the issue: https://github.com/apache/flink/pull/6259 @medcv the new commit does not address any of my previous comments or I don't understand something. > Implement AvroSerializationSchema > - > > Key: FLINK-9679 > URL: https://issues.apache.org/jira/browse/FLINK-9679 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.6.0 >Reporter: Yazdan Shirvany >Assignee: Yazdan Shirvany >Priority: Major > Labels: pull-request-available > > Implement AvroSerializationSchema using Confluent Schema Registry -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9679) Implement AvroSerializationSchema
[ https://issues.apache.org/jira/browse/FLINK-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548150#comment-16548150 ] ASF GitHub Bot commented on FLINK-9679: --- Github user medcv commented on the issue: https://github.com/apache/flink/pull/6259 @dawidwys I updated the PR; please review. The usage would be like this: `ConfluentRegistryAvroSerializationSchema.forSpecific(User.class, subject, schemaRegistryUrl)`, as Confluent needs the "subject" to fetch the schema info. Now `ConfluentRegistryAvroSerializationSchema` uses the "subject" directly instead of `topic + "-value"`. > Implement AvroSerializationSchema > - > > Key: FLINK-9679 > URL: https://issues.apache.org/jira/browse/FLINK-9679 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.6.0 >Reporter: Yazdan Shirvany >Assignee: Yazdan Shirvany >Priority: Major > Labels: pull-request-available > > Implement AvroSerializationSchema using Confluent Schema Registry -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped
Sergey Krasovskiy created FLINK-9891: Summary: Flink cluster is not shutdown in YARN mode when Flink client is stopped Key: FLINK-9891 URL: https://issues.apache.org/jira/browse/FLINK-9891 Project: Flink Issue Type: Bug Affects Versions: 1.5.1, 1.5.0 Reporter: Sergey Krasovskiy We are not using session mode and detached mode. The command to run flink job on YARN is: {code:java} /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount {code} Flink CLI logs: {code:java} Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 2018-07-18 12:47:03,747 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/ 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar 2018-07-18 12:47:04,248 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN. 
2018-07-18 12:47:04,409 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1, slotsPerTaskManager=1} 2018-07-18 12:47:04,783 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 2018-07-18 12:47:04,788 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them. 2018-07-18 12:47:07,846 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1531474158783_10814 2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1531474158783_10814 2018-07-18 12:47:08,074 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated 2018-07-18 12:47:08,076 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED 2018-07-18 12:47:12,864 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully. {code} Job Manager logs: {code:java} 2018-07-18 12:47:09,913 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 2018-07-18 12:47:09,915 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 11:51:27 GMT) ... {code} Issues: # Flink job is running as a Flink session # Ctrl+C or 'stop' doesn't stop a job and YARN cluster # Cancel job via Job Manager web UI doesn't stop Flink cluster.
To kill the cluster we need to run: yarn application -kill We also tried to run a flink job with 'mode: legacy' and we have the same issues: # Add property 'mode: legacy' to ./conf/flink-conf.yaml # Execute the following command: {code:java} /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount {code} Flink CLI logs: {code:java} Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See
[jira] [Updated] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped
[ https://issues.apache.org/jira/browse/FLINK-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Krasovskiy updated FLINK-9891: - Description: We are not using session mode and detached mode. The command to run Flink job on YARN is: {code:java} /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount {code} Flink CLI logs: {code:java} Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 2018-07-18 12:47:03,747 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/ 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar 2018-07-18 12:47:04,248 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN. 
2018-07-18 12:47:04,409 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1, slotsPerTaskManager=1} 2018-07-18 12:47:04,783 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 2018-07-18 12:47:04,788 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them. 2018-07-18 12:47:07,846 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1531474158783_10814 2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1531474158783_10814 2018-07-18 12:47:08,074 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated 2018-07-18 12:47:08,076 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED 2018-07-18 12:47:12,864 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully. {code} Job Manager logs: {code:java} 2018-07-18 12:47:09,913 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 2018-07-18 12:47:09,915 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 11:51:27 GMT) ... {code} Issues: # Flink job is running as a Flink session # Ctrl+C or 'stop' doesn't stop a job and YARN cluster # Cancel job via Job Manager web UI doesn't stop Flink cluster.
To kill the cluster we need to run: yarn application -kill We also tried to run a flink job with 'mode: legacy' and we have the same issues: # Add property 'mode: legacy' to ./conf/flink-conf.yaml # Execute the following command: {code:java} /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount {code} Flink CLI logs: {code:java} Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 2018-07-18 16:07:13,820 INFO
[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector
[ https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548029#comment-16548029 ] ASF GitHub Bot commented on FLINK-9849: --- Github user yanghua commented on a diff in the pull request: https://github.com/apache/flink/pull/6365#discussion_r203439890 --- Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseWriteStreamExample.java --- @@ -87,22 +90,22 @@ public void configure(Configuration parameters) { @Override public void open(int taskNumber, int numTasks) throws IOException { - table = new HTable(conf, "flinkExample"); + Connection connection = ConnectionFactory.createConnection(conf); --- End diff -- I think we should add a try/catch block to prevent a connection leak. > Upgrade hbase version to 2.0.1 for hbase connector > -- > > Key: FLINK-9849 > URL: https://issues.apache.org/jira/browse/FLINK-9849 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > Labels: pull-request-available > > Currently hbase 1.4.3 is used for hbase connector. > We should upgrade to 2.0.1 which was recently released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector
[ https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548031#comment-16548031 ] ASF GitHub Bot commented on FLINK-9849: --- Github user yanghua commented on a diff in the pull request: https://github.com/apache/flink/pull/6365#discussion_r203439859 --- Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseWriteStreamExample.java --- @@ -87,22 +90,22 @@ public void configure(Configuration parameters) { @Override public void open(int taskNumber, int numTasks) throws IOException { - table = new HTable(conf, "flinkExample"); + Connection connection = ConnectionFactory.createConnection(conf); --- End diff -- I think we should add a try/catch block to prevent a connection leak. > Upgrade hbase version to 2.0.1 for hbase connector > -- > > Key: FLINK-9849 > URL: https://issues.apache.org/jira/browse/FLINK-9849 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > Labels: pull-request-available > > Currently hbase 1.4.3 is used for hbase connector. > We should upgrade to 2.0.1 which was recently released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector
[ https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548030#comment-16548030 ] ASF GitHub Bot commented on FLINK-9849: --- Github user yanghua commented on a diff in the pull request: https://github.com/apache/flink/pull/6365#discussion_r203440989 --- Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseWriteStreamExample.java --- @@ -87,22 +90,22 @@ public void configure(Configuration parameters) { @Override public void open(int taskNumber, int numTasks) throws IOException { - table = new HTable(conf, "flinkExample"); + Connection connection = ConnectionFactory.createConnection(conf); --- End diff -- Based on the [HBase Connection JavaDoc](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Connection.html#close--), it seems the caller should invoke the `close` method to release the resource, so I suggest we close the connection in the UDF's `close` method. > Upgrade hbase version to 2.0.1 for hbase connector > -- > > Key: FLINK-9849 > URL: https://issues.apache.org/jira/browse/FLINK-9849 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > Labels: pull-request-available > > Currently hbase 1.4.3 is used for hbase connector. > We should upgrade to 2.0.1 which was recently released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
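The lifecycle suggested in this comment can be sketched in plain Java. The HBase `Connection` is replaced here by a hypothetical `StubConnection`, since the real class needs a running cluster; the point is only the open/close pairing (mirroring `RichFunction#open`/`#close`), not HBase specifics.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for org.apache.hadoop.hbase.client.Connection;
// a stub keeps the sketch runnable without a cluster.
class StubConnection implements AutoCloseable {
    final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public void close() {
        closed.set(true);
    }
}

// Sketch of the suggested UDF lifecycle: the connection acquired in open()
// is released exactly once in close().
class HBaseSinkSketch {
    StubConnection connection;

    void open() {
        connection = new StubConnection();
    }

    void close() {
        if (connection != null) {
            connection.close();
        }
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        HBaseSinkSketch sink = new HBaseSinkSketch();
        sink.open();
        sink.close();
        System.out.println(sink.connection.closed.get()); // prints "true"
    }
}
```

The null check in `close()` matters because Flink may call `close()` even when `open()` failed before the connection was created.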
[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector
[ https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548027#comment-16548027 ] ASF GitHub Bot commented on FLINK-9849: --- Github user yanghua commented on a diff in the pull request: https://github.com/apache/flink/pull/6365#discussion_r203438579 --- Diff: flink-connectors/flink-hbase/src/main/java/org/apache/flink/addons/hbase/TableInputFormat.java --- @@ -81,7 +85,9 @@ private HTable createTable() { org.apache.hadoop.conf.Configuration hConf = HBaseConfiguration.create(); try { - return new HTable(hConf, getTableName()); + Connection connection = ConnectionFactory.createConnection(hConf); + Table table = connection.getTable(TableName.valueOf(getTableName())); + return (HTable) table; --- End diff -- I think we should release the connection when an exception happens. > Upgrade hbase version to 2.0.1 for hbase connector > -- > > Key: FLINK-9849 > URL: https://issues.apache.org/jira/browse/FLINK-9849 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > Labels: pull-request-available > > Currently hbase 1.4.3 is used for hbase connector. > We should upgrade to 2.0.1 which was recently released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
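The close-on-exception pattern the reviewer asks for can be sketched with hypothetical stubs (`FakeConnection` stands in for the HBase `Connection` obtained from `ConnectionFactory`; the actual `createTable()` signature in `TableInputFormat` differs). The sketch closes the already-created connection before rethrowing, so the failure path does not leak.

```java
// Hypothetical stub for the HBase Connection used by createTable().
class FakeConnection implements AutoCloseable {
    static boolean lastClosed;

    FakeConnection() {
        lastClosed = false;
    }

    Object getTable(String name) {
        if (name.isEmpty()) {
            throw new IllegalStateException("no such table");
        }
        return new Object();
    }

    @Override
    public void close() {
        lastClosed = true;
    }
}

class TableFactorySketch {
    // The review suggestion, sketched: if getTable() throws, close the
    // already-created connection before propagating the exception.
    static Object createTable(String tableName) {
        FakeConnection connection = new FakeConnection();
        try {
            return connection.getTable(tableName);
        } catch (RuntimeException e) {
            try {
                connection.close();
            } catch (Exception suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }
}
```

On the success path the connection stays open deliberately, because the returned table remains tied to it; only the failure path releases it eagerly.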
[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector
[ https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548028#comment-16548028 ] ASF GitHub Bot commented on FLINK-9849: --- Github user yanghua commented on a diff in the pull request: https://github.com/apache/flink/pull/6365#discussion_r203439523 --- Diff: flink-connectors/flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseWriteStreamExample.java --- @@ -87,22 +90,22 @@ public void configure(Configuration parameters) { @Override public void open(int taskNumber, int numTasks) throws IOException { - table = new HTable(conf, "flinkExample"); + Connection connection = ConnectionFactory.createConnection(conf); --- End diff -- I think we should add try / catch block to protect the connect leak. > Upgrade hbase version to 2.0.1 for hbase connector > -- > > Key: FLINK-9849 > URL: https://issues.apache.org/jira/browse/FLINK-9849 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > Labels: pull-request-available > > Currently hbase 1.4.3 is used for hbase connector. > We should upgrade to 2.0.1 which was recently released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9675) Avoid FileInputStream/FileOutputStream
[ https://issues.apache.org/jira/browse/FLINK-9675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548019#comment-16548019 ] ASF GitHub Bot commented on FLINK-9675: --- Github user hequn8128 commented on the issue: https://github.com/apache/flink/pull/6335 Hi @zhangminglei, good catch! The Reader may also need to be adapted, changing `new InputStreamReader` to `Channels.newReader`. I found a benchmark about File InputStream and Reader [here](https://arnaudroger.github.io/blog/2017/03/20/faster-reader-inpustream-in-java.html). Hope it helps. > Avoid FileInputStream/FileOutputStream > -- > > Key: FLINK-9675 > URL: https://issues.apache.org/jira/browse/FLINK-9675 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Minor > Labels: filesystem, pull-request-available > > They rely on finalizers (before Java 11), which create unnecessary GC load. > The alternatives, Files.newInputStream, are as easy to use and don't have > this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
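Both replacements discussed in the ticket are available in the standard library. A minimal sketch (not the actual Flink patch): `Files.newInputStream` instead of `new FileInputStream`, and `Channels.newReader` over a `FileChannel` instead of `new InputStreamReader(new FileInputStream(...))` — neither relies on a finalizer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class NioStreamsDemo {
    // Read a whole file via Files.newInputStream (no finalizer involved).
    static String readViaStream(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Reader side, as the comment suggests: Channels.newReader over a
    // FileChannel instead of an InputStreamReader on a FileInputStream.
    static String readViaChannelReader(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                Channels.newReader(FileChannel.open(file), "UTF-8"))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("flink-9675", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readViaStream(tmp));        // prints "hello"
        System.out.println(readViaChannelReader(tmp)); // prints "hello"
        Files.delete(tmp);
    }
}
```

`FileChannel.open(path)` defaults to read-only access, which matches what the replaced `FileInputStream` provided.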
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548014#comment-16548014 ] ASF GitHub Bot commented on FLINK-9878: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/6355#discussion_r203437995 --- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java --- @@ -160,4 +160,41 @@ key("security.ssl.verify-hostname") .defaultValue(true) .withDescription("Flag to enable peer’s hostname verification during ssl handshake."); + + /** +* SSL session cache size. +*/ + public static final ConfigOption SSL_SESSION_CACHE_SIZE = + key("security.ssl.session-cache-size") + .defaultValue(-1) + .withDescription("The size of the cache used for storing SSL session objects. " + + "According to https://github.com/netty/netty/issues/832, you should always set " + + "this to an appropriate number to not run into a bug with stalling IO threads " + + "during garbage collection. (-1 = use system default)."); + + /** +* SSL session timeout. +*/ + public static final ConfigOption SSL_SESSION_TIMEOUT = + key("security.ssl.session-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)"); + + /** +* SSL session timeout during handshakes. +*/ + public static final ConfigOption SSL_HANDSHAKE_TIMEOUT = + key("security.ssl.handshake-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)"); + + /** +* SSL session timeout after flushing the `close_notify` message. 
+*/ + public static final ConfigOption SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT = + key("security.ssl.close-notify-flush-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " + --- End diff -- it's not showing up as a code block since that only works for markdown; the description so far was plain-text. > IO worker threads BLOCKED on SSL Session Cache while CMS full gc > > > Key: FLINK-9878 > URL: https://issues.apache.org/jira/browse/FLINK-9878 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.5.0, 1.5.1, 1.6.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.5.2, 1.6.0 > > > According to https://github.com/netty/netty/issues/832, there is a JDK issue > during garbage collection when the SSL session cache is not limited. We > should allow the user to configure this and further (advanced) SSL parameters > for fine-tuning to fix this and similar issues. In particular, the following > parameters should be configurable: > - SSL session cache size > - SSL session timeout > - SSL handshake timeout > - SSL close notify flush timeout -- This message was sent by Atlassian JIRA (v7.6.3#76005)
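For reference, a user would set the four options from this diff in flink-conf.yaml. The values below are purely illustrative, not recommendations from the PR; per the defaults in the diff, -1 keeps the system default for each option.

```
# Illustrative values only; -1 (the default) keeps the system default.
security.ssl.session-cache-size: 10000
security.ssl.session-timeout: 300000
security.ssl.handshake-timeout: 60000
security.ssl.close-notify-flush-timeout: 10000
```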
[jira] [Commented] (FLINK-9860) Netty resource leak on receiver side
[ https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548008#comment-16548008 ] ASF GitHub Bot commented on FLINK-9860: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/6363 +1 > Netty resource leak on receiver side > > > Key: FLINK-9860 > URL: https://issues.apache.org/jira/browse/FLINK-9860 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.5.1, 1.6.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.5.2, 1.6.0 > > > The Hadoop-free Wordcount end-to-end test fails with the following exception: > {code} > ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector - > LEAK: ByteBuf.release() was not called before it's garbage-collected. See > http://netty.io/wiki/reference-counted-objects.html for more information. > Recent access records: > Created at: > > org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137) > > org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > > 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > {code} > We might have a resource leak on the receiving side of our network stack. > https://api.travis-ci.org/v3/job/404225956/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9850) Add a string to the print method to identify output for DataStream
[ https://issues.apache.org/jira/browse/FLINK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548004#comment-16548004 ] ASF GitHub Bot commented on FLINK-9850: --- Github user yanghua commented on the issue: https://github.com/apache/flink/pull/6367 @tillrohrmann and @zentol I see that the Python DataStream API does not match the Java DataStream API (some methods are missing). Shall we add the missing methods to `PythonDataStream`? If yes, I'd like to do this. > Add a string to the print method to identify output for DataStream > -- > > Key: FLINK-9850 > URL: https://issues.apache.org/jira/browse/FLINK-9850 > Project: Flink > Issue Type: New Feature > Components: DataStream API >Reporter: Hequn Cheng >Assignee: vinoyang >Priority: Major > Labels: pull-request-available > > The print method of {{DataSet}} allows the user to supply a > String to identify the output (see > [FLINK-1486|https://issues.apache.org/jira/browse/FLINK-1486]), but > {{DataStream}} does not support this yet. It is valuable to add this feature for > {{DataStream}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9860) Netty resource leak on receiver side
[ https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547991#comment-16547991 ] ASF GitHub Bot commented on FLINK-9860: --- Github user NicoK commented on a diff in the pull request: https://github.com/apache/flink/pull/6363#discussion_r203432013 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyLeakDetectionResource.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.io.network.netty; + +import org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector; +import org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetectorFactory; + +import org.junit.rules.ExternalResource; + +import static org.junit.Assert.fail; + +/** + * JUnit resource to fail with an assertion when Netty detects a resource leak (only with + * ERROR logging enabled for + * org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector). 
+ * + * This should be used in a class rule: + * {@code + * @literal @ClassRule + * public static final NettyLeakDetectionResource LEAK_DETECTION = new NettyLeakDetectionResource(); + * } + */ +public class NettyLeakDetectionResource extends ExternalResource { + private static ResourceLeakDetectorFactory previousLeakDetector; + private static ResourceLeakDetector.Level previousLeakDetectorLevel; + + @Override + protected void before() { + previousLeakDetector = ResourceLeakDetectorFactory.instance(); + previousLeakDetectorLevel = ResourceLeakDetector.getLevel(); + + ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID); + ResourceLeakDetectorFactory.setResourceLeakDetectorFactory(new FailingResourceLeakDetectorFactory()); --- End diff -- true - done > Netty resource leak on receiver side > > > Key: FLINK-9860 > URL: https://issues.apache.org/jira/browse/FLINK-9860 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.5.1, 1.6.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.5.2, 1.6.0 > > > The Hadoop-free Wordcount end-to-end test fails with the following exception: > {code} > ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector - > LEAK: ByteBuf.release() was not called before it's garbage-collected. See > http://netty.io/wiki/reference-counted-objects.html for more information. 
> Recent access records: > Created at: > > org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137) > > org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > >
[GitHub] flink pull request #6363: [FLINK-9860][REST] fix buffer leak in FileUploadHa...
Github user NicoK commented on a diff in the pull request: https://github.com/apache/flink/pull/6363#discussion_r203432013 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyLeakDetectionResource.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.io.network.netty; + +import org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector; +import org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetectorFactory; + +import org.junit.rules.ExternalResource; + +import static org.junit.Assert.fail; + +/** + * JUnit resource to fail with an assertion when Netty detects a resource leak (only with + * ERROR logging enabled for + * org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector). 
+ * + * This should be used in a class rule: + * {@code + * @literal @ClassRule + * public static final NettyLeakDetectionResource LEAK_DETECTION = new NettyLeakDetectionResource(); + * } + */ +public class NettyLeakDetectionResource extends ExternalResource { + private static ResourceLeakDetectorFactory previousLeakDetector; + private static ResourceLeakDetector.Level previousLeakDetectorLevel; + + @Override + protected void before() { + previousLeakDetector = ResourceLeakDetectorFactory.instance(); + previousLeakDetectorLevel = ResourceLeakDetector.getLevel(); + + ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID); + ResourceLeakDetectorFactory.setResourceLeakDetectorFactory(new FailingResourceLeakDetectorFactory()); --- End diff -- true - done ---
[jira] [Updated] (FLINK-9889) create .bat script to start Flink task manager
[ https://issues.apache.org/jira/browse/FLINK-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated FLINK-9889: Summary: create .bat script to start Flink task manager (was: .bat script to start Flink task manager) > create .bat script to start Flink task manager > -- > > Key: FLINK-9889 > URL: https://issues.apache.org/jira/browse/FLINK-9889 > Project: Flink > Issue Type: Sub-task > Components: Startup Shell Scripts >Reporter: Pavel Shvetsov >Assignee: Pavel Shvetsov >Priority: Minor > Original Estimate: 72h > Remaining Estimate: 72h > > Create .bat script to start additional task managers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9860) Netty resource leak on receiver side
[ https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547969#comment-16547969 ] ASF GitHub Bot commented on FLINK-9860: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/6363#discussion_r203414700 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyLeakDetectionResource.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.io.network.netty; + +import org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector; +import org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetectorFactory; + +import org.junit.rules.ExternalResource; + +import static org.junit.Assert.fail; + +/** + * JUnit resource to fail with an assertion when Netty detects a resource leak (only with + * ERROR logging enabled for + * org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector). 
+ * + * This should be used in a class rule: + * {@code + * @literal @ClassRule + * public static final NettyLeakDetectionResource LEAK_DETECTION = new NettyLeakDetectionResource(); + * } + */ +public class NettyLeakDetectionResource extends ExternalResource { + private static ResourceLeakDetectorFactory previousLeakDetector; + private static ResourceLeakDetector.Level previousLeakDetectorLevel; + + @Override + protected void before() { + previousLeakDetector = ResourceLeakDetectorFactory.instance(); + previousLeakDetectorLevel = ResourceLeakDetector.getLevel(); + + ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID); + ResourceLeakDetectorFactory.setResourceLeakDetectorFactory(new FailingResourceLeakDetectorFactory()); + } + + @Override + protected void after() { + ResourceLeakDetectorFactory.setResourceLeakDetectorFactory(previousLeakDetector); + ResourceLeakDetector.setLevel(previousLeakDetectorLevel); + } + + static class FailingResourceLeakDetectorFactory extends ResourceLeakDetectorFactory { --- End diff -- these could be private? > Netty resource leak on receiver side > > > Key: FLINK-9860 > URL: https://issues.apache.org/jira/browse/FLINK-9860 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.5.1, 1.6.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.5.2, 1.6.0 > > > The Hadoop-free Wordcount end-to-end test fails with the following exception: > {code} > ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector - > LEAK: ByteBuf.release() was not called before it's garbage-collected. See > http://netty.io/wiki/reference-counted-objects.html for more information. 
> Recent access records: > Created at: > > org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137) > > org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) > >
[jira] [Commented] (FLINK-9860) Netty resource leak on receiver side
[ https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547968#comment-16547968 ] ASF GitHub Bot commented on FLINK-9860: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/6363#discussion_r203422176 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/io/network/netty/NettyLeakDetectionResource.java --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.io.network.netty; + +import org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector; +import org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetectorFactory; + +import org.junit.rules.ExternalResource; + +import static org.junit.Assert.fail; + +/** + * JUnit resource to fail with an assertion when Netty detects a resource leak (only with + * ERROR logging enabled for + * org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector). 
+ * + * This should be used in a class rule: + * {@code + * @literal @ClassRule + * public static final NettyLeakDetectionResource LEAK_DETECTION = new NettyLeakDetectionResource(); + * } + */ +public class NettyLeakDetectionResource extends ExternalResource { + private static ResourceLeakDetectorFactory previousLeakDetector; + private static ResourceLeakDetector.Level previousLeakDetectorLevel; + + @Override + protected void before() { + previousLeakDetector = ResourceLeakDetectorFactory.instance(); + previousLeakDetectorLevel = ResourceLeakDetector.getLevel(); + + ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID); + ResourceLeakDetectorFactory.setResourceLeakDetectorFactory(new FailingResourceLeakDetectorFactory()); --- End diff -- so this isn't something we necessarily have to deal with now, but if multiple tests that use this resource run in parallel we might not reset to the correct factory/level. Effectively what we need is some kinda ref-counting so that only the first resource modifies the level and factory, and only the last resource reset them. :/ > Netty resource leak on receiver side > > > Key: FLINK-9860 > URL: https://issues.apache.org/jira/browse/FLINK-9860 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.5.1, 1.6.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.5.2, 1.6.0 > > > The Hadoop-free Wordcount end-to-end test fails with the following exception: > {code} > ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector - > LEAK: ByteBuf.release() was not called before it's garbage-collected. See > http://netty.io/wiki/reference-counted-objects.html for more information. 
> Recent access records: > Created at: > > org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137) > > org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) > >
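The ref-counting idea from the review comment above (only the first resource modifies the level and factory, only the last one resets them) can be sketched as follows. All names here are illustrative stand-ins, not the actual Flink test utility, and the detector level is modeled as a plain string:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of ref-counted global setup/teardown for a shared JUnit
// resource: the first acquire() saves and overrides the global state, the
// last release() restores it, so parallel tests sharing the resource do not
// clobber each other's save/restore.
public class RefCountedSetup {
    private static final AtomicInteger REF_COUNT = new AtomicInteger();
    private static String previousLevel;        // saved on first acquire()
    static String currentLevel = "SIMPLE";      // stands in for ResourceLeakDetector.getLevel()

    static void acquire() {
        if (REF_COUNT.getAndIncrement() == 0) { // only the first resource modifies state
            previousLevel = currentLevel;
            currentLevel = "PARANOID";
        }
    }

    static void release() {
        if (REF_COUNT.decrementAndGet() == 0) { // only the last resource restores it
            currentLevel = previousLevel;
        }
    }
}
```

In the real rule, `acquire()`/`release()` would correspond to `before()`/`after()` of the `ExternalResource`.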
[jira] [Commented] (FLINK-9860) Netty resource leak on receiver side
[ https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547967#comment-16547967 ] ASF GitHub Bot commented on FLINK-9860: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/6363#discussion_r203414529 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/rest/FileUploadHandlerTest.java --- @@ -220,4 +224,5 @@ public void testUploadCleanupOnFailure() throws IOException { } MULTIPART_UPLOAD_RESOURCE.assertUploadDirectoryIsEmpty(); } + --- End diff -- revert > Netty resource leak on receiver side > > > Key: FLINK-9860 > URL: https://issues.apache.org/jira/browse/FLINK-9860 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.5.1, 1.6.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.5.2, 1.6.0 > > > The Hadoop-free Wordcount end-to-end test fails with the following exception: > {code} > ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector - > LEAK: ByteBuf.release() was not called before it's garbage-collected. See > http://netty.io/wiki/reference-counted-objects.html for more information. 
> Recent access records: > Created at: > > org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137) > > org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > {code} > We might have a resource leak on the receiving side of our network stack. > https://api.travis-ci.org/v3/job/404225956/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9850) Add a string to the print method to identify output for DataStream
[ https://issues.apache.org/jira/browse/FLINK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547965#comment-16547965 ] ASF GitHub Bot commented on FLINK-9850: --- GitHub user yanghua opened a pull request: https://github.com/apache/flink/pull/6367 [FLINK-9850] Add a string to the print method to identify output for DataStream ## What is the purpose of the change *This pull request adds a string to the print method to identify output for DataStream* ## Brief change log - *add print(string) / printToErr(string) to DataStream Java API* - *add print(string) / printToErr(string) to DataStream Scala API* - *add print(string) to DataStream Python API* ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**yes** / no) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? 
(not applicable / **docs** / **JavaDocs** / not documented) You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanghua/flink FLINK-9850 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6367.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6367 commit 80215cd12618392ab0909a431863939d3353ca16 Author: yanghua Date: 2018-07-18T15:20:11Z [FLINK-9850] Add a string to the print method to identify output for DataStream > Add a string to the print method to identify output for DataStream > -- > > Key: FLINK-9850 > URL: https://issues.apache.org/jira/browse/FLINK-9850 > Project: Flink > Issue Type: New Feature > Components: DataStream API >Reporter: Hequn Cheng >Assignee: vinoyang >Priority: Major > Labels: pull-request-available > > The print method of {{DataSet}} allows the user to supply a > String to identify the output (see > [FLINK-1486|https://issues.apache.org/jira/browse/FLINK-1486]), but > {{DataStream}} does not support this yet. It is valuable to add this feature for > {{DataStream}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
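The feature this PR adds can be illustrated with a minimal sketch of identifier-prefixed printing. The `format` helper and the "identifier> record" layout below are assumptions for illustration, not Flink's exact `PrintSinkFunction` implementation:

```java
// Hedged sketch of what print(String sinkIdentifier) does: each record is
// prefixed with the user-supplied identifier so that output from multiple
// streams printed to the same stdout can be told apart.
public class PrefixedPrintSketch {
    /** Formats a record as "identifier> record" (assumed layout). */
    static String format(String sinkIdentifier, Object record) {
        return sinkIdentifier + "> " + record;
    }

    public static void main(String[] args) {
        // With two streams printing to the same console, the prefix
        // disambiguates which stream produced each line.
        System.out.println(format("left-stream", 42));   // prints left-stream> 42
        System.out.println(format("right-stream", 42));  // prints right-stream> 42
    }
}
```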
[jira] [Updated] (FLINK-9850) Add a string to the print method to identify output for DataStream
[ https://issues.apache.org/jira/browse/FLINK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-9850: -- Labels: pull-request-available (was: ) > Add a string to the print method to identify output for DataStream > -- > > Key: FLINK-9850 > URL: https://issues.apache.org/jira/browse/FLINK-9850 > Project: Flink > Issue Type: New Feature > Components: DataStream API >Reporter: Hequn Cheng >Assignee: vinoyang >Priority: Major > Labels: pull-request-available > > The print method of {{DataSet}} allows the user to supply a > String to identify the output (see > [FLINK-1486|https://issues.apache.org/jira/browse/FLINK-1486]), but > {{DataStream}} does not support this yet. It is valuable to add this feature for > {{DataStream}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-4534) Lack of synchronization in BucketingSink#restoreState()
[ https://issues.apache.org/jira/browse/FLINK-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547934#comment-16547934 ] Gary Yao commented on FLINK-4534: - [~yuzhih...@gmail.com] Synchronization bears a performance penalty. Synchronization makes the code harder to read. We should avoid unnecessary synchronization by defining clear contracts and threading models. Ideally, every line of code in the Flink repository should be idiomatic because it possibly serves as a role model for future contributions. > Lack of synchronization in BucketingSink#restoreState() > --- > > Key: FLINK-4534 > URL: https://issues.apache.org/jira/browse/FLINK-4534 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > > Iteration over state.bucketStates is protected by synchronization in other > methods, except for the following in restoreState(): > {code} > for (BucketState<T> bucketState : state.bucketStates.values()) { > {code} > and following in close(): > {code} > for (Map.Entry<String, BucketState<T>> entry : > state.bucketStates.entrySet()) { > closeCurrentPartFile(entry.getValue()); > {code} > w.r.t. bucketState.pendingFilesPerCheckpoint, there is a similar issue > starting line 752: > {code} > Set<Long> pastCheckpointIds = > bucketState.pendingFilesPerCheckpoint.keySet(); > LOG.debug("Moving pending files to final location on restore."); > for (Long pastCheckpointId : pastCheckpointIds) { > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
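The fix pattern under discussion — taking the same lock for iteration that the mutating methods already take — can be sketched without the real BucketingSink. The class and field names below are simplified stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

// Iterating a shared map is only safe if it happens under the same monitor
// that guards the writes; restoreState()/close() currently skip that lock.
final class SyncedBucketStates {
    private final Object lock = new Object();
    private final Map<String, Integer> bucketStates = new HashMap<>();

    void update(String bucket, int state) {
        synchronized (lock) {
            bucketStates.put(bucket, state);
        }
    }

    int sumStates() {
        synchronized (lock) { // the missing piece: iterate under the write lock
            int sum = 0;
            for (int s : bucketStates.values()) {
                sum += s;
            }
            return sum;
        }
    }
}
```

Without the `synchronized` block in `sumStates()`, a concurrent `update()` could mutate the map mid-iteration and throw `ConcurrentModificationException` — which is exactly the class of bug the issue describes.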
[jira] [Commented] (FLINK-9815) YARNSessionCapacitySchedulerITCase flaky
[ https://issues.apache.org/jira/browse/FLINK-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547936#comment-16547936 ] ASF GitHub Bot commented on FLINK-9815: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/6352 yay travis is green. > YARNSessionCapacitySchedulerITCase flaky > > > Key: FLINK-9815 > URL: https://issues.apache.org/jira/browse/FLINK-9815 > Project: Flink > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: Dawid Wysakowicz >Assignee: Chesnay Schepler >Priority: Blocker > Labels: pull-request-available > Fix For: 1.6.0 > > > The test fails because of dangling yarn applications. > Logs: https://api.travis-ci.org/v3/job/402657694/log.txt > It was also reported previously in [FLINK-8161] : > https://issues.apache.org/jira/browse/FLINK-8161?focusedCommentId=16480216=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16480216 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6352: [FLINK-9815][yarn][tests] Harden tests against slow job s...
Github user zentol commented on the issue: https://github.com/apache/flink/pull/6352 yay travis is green. ---
[jira] [Commented] (FLINK-9869) Send PartitionInfo in batch to Improve performance
[ https://issues.apache.org/jira/browse/FLINK-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547929#comment-16547929 ] ASF GitHub Bot commented on FLINK-9869: --- Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6345 OK. This PR is about a performance improvement. I will try to provide a benchmark, but since it is inspired by our own batch table tasks, it might take time. Since this PR concurrently sends partition info and deploys the task in another thread, it should theoretically help. Keep going on Flink 1.6! I will nudge you guys to review this one, though (laughs) > Send PartitionInfo in batch to Improve performance > -- > > Key: FLINK-9869 > URL: https://issues.apache.org/jira/browse/FLINK-9869 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Affects Versions: 1.5.1 >Reporter: 陈梓立 >Assignee: 陈梓立 >Priority: Major > Labels: pull-request-available > Fix For: 1.5.2 > > > ... currently we send partition info as soon as one arrives. We could > `cachePartitionInfo` and then `sendPartitionInfoAsync`, which will improve > performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6345: [FLINK-9869] Send PartitionInfo in batch to Improve perfo...
Github user tison1 commented on the issue: https://github.com/apache/flink/pull/6345 OK. This PR is about a performance improvement. I will try to provide a benchmark, but since it is inspired by our own batch table tasks, it might take time. Since this PR concurrently sends partition info and deploys the task in another thread, it should theoretically help. Keep going on Flink 1.6! I will nudge you guys to review this one, though (laughs) ---
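The `cachePartitionInfo` / `sendPartitionInfoAsync` idea from the issue can be sketched as follows. This is plain Java and synchronous for brevity — the actual change sends the batch from another thread, and `PartitionInfo` is reduced to a `String` placeholder:

```java
import java.util.ArrayList;
import java.util.List;

// Buffer partition infos as they arrive and ship them in one batch,
// instead of issuing one send per partition info.
final class PartitionInfoBatcher {
    private final List<String> cache = new ArrayList<>();
    private final List<List<String>> sentBatches = new ArrayList<>();

    void cachePartitionInfo(String partitionInfo) {
        cache.add(partitionInfo);
    }

    void sendPartitionInfo() {
        if (!cache.isEmpty()) {
            sentBatches.add(new ArrayList<>(cache)); // one send for the whole batch
            cache.clear();
        }
    }

    int sentBatchCount() {
        return sentBatches.size();
    }
}
```

Three cached infos followed by one flush result in a single batch send — that amortization is the claimed performance benefit.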
[jira] [Commented] (FLINK-9886) Build SQL jars with every build
[ https://issues.apache.org/jira/browse/FLINK-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547917#comment-16547917 ] ASF GitHub Bot commented on FLINK-9886: --- GitHub user twalthr opened a pull request: https://github.com/apache/flink/pull/6366 [FLINK-9886] [sql-client] Build SQL jars with every build ## What is the purpose of the change This enables the building of the SQL jars by default. This solves a couple of issues: - Reduces user confusion for finding SQL jars in SNAPSHOT releases - Enables end-to-end testing ## Brief change log - Building enabled by default but can be skipped with `-DskipSqlJars` ## Verifying this change This change is a trivial rework / code cleanup without any test coverage ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): yes - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented?
not documented You can merge this pull request into a Git repository by running: $ git pull https://github.com/twalthr/flink FLINK-9886 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6366.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6366 commit 1fcf364c1b28b29fd9a346ec981b9945507f7f60 Author: Timo Walther Date: 2018-07-18T11:30:28Z [FLINK-9886] [sql-client] Build SQL jars with every build > Build SQL jars with every build > --- > > Key: FLINK-9886 > URL: https://issues.apache.org/jira/browse/FLINK-9886 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > Labels: pull-request-available > > Currently, the shaded fat jars for SQL are only built in the {{-Prelease}} > profile. However, end-to-end tests require those jars and should also be able > to test them. E.g. existing {{META-INF}} entry and proper shading. We should > build them with every build. If a build should happen quicker, one can use > the {{-Pfast}} profile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9886) Build SQL jars with every build
[ https://issues.apache.org/jira/browse/FLINK-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547920#comment-16547920 ] ASF GitHub Bot commented on FLINK-9886: --- Github user twalthr commented on the issue: https://github.com/apache/flink/pull/6366 CC @zentol > Build SQL jars with every build > --- > > Key: FLINK-9886 > URL: https://issues.apache.org/jira/browse/FLINK-9886 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > Labels: pull-request-available > > Currently, the shaded fat jars for SQL are only built in the {{-Prelease}} > profile. However, end-to-end tests require those jars and should also be able > to test them. E.g. existing {{META-INF}} entry and proper shading. We should > build them with every build. If a build should happen quicker, one can use > the {{-Pfast}} profile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9886) Build SQL jars with every build
[ https://issues.apache.org/jira/browse/FLINK-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-9886: -- Labels: pull-request-available (was: ) > Build SQL jars with every build > --- > > Key: FLINK-9886 > URL: https://issues.apache.org/jira/browse/FLINK-9886 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Major > Labels: pull-request-available > > Currently, the shaded fat jars for SQL are only built in the {{-Prelease}} > profile. However, end-to-end tests require those jars and should also be able > to test them. E.g. existing {{META-INF}} entry and proper shading. We should > build them with every build. If a build should happen quicker, one can use > the {{-Pfast}} profile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #6366: [FLINK-9886] [sql-client] Build SQL jars with every build
Github user twalthr commented on the issue: https://github.com/apache/flink/pull/6366 CC @zentol ---
[GitHub] flink pull request #6366: [FLINK-9886] [sql-client] Build SQL jars with ever...
GitHub user twalthr opened a pull request: https://github.com/apache/flink/pull/6366 [FLINK-9886] [sql-client] Build SQL jars with every build ## What is the purpose of the change This enables the building of the SQL jars by default. This solves a couple of issues: - Reduces user confusion for finding SQL jars in SNAPSHOT releases - Enables end-to-end testing ## Brief change log - Building enabled by default but can be skipped with `-DskipSqlJars` ## Verifying this change This change is a trivial rework / code cleanup without any test coverage ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): yes - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? not documented You can merge this pull request into a Git repository by running: $ git pull https://github.com/twalthr/flink FLINK-9886 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6366.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6366 commit 1fcf364c1b28b29fd9a346ec981b9945507f7f60 Author: Timo Walther Date: 2018-07-18T11:30:28Z [FLINK-9886] [sql-client] Build SQL jars with every build ---
[jira] [Commented] (FLINK-9878) IO worker threads BLOCKED on SSL Session Cache while CMS full gc
[ https://issues.apache.org/jira/browse/FLINK-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547909#comment-16547909 ] ASF GitHub Bot commented on FLINK-9878: --- Github user NicoK commented on a diff in the pull request: https://github.com/apache/flink/pull/6355#discussion_r203405530 --- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java --- @@ -160,4 +160,41 @@ key("security.ssl.verify-hostname") .defaultValue(true) .withDescription("Flag to enable peer’s hostname verification during ssl handshake."); + + /** +* SSL session cache size. +*/ + public static final ConfigOption SSL_SESSION_CACHE_SIZE = + key("security.ssl.session-cache-size") + .defaultValue(-1) + .withDescription("The size of the cache used for storing SSL session objects. " + + "According to https://github.com/netty/netty/issues/832, you should always set " + + "this to an appropriate number to not run into a bug with stalling IO threads " + + "during garbage collection. (-1 = use system default)."); + + /** +* SSL session timeout. +*/ + public static final ConfigOption SSL_SESSION_TIMEOUT = + key("security.ssl.session-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)"); + + /** +* SSL session timeout during handshakes. +*/ + public static final ConfigOption SSL_HANDSHAKE_TIMEOUT = + key("security.ssl.handshake-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)"); + + /** +* SSL session timeout after flushing the `close_notify` message. +*/ + public static final ConfigOption SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT = + key("security.ssl.close-notify-flush-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " + --- End diff -- could try - strangely though, this is working for e.g. 
`security.kerberos.login.contexts` although the desired effect (marking it as code) is not there...but that's a different problem. > IO worker threads BLOCKED on SSL Session Cache while CMS full gc > > > Key: FLINK-9878 > URL: https://issues.apache.org/jira/browse/FLINK-9878 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.5.0, 1.5.1, 1.6.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Labels: pull-request-available > Fix For: 1.5.2, 1.6.0 > > > According to https://github.com/netty/netty/issues/832, there is a JDK issue > during garbage collection when the SSL session cache is not limited. We > should allow the user to configure this and further (advanced) SSL parameters > for fine-tuning to fix this and similar issues. In particular, the following > parameters should be configurable: > - SSL session cache size > - SSL session timeout > - SSL handshake timeout > - SSL close notify flush timeout -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #6355: [FLINK-9878][network][ssl] add more low-level ssl ...
Github user NicoK commented on a diff in the pull request: https://github.com/apache/flink/pull/6355#discussion_r203405530 --- Diff: flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java --- @@ -160,4 +160,41 @@ key("security.ssl.verify-hostname") .defaultValue(true) .withDescription("Flag to enable peer's hostname verification during ssl handshake."); + + /** +* SSL session cache size. +*/ + public static final ConfigOption SSL_SESSION_CACHE_SIZE = + key("security.ssl.session-cache-size") + .defaultValue(-1) + .withDescription("The size of the cache used for storing SSL session objects. " + + "According to https://github.com/netty/netty/issues/832, you should always set " + + "this to an appropriate number to not run into a bug with stalling IO threads " + + "during garbage collection. (-1 = use system default)."); + + /** +* SSL session timeout. +*/ + public static final ConfigOption SSL_SESSION_TIMEOUT = + key("security.ssl.session-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) for the cached SSL session objects. (-1 = use system default)"); + + /** +* SSL session timeout during handshakes. +*/ + public static final ConfigOption SSL_HANDSHAKE_TIMEOUT = + key("security.ssl.handshake-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) during SSL handshake. (-1 = use system default)"); + + /** +* SSL session timeout after flushing the `close_notify` message. +*/ + public static final ConfigOption SSL_CLOSE_NOTIFY_FLUSH_TIMEOUT = + key("security.ssl.close-notify-flush-timeout") + .defaultValue(-1) + .withDescription("The timeout (in ms) for flushing the `close_notify` that was triggered by closing a " + --- End diff -- could try - strangely though, this is working for e.g. `security.kerberos.login.contexts` although the desired effect (marking it as code) is not there...but that's a different problem. ---
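All four new options in the diff above share a "-1 = use system default" convention. A tiny helper illustrates how such a value could be resolved; the helper itself is hypothetical, only the convention comes from the diff:

```java
// Resolve a configured option value where a negative value (the
// documented -1 sentinel) means "fall back to the provider's default".
final class DefaultingOption {
    static int resolve(int configuredValue, int systemDefault) {
        return configuredValue >= 0 ? configuredValue : systemDefault;
    }
}
```

For example, a `security.ssl.session-cache-size` left at its default of -1 would resolve to whatever the SSL engine's own default is.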
[jira] [Resolved] (FLINK-9575) Potential race condition when removing JobGraph in HA
[ https://issues.apache.org/jira/browse/FLINK-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-9575. -- Resolution: Fixed Fixed via master: e984168e2eca59c08da90bd5feeac458eaa91bed f6b2e8c5ff0304e4835d2dc8c792a0d055679603 1.6.0: e2b4ffc016da822dda544b31fb3caf679f80a9d9 b9fe077d221bdb013ed57f2555405c9fe4a96aa1 1.5.2: 1bf77cfe17bc046772d02b22d6347388de359ff6 9c4b40dd0bbb22f8f312b0fc42f54a1a4619bf53 > Potential race condition when removing JobGraph in HA > - > > Key: FLINK-9575 > URL: https://issues.apache.org/jira/browse/FLINK-9575 > Project: Flink > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Dominik Wosiński >Assignee: Dominik Wosiński >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.2, 1.6.0 > > > When we are removing the _JobGraph_ from _JobManager_ for example after > invoking _cancel()_, the following code is executed: > {noformat}
> val futureOption = currentJobs.get(jobID) match { > case Some((eg, _)) => > val result = if (removeJobFromStateBackend) { > val futureOption = Some(future { > try { > // ...otherwise, we can have lingering resources when there is a concurrent > shutdown > // and the ZooKeeper client is closed. Not removing the job immediately allow > the > // shutdown to release all resources. > submittedJobGraphs.removeJobGraph(jobID) > } catch { > case t: Throwable => log.warn(s"Could not remove submitted job graph > $jobID.", t) > } > }(context.dispatcher)) > try { > archive ! decorateMessage( > ArchiveExecutionGraph( > jobID, > ArchivedExecutionGraph.createFrom(eg))) > } catch { > case t: Throwable => log.warn(s"Could not archive the execution graph $eg.", > t) > } > futureOption > } else { > None > } > currentJobs.remove(jobID) > result > case None => None > } > // remove all job-related BLOBs from local and HA store > libraryCacheManager.unregisterJob(jobID) > blobServer.cleanupJob(jobID, removeJobFromStateBackend) > jobManagerMetricGroup.removeJob(jobID) > futureOption > }{noformat} > This causes the asynchronous removal of the job and synchronous removal of > blob files connected with this jar. This means as far as I understand that > there is a potential problem that we can fail to remove job graph from > _submittedJobGraphs._ If the JobManager fails and we elect the new leader it > can try to recover such job, but it will fail with an exception since the > assigned blob was already removed.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
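The race described above boils down to an ordering problem: the job-graph removal runs inside a future while the blob cleanup runs synchronously. One way to restore the ordering is to chain the cleanup onto the removal future — a sketch only, with `removeJobGraph`/`cleanupBlobs` as stand-ins for the real calls:

```java
import java.util.concurrent.CompletableFuture;

// Chain the HA job-graph removal and the blob cleanup so a recovering
// leader can never observe a job graph whose blobs are already gone.
final class OrderedCleanup {
    static CompletableFuture<Void> removeJob(Runnable removeJobGraph, Runnable cleanupBlobs) {
        return CompletableFuture.runAsync(removeJobGraph)
                .thenRun(cleanupBlobs); // runs only after the graph is removed
    }
}
```

`thenRun` guarantees the blob cleanup starts only after the removal stage completes, which eliminates the window in which a failed-over JobManager could recover a job graph whose blobs were deleted.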
[jira] [Commented] (FLINK-9860) Netty resource leak on receiver side
[ https://issues.apache.org/jira/browse/FLINK-9860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547890#comment-16547890 ] ASF GitHub Bot commented on FLINK-9860: --- Github user NicoK commented on a diff in the pull request: https://github.com/apache/flink/pull/6363#discussion_r203398509 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/rest/FileUploadHandlerTest.java --- @@ -50,6 +55,24 @@ private static final ObjectMapper OBJECT_MAPPER = RestMapperUtils.getStrictObjectMapper(); + private static ResourceLeakDetectorFactory previousLeakDetector; + private static ResourceLeakDetector.Level previousLeakDetectorLevel; + + @BeforeClass + public static void setLeakDetector() { --- End diff -- great idea - done > Netty resource leak on receiver side > > > Key: FLINK-9860 > URL: https://issues.apache.org/jira/browse/FLINK-9860 > Project: Flink > Issue Type: Bug > Components: Network >Affects Versions: 1.5.1, 1.6.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.5.2, 1.6.0 > > > The Hadoop-free Wordcount end-to-end test fails with the following exception: > {code} > ERROR org.apache.flink.shaded.netty4.io.netty.util.ResourceLeakDetector - > LEAK: ByteBuf.release() was not called before it's garbage-collected. See > http://netty.io/wiki/reference-counted-objects.html for more information. 
> Recent access records: > Created at: > > org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) > > org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137) > > org.apache.flink.shaded.netty4.io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497) > > org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) > > org.apache.flink.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > {code} > We might have a resource leak on the receiving side of our network stack. > https://api.travis-ci.org/v3/job/404225956/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #6363: [FLINK-9860][REST] fix buffer leak in FileUploadHa...
Github user NicoK commented on a diff in the pull request: https://github.com/apache/flink/pull/6363#discussion_r203398509 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/rest/FileUploadHandlerTest.java --- @@ -50,6 +55,24 @@ private static final ObjectMapper OBJECT_MAPPER = RestMapperUtils.getStrictObjectMapper(); + private static ResourceLeakDetectorFactory previousLeakDetector; + private static ResourceLeakDetector.Level previousLeakDetectorLevel; + + @BeforeClass + public static void setLeakDetector() { --- End diff -- great idea - done ---
[jira] [Created] (FLINK-9890) Remove obsolete Class ResourceManagerConfiguration
Gary Yao created FLINK-9890: --- Summary: Remove obsolete Class ResourceManagerConfiguration Key: FLINK-9890 URL: https://issues.apache.org/jira/browse/FLINK-9890 Project: Flink Issue Type: Improvement Components: Distributed Coordination Affects Versions: 1.6.0 Environment: Rev: 690bc370e19d8003add4e41c05acfb4dccc662b4 Reporter: Gary Yao Assignee: Gary Yao The class {{ResourceManagerConfiguration}} is effectively not used, and should therefore be removed to avoid confusion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9575) Potential race condition when removing JobGraph in HA
[ https://issues.apache.org/jira/browse/FLINK-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547868#comment-16547868 ] ASF GitHub Bot commented on FLINK-9575: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/6322 > Potential race condition when removing JobGraph in HA > - > > Key: FLINK-9575 > URL: https://issues.apache.org/jira/browse/FLINK-9575 > Project: Flink > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Dominik Wosiński >Assignee: Dominik Wosiński >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.2, 1.6.0 > > > When we are removing the _JobGraph_ from _JobManager_ for example after > invoking _cancel()_, the following code is executed: > {noformat}
> val futureOption = currentJobs.get(jobID) match { > case Some((eg, _)) => > val result = if (removeJobFromStateBackend) { > val futureOption = Some(future { > try { > // ...otherwise, we can have lingering resources when there is a concurrent > shutdown > // and the ZooKeeper client is closed. Not removing the job immediately allow > the > // shutdown to release all resources. > submittedJobGraphs.removeJobGraph(jobID) > } catch { > case t: Throwable => log.warn(s"Could not remove submitted job graph > $jobID.", t) > } > }(context.dispatcher)) > try { > archive ! decorateMessage( > ArchiveExecutionGraph( > jobID, > ArchivedExecutionGraph.createFrom(eg))) > } catch { > case t: Throwable => log.warn(s"Could not archive the execution graph $eg.", > t) > } > futureOption > } else { > None > } > currentJobs.remove(jobID) > result > case None => None > } > // remove all job-related BLOBs from local and HA store > libraryCacheManager.unregisterJob(jobID) > blobServer.cleanupJob(jobID, removeJobFromStateBackend) > jobManagerMetricGroup.removeJob(jobID) > futureOption > }{noformat} > This causes the asynchronous removal of the job and synchronous removal of > blob files connected with this jar. This means as far as I understand that > there is a potential problem that we can fail to remove job graph from > _submittedJobGraphs._ If the JobManager fails and we elect the new leader it > can try to recover such job, but it will fail with an exception since the > assigned blob was already removed.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #6322: [FLINK-9575]: Remove job-related BLOBS only if the...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/6322 ---
[jira] [Updated] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector
[ https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-9849: -- Labels: pull-request-available (was: ) > Upgrade hbase version to 2.0.1 for hbase connector > -- > > Key: FLINK-9849 > URL: https://issues.apache.org/jira/browse/FLINK-9849 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > Labels: pull-request-available > > Currently hbase 1.4.3 is used for hbase connector. > We should upgrade to 2.0.1 which was recently released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9849) Upgrade hbase version to 2.0.1 for hbase connector
[ https://issues.apache.org/jira/browse/FLINK-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547862#comment-16547862 ] ASF GitHub Bot commented on FLINK-9849: --- GitHub user zhangminglei opened a pull request: https://github.com/apache/flink/pull/6365 [FLINK-9849] [hbase] Hbase upgrade ## What is the purpose of the change Upgrade hbase version to 2.0.1 for hbase connector You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhangminglei/flink flink-9849-hbase-upgrade Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6365.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6365 commit cb4bc4b641565e6caf823e85d541df3022a59237 Author: zhangminglei Date: 2018-07-18T13:44:37Z [FLINK-9849] [hbase] Hbase upgrade > Upgrade hbase version to 2.0.1 for hbase connector > -- > > Key: FLINK-9849 > URL: https://issues.apache.org/jira/browse/FLINK-9849 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: zhangminglei >Priority: Major > Labels: pull-request-available > > Currently hbase 1.4.3 is used for hbase connector. > We should upgrade to 2.0.1 which was recently released. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #6365: [FLINK-9849] [hbase] Hbase upgrade
GitHub user zhangminglei opened a pull request: https://github.com/apache/flink/pull/6365 [FLINK-9849] [hbase] Hbase upgrade ## What is the purpose of the change Upgrade hbase version to 2.0.1 for hbase connector You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhangminglei/flink flink-9849-hbase-upgrade Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6365.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6365 commit cb4bc4b641565e6caf823e85d541df3022a59237 Author: zhangminglei Date: 2018-07-18T13:44:37Z [FLINK-9849] [hbase] Hbase upgrade ---
[GitHub] flink pull request #6364: [hotfix] typo for SqlExecutionException msg
GitHub user xueyumusic opened a pull request: https://github.com/apache/flink/pull/6364 [hotfix] typo for SqlExecutionException msg fix typo in SqlExecutionException msg in ExecutionContext.java You can merge this pull request into a Git repository by running: $ git pull https://github.com/xueyumusic/flink hotfix1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6364.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6364 commit a8a4e497acf1e62c1c632724f7bf6604032302fa Author: xueyu <278006819@...> Date: 2018-07-18T13:39:57Z hotfix for SqlExecutionException msg ---
[jira] [Commented] (FLINK-9862) Update end-to-end test to use RocksDB backed timers
[ https://issues.apache.org/jira/browse/FLINK-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547853#comment-16547853 ] ASF GitHub Bot commented on FLINK-9862: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/6351#discussion_r203382388 --- Diff: flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/SemanticsCheckMapper.java --- @@ -56,6 +56,7 @@ public void flatMap(Event event, Collector out) throws Exception { if (validator.check(currentValue, nextValue)) { sequenceValue.update(nextValue); } else { + sequenceValue.update(nextValue); --- End diff -- ``` sequenceValue.update(nextValue); if (!validator.check(currentValue, nextValue)) { out.collect("Alert: " + currentValue + " -> " + nextValue + " (" + event.getKey() + ")"); } ``` > Update end-to-end test to use RocksDB backed timers > --- > > Key: FLINK-9862 > URL: https://issues.apache.org/jira/browse/FLINK-9862 > Project: Flink > Issue Type: Sub-task > Components: State Backends, Checkpointing, Streaming >Affects Versions: 1.6.0 >Reporter: Till Rohrmann >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Labels: pull-request-available > Fix For: 1.6.0 > > > We should add or modify an end-to-end test to use RocksDB backed timers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #6351: [FLINK-9862] [test] Extend general purpose DataSt...
Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/6351#discussion_r203380952 --- Diff: flink-end-to-end-tests/flink-datastream-allround-test/src/main/java/org/apache/flink/streaming/tests/DataStreamAllroundTestJobFactory.java --- @@ -184,12 +189,16 @@ private static final ConfigOption SEQUENCE_GENERATOR_SRC_EVENT_TIME_MAX_OUT_OF_ORDERNESS = ConfigOptions .key("sequence_generator_source.event_time.max_out_of_order") - .defaultValue(500L); + .defaultValue(0L); private static final ConfigOption SEQUENCE_GENERATOR_SRC_EVENT_TIME_CLOCK_PROGRESS = ConfigOptions .key("sequence_generator_source.event_time.clock_progress") .defaultValue(100L); + private static final ConfigOption TUMBLING_WINDOW_OPERATOR_NUM_EVENTS = ConfigOptions + .key("sliding_window_operator.num_events") --- End diff -- `tumbling` instead of `sliding`? ---
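The requested fix touches only the key string: the constant configures the tumbling-window operator, so the key should say `tumbling`, not `sliding`. A sketch of how the corrected option might look with Flink's `ConfigOptions` builder as used elsewhere in this factory (the default value is cut off in the diff, so `100L` below is purely illustrative):

```java
// Key corrected from "sliding_window_operator.num_events";
// the default value shown here is illustrative, not from the diff.
private static final ConfigOption<Long> TUMBLING_WINDOW_OPERATOR_NUM_EVENTS = ConfigOptions
    .key("tumbling_window_operator.num_events")
    .defaultValue(100L);
```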
[jira] [Commented] (FLINK-9185) Potential null dereference in PrioritizedOperatorSubtaskState#resolvePrioritizedAlternatives
[ https://issues.apache.org/jira/browse/FLINK-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547846#comment-16547846 ] ASF GitHub Bot commented on FLINK-9185: --- Github user StephenJeson commented on a diff in the pull request: https://github.com/apache/flink/pull/5894#discussion_r203378828 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PrioritizedOperatorSubtaskState.java --- @@ -281,10 +281,15 @@ public PrioritizedOperatorSubtaskState build() { // approve-function signaled true. if (alternative != null && alternative.hasState() - && alternative.size() == 1 - && approveFun.apply(reference, alternative.iterator().next())) { --- End diff -- Many thanks for your suggestion, I'll try to refine it. > Potential null dereference in > PrioritizedOperatorSubtaskState#resolvePrioritizedAlternatives > > > Key: FLINK-9185 > URL: https://issues.apache.org/jira/browse/FLINK-9185 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Stephen Jason >Priority: Minor > Labels: pull-request-available > > {code} > if (alternative != null > && alternative.hasState() > && alternative.size() == 1 > && approveFun.apply(reference, alternative.iterator().next())) { > {code} > The return value from approveFun.apply would be unboxed. > We should check that the return value is not null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
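The hazard the ticket describes comes from auto-unboxing a possibly-null `Boolean` in a condition. A null-safe pattern is to compare against `Boolean.TRUE` instead of letting the JVM unbox. A small self-contained illustration; the `BiFunction` here is a stand-in for the ticket's `approveFun`, whose actual types are not shown:

```java
import java.util.function.BiFunction;

public class UnboxingGuard {

    // Null-safe check: Boolean.TRUE.equals(...) treats a null result as
    // "not approved" instead of throwing a NullPointerException, which is
    // what unboxing in `if (fn.apply(a, b))` would do.
    static boolean approved(BiFunction<String, String, Boolean> fn, String a, String b) {
        return Boolean.TRUE.equals(fn.apply(a, b));
    }

    public static void main(String[] args) {
        BiFunction<String, String, Boolean> returnsNull = (a, b) -> null;
        System.out.println(approved(returnsNull, "ref", "alt"));       // false, no NPE
        System.out.println(approved((a, b) -> a.equals(b), "x", "x")); // true
    }
}
```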
[jira] [Commented] (FLINK-9858) State TTL End-to-End Test
[ https://issues.apache.org/jira/browse/FLINK-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547837#comment-16547837 ] ASF GitHub Bot commented on FLINK-9858: --- Github user StefanRRichter commented on a diff in the pull request: https://github.com/apache/flink/pull/6361#discussion_r203377098 --- Diff: flink-end-to-end-tests/test-scripts/common.sh --- @@ -240,6 +240,15 @@ function start_cluster { done } +function start_taskmanagers { --- End diff -- I think you could just reuse function `tm_watchdog` for this purpose. > State TTL End-to-End Test > - > > Key: FLINK-9858 > URL: https://issues.apache.org/jira/browse/FLINK-9858 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Andrey Zagrebin >Assignee: Andrey Zagrebin >Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-9832) Allow commas in job submission query params
[ https://issues.apache.org/jira/browse/FLINK-9832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-9832. --- Resolution: Won't Fix Fix Version/s: (was: 1.5.2) (was: 1.6.0) > Allow commas in job submission query params > --- > > Key: FLINK-9832 > URL: https://issues.apache.org/jira/browse/FLINK-9832 > Project: Flink > Issue Type: Bug > Components: REST >Affects Versions: 1.5.1 >Reporter: Ufuk Celebi >Assignee: Chesnay Schepler >Priority: Blocker > > As reported on the user mailing list in the thread "Run programs w/ params > including comma via REST api" [1], submitting a job with mainArgs that > include a comma results in an exception. > To reproduce submit a job with the following mainArgs: > {code} > --servers 10.100.98.9:9092,10.100.98.237:9092 > {code} > The request fails with > {code} > Expected only one value [--servers 10.100.98.9:9092, 10.100.98.237:9092]. > {code} > As a work around, users have to use a different delimiter such as {{;}}. > The proper fix of this API would make these params part of the {{POST}} > request instead of relying on query params (as noted in FLINK-9499). I think > it's still valuable to fix this as part of a bug fix release for 1.5. > [1] > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Run-programs-w-params-including-comma-via-REST-api-td19662.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
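The failure mode generalizes: a REST layer that splits query-parameter values on commas turns one argument into several, which then trips a "one value expected" check. An illustrative sketch of both the reported failure and the delimiter workaround from the ticket; this is not Flink's actual handler code, just comma-splitting in isolation:

```java
import java.util.Arrays;
import java.util.List;

public class CommaParamSplit {

    // Mimics a REST layer that expands "a,b" into two parameter values.
    static List<String> parseValues(String raw) {
        return Arrays.asList(raw.split(","));
    }

    public static void main(String[] args) {
        // The mainArgs value from the bug report: the comma inside the
        // broker list yields two values where one was intended.
        List<String> values =
            parseValues("--servers 10.100.98.9:9092,10.100.98.237:9092");
        System.out.println(values.size()); // 2 -> "Expected only one value"

        // Workaround from the ticket: use a different delimiter such as ';'.
        System.out.println(
            parseValues("--servers 10.100.98.9:9092;10.100.98.237:9092").size()); // 1
    }
}
```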