[jira] [Commented] (FLINK-3688) ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is called and TimeCharacteristic = ProcessingTime
[ https://issues.apache.org/jira/browse/FLINK-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225734#comment-15225734 ] Konstantin Knauf commented on FLINK-3688: - I will go ahead and open the PR for 4. this evening. > ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is > called and TimeCharacteristic = ProcessingTime > > > Key: FLINK-3688 > URL: https://issues.apache.org/jira/browse/FLINK-3688 > Project: Flink > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Konstantin Knauf >Assignee: Konstantin Knauf >Priority: Critical > > Hi, > when using {{TimeCharacteristics.ProcessingTime}} a ClassCastException is > thrown in {{StreamRecordSerializer}} when > {{WindowOperator.processWatermark()}} is called from > {{WindowOperator.trigger()}}, i.e. whenever a ProcessingTimeTimer is > triggered. > The problem seems to be that {{processWatermark()}} is also called in > {{trigger()}} when the time characteristic is ProcessingTime, but in > {{RecordWriterOutput}} {{enableWatermarkMultiplexing}} is {{false}} and the > {{TypeSerializer}} is a {{StreamRecordSerializer}}, which ultimately leads to > the ClassCastException. Do you agree? > If this is indeed a bug, there are several possible solutions. > # Only calling {{processWatermark()}} in {{trigger()}} when the > TimeCharacteristic is EventTime > # Not calling {{processWatermark()}} in {{trigger()}} at all, instead waiting > for the next watermark to trigger the EventTimeTimers with a timestamp behind > the current watermark. This is, of course, a trade-off. > # Using {{MultiplexingStreamRecordSerializer}} all the time, but I have no > idea what the side effects of this change would be. 
I assume there is a reason > for existence of the StreamRecordSerializer ;) > StackTrace: > {quote} > TimerException\{java.lang.RuntimeException: > org.apache.flink.streaming.api.watermark.Watermark cannot be cast to > org.apache.flink.streaming.runtime.streamrecord.StreamRecord\} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:716) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.lang.RuntimeException: > org.apache.flink.streaming.api.watermark.Watermark cannot be cast to > org.apache.flink.streaming.runtime.streamrecord.StreamRecord > at > org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:93) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitWatermark(OperatorChain.java:370) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processWatermark(WindowOperator.java:293) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.trigger(WindowOperator.java:323) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710) > ... 
7 more > Caused by: java.lang.ClassCastException: > org.apache.flink.streaming.api.watermark.Watermark cannot be cast to > org.apache.flink.streaming.runtime.streamrecord.StreamRecord > at > org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer.serialize(StreamRecordSerializer.java:42) > at > org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56) > at > org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:79) > at > org.apache.flink.runtime.io.network.api.writer.RecordWriter.broadcastEmit(RecordWriter.java:109) > at > org.apache.flink.streaming.runtime.io.StreamRecordWriter.broadcastEmit(StreamRecordWriter.java:95) > at > org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:90) > ... 11 more > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
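The failure mode in the stack trace above can be reproduced in isolation: a serializer written for one concrete record type receives an object of an unrelated type through an `Object`-typed path, and the unconditional cast inside `serialize()` fails at runtime, while a multiplexing variant that checks the type first does not. This is a minimal, hypothetical sketch of that mechanism only; the class names below are stand-ins, not Flink's actual classes.

```java
// Stand-in types: NOT Flink's classes, just the shape of the problem.
final class StreamRecord { final long value; StreamRecord(long v) { value = v; } }
final class Watermark   { final long timestamp; Watermark(long t) { timestamp = t; } }

public class CastDemo {
    // Mirrors a record-only serializer: unconditionally casts to StreamRecord.
    static long serializeRecordOnly(Object element) {
        return ((StreamRecord) element).value; // ClassCastException for a Watermark
    }

    // Mirrors a multiplexing serializer: checks the element type first.
    static long serializeMultiplexing(Object element) {
        if (element instanceof Watermark) {
            return ((Watermark) element).timestamp;
        }
        return ((StreamRecord) element).value;
    }

    // True iff the record-only path throws ClassCastException for this element.
    static boolean castFails(Object element) {
        try {
            serializeRecordOnly(element);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("record-only fails on watermark: " + castFails(new Watermark(42L)));
        System.out.println("multiplexing handles watermark: " + serializeMultiplexing(new Watermark(42L)));
    }
}
```

This is why proposed solution 3 (always using the multiplexing serializer) would avoid the exception, at whatever cost that serializer carries.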
[GitHub] flink pull request: [FLINK-3541] [Kafka Connector] Clean up workar...
Github user skyahead commented on the pull request: https://github.com/apache/flink/pull/1846#issuecomment-205604300 @StephanEwen I changed the Kafka version from 0.9.0.1 to 0.9.0.0 and verified that the NullPointerException does get caught, and that the code retries connecting 10 times. With 0.9.0.1, however, the NullPointerException no longer occurs; instead a TimeoutException is thrown and caught as expected. All test cases in Kafka08ITCase (19 cases) and Kafka09ITCase (15 cases) pass in my local environment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-3698) Scala Option can't be used as a key
[ https://issues.apache.org/jira/browse/FLINK-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timur Fayruzov updated FLINK-3698: -- Description: Discussion: http://mail-archives.apache.org/mod_mbox/flink-user/201603.mbox/%3CCAO0MGUjWQaovvUvAB%3DBYrwQzA0ocFtMy4%3DV%3DP--343Sy1V5BSg%40mail.gmail.com%3E Option should be treated the same way as other generic objects where it can be used as a key if the generic argument implements Comparable. Here's a scalatest in FunSpec format that illustrates the issue: ``` case class MyKey(x: Option[String]) it("can't use options inside classes used as keys") { val a = env.fromCollection(Seq(MyKey(Some("a")), MyKey(Some("c")))) val b = env.fromCollection(Seq(MyKey(Some("a")), MyKey(Some("z")))) intercept[InvalidProgramException]{ a.coGroup(b) .where(e => e) .equalTo(e => e) } // workaround a.coGroup(b) .where(e => e.toString) // `e` should be translated to any object that implements Comparable interface to be a valid key. .equalTo(e => e.toString) } ``` was: Discussion: http://mail-archives.apache.org/mod_mbox/flink-user/201603.mbox/%3CCAO0MGUjWQaovvUvAB%3DBYrwQzA0ocFtMy4%3DV%3DP--343Sy1V5BSg%40mail.gmail.com%3E Option should be treated the same way as other generic objects where it can be used as a key if generic argument implements Comparable. > Scala Option can't be used as a key > --- > > Key: FLINK-3698 > URL: https://issues.apache.org/jira/browse/FLINK-3698 > Project: Flink > Issue Type: Bug > Components: Core >Reporter: Timur Fayruzov >Priority: Minor > > Discussion: > http://mail-archives.apache.org/mod_mbox/flink-user/201603.mbox/%3CCAO0MGUjWQaovvUvAB%3DBYrwQzA0ocFtMy4%3DV%3DP--343Sy1V5BSg%40mail.gmail.com%3E > Option should be treated the same way as other generic objects where it can > be used as a key if generic argument implements Comparable. 
> Here's a scalatest in FunSpec format that illustrates the issue: > ``` > case class MyKey(x: Option[String]) > it("can't use options inside classes used as keys") { > val a = env.fromCollection(Seq(MyKey(Some("a")), MyKey(Some("c")))) > val b = env.fromCollection(Seq(MyKey(Some("a")), MyKey(Some("z")))) > intercept[InvalidProgramException]{ > a.coGroup(b) > .where(e => e) > .equalTo(e => e) > } > // workaround > a.coGroup(b) > .where(e => e.toString) // `e` should be translated to any object > that implements Comparable interface to be a valid key. > .equalTo(e => e.toString) > } > ```
[jira] [Created] (FLINK-3698) Scala Option can't be used as a key
Timur Fayruzov created FLINK-3698: - Summary: Scala Option can't be used as a key Key: FLINK-3698 URL: https://issues.apache.org/jira/browse/FLINK-3698 Project: Flink Issue Type: Bug Components: Core Reporter: Timur Fayruzov Priority: Minor Discussion: http://mail-archives.apache.org/mod_mbox/flink-user/201603.mbox/%3CCAO0MGUjWQaovvUvAB%3DBYrwQzA0ocFtMy4%3DV%3DP--343Sy1V5BSg%40mail.gmail.com%3E Option should be treated the same way as other generic objects where it can be used as a key if the generic argument implements Comparable.
[jira] [Updated] (FLINK-3697) keyBy() with nested POJO computes invalid field position indexes
[ https://issues.apache.org/jira/browse/FLINK-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron Crocker updated FLINK-3697: --- Description: Using named keys in keyBy() for nested POJO types results in failure. The indexes for named key fields are used inconsistently with nested POJO types. In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position after (apparently) flattening the structure, but that position is used to index the unflattened version of the POJO type by {{PojoTypeInfo.getTypeAt()}}. In the example below, getFlatFields() returns positions 0, 1, and 14. These positions appear correct in the flattened structure of the Data class. However, in {{KeySelector getSelectorForKeys(Keys keys, TypeInformation typeInfo, ExecutionConfig executionConfig)}}, a call to {{compositeType.getTypeAt(logicalKeyPositions[i])}} for the third key results in {{PojoTypeInfo.getTypeAt()}} declaring it out of range, as it compares a position from the flattened version of the type against the number of directly named fields of the object. 
Concrete Example: Consider this graph: {code} DataStream dataStream = see.addSource(new FlinkKafkaConsumer08<>(timesliceConstants.topic, new DataDeserialzer(), kafkaConsumerProperties)); dataStream .flatMap(new DataMapper()) .keyBy("aaa", "abc", "wxyz") {code} {{DataDeserialzer}} returns a "NativeDataFormat" object; {{DataMapper}} takes this NativeDataFormat object and extracts individual Data objects: {code} public class Data { public int aaa; public int abc; public long wxyz; public int t1; public int t2; public Policy policy; public Stats stats; public Data() {} {code} A {{Policy}} object is an instance of this class: {code} public class Policy { public short a; public short b; public boolean c; public boolean d; public Policy() {} } {code} A {{Stats}} object is an instance of this class: {code} public class Stats { public long count; public float a; public float b; public float c; public float d; public float e; public Stats() {} } {code} was: Using named keys in keyBy() for nested POJO types results in failure. The iindexes for named key fields are used inconsistently with nested POJO types. In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position after (apparently) flattening the structure but is referenced in the unflattened version of the POJO type by {{PojoTypeInfo.getTypeAt()}}. In the example below, getFlatFields() returns positions 0, 1, and 14. These positions appear correct in the flattened structure of the Data class. However, in {{KeySelector getSelectorForKeys(Keys keys, TypeInformation typeInfo, ExecutionConfig executionConfig)}}, a call to {{compositeType.getTypeAt(logicalKeyPositions[i])}} for the third key results {{PojoTypeInfo.getTypeAt()}} declaring it out of range, as it compares the length of the directly named fields of the object vs the length of flattened version of that type. 
Concrete Example: Consider this graph: {code} DataStream dataStream = see.addSource(new FlinkKafkaConsumer08<>(timesliceConstants.topic, new DataDeserialzer(), kafkaConsumerProperties)); dataStream .flatMap(new DataMapper()) .keyBy("aaa", "abc", "wxyz") {code} {{DataDeserialzer}} returns a "NativeDataFormat" object; {{DataMapper}} takes this NativeDataFormat object and extracts individual Data objects: {code} public class Data { public int aaa; public int abc; public long wxyz; public int t1; public int t2; public Policy policy; public Stats stats; public Data() {} {code} A {{Policy}} object is an instance of this class: {code} public class AggregatableMetricStoragePolicy implements MetricStoragePolicy { public short a; public short b; public boolean c; public boolean d; public Policy() {} } {code} A {{Stats}} object is an instance of this class: {code} public class Stats { public long count; public float a; public float b; public float c; public float d; public float e; public Stats() {} } {code} > keyBy() with nested POJO computes invalid field position indexes > > > Key: FLINK-3697 > URL: https://issues.apache.org/jira/browse/FLINK-3697 > Project: Flink > Issue Type: Bug > Components: DataStream API >Affects Versions: 1.0.0 > Environment: MacOS X 10.10 >Reporter: Ron Crocker >Priority: Minor > Labels: pojo > > Using named keys in keyBy() for nested POJO types results in failure. The > iindexes for named key fields are used inconsistently with nested POJO types. > In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position > after (apparently) flattening
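The reported positions 0, 1, and 14 are consistent with the nested fields being flattened recursively, with fields ordered alphabetically at each level. The sketch below reconstructs that flattened view for the `Data`/`Policy`/`Stats` classes above; the alphabetical ordering is an assumption made for illustration, not a quote of `PojoTypeInfo`'s implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Reconstructs the flattened key-position view of the nested Data POJO,
// assuming fields are sorted alphabetically at each nesting level.
public class FlatFields {
    static List<String> flatten() {
        List<String> flat = new ArrayList<>();
        // Data's direct fields, alphabetically: aaa, abc, policy, stats, t1, t2, wxyz
        flat.add("aaa");
        flat.add("abc");
        // policy expands in place into its own (alphabetical) fields
        for (String f : Arrays.asList("a", "b", "c", "d")) flat.add("policy." + f);
        // stats likewise: a, b, c, count, d, e
        for (String f : Arrays.asList("a", "b", "c", "count", "d", "e")) flat.add("stats." + f);
        flat.add("t1");
        flat.add("t2");
        flat.add("wxyz");
        return flat;
    }

    public static void main(String[] args) {
        List<String> flat = flatten();
        // "wxyz" lands at flat position 14, but Data has only 7 directly
        // named fields -- the range getTypeAt() checks against, hence the
        // reported out-of-range failure.
        System.out.println(flat.indexOf("aaa") + " " + flat.indexOf("abc")
                + " " + flat.indexOf("wxyz"));
    }
}
```

Under this assumption the flat positions of "aaa", "abc", and "wxyz" come out as exactly 0, 1, and 14, matching the ticket.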
[jira] [Closed] (FLINK-3623) Adjust MurmurHash algorithm
[ https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-3623. --- > Adjust MurmurHash algorithm > --- > > Key: FLINK-3623 > URL: https://issues.apache.org/jira/browse/FLINK-3623 > Project: Flink > Issue Type: Improvement > Components: Distributed Runtime >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.1.0 > > > Flink's MurmurHash implementation differs from the published algorithm. > From Flink's MathUtils.java: > {code} > code *= 0xe6546b64; > {code} > The Murmur3_32 algorithm as described by > [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]: > {code} > m ← 5 > n ← 0xe6546b64 > hash ← hash × m + n > {code} > and in Guava's Murmur3_32HashFunction.java: > {code} > h1 = h1 * 5 + 0xe6546b64; > {code}
[jira] [Resolved] (FLINK-3623) Adjust MurmurHash algorithm
[ https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-3623. - Resolution: Fixed Fix Version/s: 1.1.0 Fixed via 641a0d436c9b7a34ff33ceb370cf29962cac4dee > Adjust MurmurHash algorithm > --- > > Key: FLINK-3623 > URL: https://issues.apache.org/jira/browse/FLINK-3623 > Project: Flink > Issue Type: Improvement > Components: Distributed Runtime >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.1.0 > > > Flink's MurmurHash implementation differs from the published algorithm. > From Flink's MathUtils.java: > {code} > code *= 0xe6546b64; > {code} > The Murmur3_32 algorithm as described by > [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]: > {code} > m ← 5 > n ← 0xe6546b64 > hash ← hash × m + n > {code} > and in Guava's Murmur3_32HashFunction.java: > {code} > h1 = h1 * 5 + 0xe6546b64; > {code}
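The difference between the two update steps quoted in the ticket can be seen directly by writing both out. This sketch implements only the single disputed mixing line (not a full Murmur3_32 round): `mixFlink` reproduces the multiply-only line quoted from MathUtils.java, and `mixPublished` follows the Wikipedia/Guava form `hash = hash * 5 + 0xe6546b64`.

```java
// The single MurmurHash update step under discussion, written both ways.
public class MurmurMix {
    // As quoted from Flink's MathUtils.java: multiply only.
    static int mixFlink(int code) {
        code *= 0xe6546b64;
        return code;
    }

    // As published (Wikipedia / Guava): multiply by 5, then add the constant.
    static int mixPublished(int hash) {
        hash = hash * 5 + 0xe6546b64;
        return hash;
    }

    public static void main(String[] args) {
        int h = 0x9747b28c; // arbitrary sample hash state
        System.out.printf("flink:     %08x%n", mixFlink(h));
        System.out.printf("published: %08x%n", mixPublished(h));
        // Note: a zero state stays zero under the multiply-only variant,
        // which is one concrete way the two steps diverge.
        System.out.printf("flink(0):     %08x%n", mixFlink(0));
        System.out.printf("published(0): %08x%n", mixPublished(0));
    }
}
```

The multiply-only step maps a zero state to zero forever, whereas the published step injects the constant on every round, which is part of what gives Murmur3 its mixing quality.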
[jira] [Resolved] (FLINK-1159) Case style anonymous functions not supported by Scala API
[ https://issues.apache.org/jira/browse/FLINK-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1159. - Resolution: Fixed Fix Version/s: 1.1.0 Fixed via 5cb84f185963fa89be5d0c4e83bad66bac44d84d Thank you for the contribution! > Case style anonymous functions not supported by Scala API > - > > Key: FLINK-1159 > URL: https://issues.apache.org/jira/browse/FLINK-1159 > Project: Flink > Issue Type: Bug > Components: Scala API >Reporter: Till Rohrmann >Assignee: Stefano Baghino > Fix For: 1.1.0 > > > In Scala it is very common to define anonymous functions of the following form > {code} > { > case foo: Bar => foobar(foo) > case _ => throw new RuntimeException() > } > {code} > These case style anonymous functions are not supported yet by the Scala API. > Thus, one has to write redundant code to name the function parameter. > What works is the following pattern, but it is not intuitive for someone > coming from Scala: > {code} > dataset.map{ > _ match{ > case foo:Bar => ... > } > } > {code}
[jira] [Commented] (FLINK-1159) Case style anonymous functions not supported by Scala API
[ https://issues.apache.org/jira/browse/FLINK-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224884#comment-15224884 ] ASF GitHub Bot commented on FLINK-1159: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1704 > Case style anonymous functions not supported by Scala API > - > > Key: FLINK-1159 > URL: https://issues.apache.org/jira/browse/FLINK-1159 > Project: Flink > Issue Type: Bug > Components: Scala API >Reporter: Till Rohrmann >Assignee: Stefano Baghino > Fix For: 1.1.0 > > > In Scala it is very common to define anonymous functions of the following form > {code} > { > case foo: Bar => foobar(foo) > case _ => throw new RuntimeException() > } > {code} > These case style anonymous functions are not supported yet by the Scala API. > Thus, one has to write redundant code to name the function parameter. > What works is the following pattern, but it is not intuitive for someone > coming from Scala: > {code} > dataset.map{ > _ match{ > case foo:Bar => ... > } > } > {code}
[jira] [Commented] (FLINK-3623) Adjust MurmurHash algorithm
[ https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224883#comment-15224883 ] ASF GitHub Bot commented on FLINK-3623: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1806 > Adjust MurmurHash algorithm > --- > > Key: FLINK-3623 > URL: https://issues.apache.org/jira/browse/FLINK-3623 > Project: Flink > Issue Type: Improvement > Components: Distributed Runtime >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > > Flink's MurmurHash implementation differs from the published algorithm. > From Flink's MathUtils.java: > {code} > code *= 0xe6546b64; > {code} > The Murmur3_32 algorithm as described by > [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]: > {code} > m ← 5 > n ← 0xe6546b64 > hash ← hash × m + n > {code} > and in Guava's Murmur3_32HashFunction.java: > {code} > h1 = h1 * 5 + 0xe6546b64; > {code}
[GitHub] flink pull request: [FLINK-3623] [runtime] Adjust MurmurHash Algor...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1806
[GitHub] flink pull request: [FLINK-1159] Case style anonymous functions no...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1704
[GitHub] flink pull request: Updates the AssignerWithPunctuatedWatermarks a...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1811
[jira] [Closed] (FLINK-3595) Kafka09 consumer thread does not interrupt when stuck in record emission
[ https://issues.apache.org/jira/browse/FLINK-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-3595. -- Resolution: Fixed Fixed in 3465580 (master), e0dc5c1 (release-1.0). > Kafka09 consumer thread does not interrupt when stuck in record emission > > > Key: FLINK-3595 > URL: https://issues.apache.org/jira/browse/FLINK-3595 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Affects Versions: 1.0.0 >Reporter: Stephan Ewen >Assignee: Ufuk Celebi >Priority: Critical > Fix For: 1.1.0, 1.0.1 > > > When canceling a job, the Kafka 0.9 Consumer Thread may be stuck in a > blocking method (output emitting) and never wakes up. > The thread as a whole cannot be simply interrupted, because of a bug in Kafka > that makes the consumer freeze/hang up on interrupt. > There are two possible solutions: > - allow and call interrupt when the consumer thread is emitting elements > - destroy the output network buffer pools eagerly on canceling. The Kafka > thread will then throw an exception if it is stuck in emitting elements and > it will terminate, which is accepted in case the status is canceled.
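The first proposed solution (allowing interrupt while the consumer thread is emitting) relies on the blocking emission call responding to interrupt. This stand-in sketch models the stuck emitter with a bounded queue acting as the full output buffer; it is an illustration of the cancellation mechanism, not Flink's actual code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Models the cancellation problem: a producer thread blocks on a full
// output buffer and only terminates if the blocking call is interruptible.
public class StuckEmitter {
    static boolean cancelViaInterrupt() throws Exception {
        BlockingQueue<Integer> outputBuffer = new ArrayBlockingQueue<>(1);
        outputBuffer.put(0); // buffer full: the next put() blocks

        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread emitter = new Thread(() -> {
            try {
                outputBuffer.put(1); // blocks "in record emission"
            } catch (InterruptedException e) {
                sawInterrupt.set(true); // cancel path: thread can terminate
            }
        });
        emitter.start();

        Thread.sleep(200);   // let the emitter reach the blocking put()
        emitter.interrupt(); // what "allow and call interrupt" would do
        emitter.join(5000);
        return !emitter.isAlive() && sawInterrupt.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("emitter terminated after interrupt: " + cancelViaInterrupt());
    }
}
```

The ticket's second solution works differently: instead of interrupting (which trips the Kafka bug), destroying the buffer pool makes the blocking call itself throw, so the thread unblocks without ever being interrupted.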
[jira] [Commented] (FLINK-3595) Kafka09 consumer thread does not interrupt when stuck in record emission
[ https://issues.apache.org/jira/browse/FLINK-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224872#comment-15224872 ] ASF GitHub Bot commented on FLINK-3595: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1780 > Kafka09 consumer thread does not interrupt when stuck in record emission > > > Key: FLINK-3595 > URL: https://issues.apache.org/jira/browse/FLINK-3595 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Affects Versions: 1.0.0 >Reporter: Stephan Ewen >Assignee: Ufuk Celebi >Priority: Critical > Fix For: 1.1.0, 1.0.1 > > > When canceling a job, the Kafka 0.9 Consumer Thread may be stuck in a > blocking method (output emitting) and never wakes up. > The thread as a whole cannot be simply interrupted, because of a bug in Kafka > that makes the consumer freeze/hang up on interrupt. > There are two possible solutions: > - allow and call interrupt when the consumer thread is emitting elements > - destroy the output network buffer pools eagerly on canceling. The Kafka > thread will then throw an exception if it is stuck in emitting elements and > it will terminate, which is accepted in case the status is canceled.
[GitHub] flink pull request: [FLINK-3595] [runtime] Eagerly destroy buffer ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1780
[jira] [Created] (FLINK-3697) keyBy() with nested POJO computes invalid field position indexes
Ron Crocker created FLINK-3697: -- Summary: keyBy() with nested POJO computes invalid field position indexes Key: FLINK-3697 URL: https://issues.apache.org/jira/browse/FLINK-3697 Project: Flink Issue Type: Bug Components: DataStream API Affects Versions: 1.0.0 Environment: MacOS X 10.10 Reporter: Ron Crocker Priority: Minor Using named keys in keyBy() for nested POJO types results in failure. The indexes for named key fields are used inconsistently with nested POJO types. In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position after (apparently) flattening the structure, but that position is used to index the unflattened version of the POJO type by {{PojoTypeInfo.getTypeAt()}}. In the example below, getFlatFields() returns positions 0, 1, and 14. These positions appear correct in the flattened structure of the Data class. However, in {{KeySelector getSelectorForKeys(Keys keys, TypeInformation typeInfo, ExecutionConfig executionConfig)}}, a call to {{compositeType.getTypeAt(logicalKeyPositions[i])}} for the third key results in {{PojoTypeInfo.getTypeAt()}} declaring it out of range, as it compares a position from the flattened version of the type against the number of directly named fields of the object. 
Concrete Example: Consider this graph: {code} DataStream dataStream = see.addSource(new FlinkKafkaConsumer08<>(timesliceConstants.topic, new DataDeserialzer(), kafkaConsumerProperties)); dataStream .flatMap(new DataMapper()) .keyBy("aaa", "abc", "wxyz") {code} {{DataDeserialzer}} returns a "NativeDataFormat" object; {{DataMapper}} takes this NativeDataFormat object and extracts individual Data objects: {code} public class Data { public int aaa; public int abc; public long wxyz; public int t1; public int t2; public Policy policy; public Stats stats; public Data() {} } {code} A {{Policy}} object is an instance of this class: {code} public class Policy { public short a; public short b; public boolean c; public boolean d; public Policy() {} } {code} A {{Stats}} object is an instance of this class: {code} public class Stats { public long count; public float a; public float b; public float c; public float d; public float e; public Stats() {} } {code}
[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment
[ https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224812#comment-15224812 ] ASF GitHub Bot commented on FLINK-2609: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205446161 Let's find a way then :-) > Automatic type registration is only called from the batch execution > environment > --- > > Key: FLINK-2609 > URL: https://issues.apache.org/jira/browse/FLINK-2609 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 0.10.0 >Reporter: Robert Metzger >Assignee: Chesnay Schepler > > Kryo types in the streaming API are quite expensive to serialize because they > are not automatically registered at Kryo.
[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205446161 Let's find a way then :-)
[jira] [Comment Edited] (FLINK-3679) DeserializationSchema should handle zero or more outputs for every input
[ https://issues.apache.org/jira/browse/FLINK-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224692#comment-15224692 ] Jamie Grier edited comment on FLINK-3679 at 4/4/16 6:12 PM: I'm not sure about the locking and operator chaining issues so I would say if that's unduly complicated because of this change maybe it's not worth it. However, a DeserializationSchema with more flatMap() like semantics would certainly be the better API given that bad data issues are a reality. It also seems we could provide this without breaking existing code, but certainly it would add a bit more complexity to the API (having multiple variants for this). Anyway, I agree you can work around this issue by making a special "sentinel" value and dealing with all of this in a chained flatMap() operator. I imagine that's exactly the approach that people are already using. was (Author: jgrier): I'm not sure about the locking and operator chaining issues so I would say if that's unduly complicated because of this change maybe it's not worth it. However, a DeserializationSchema with more flatMap() like semantics would certainly the better API given that bad data issues are a reality. It also seems we could provide this without breaking existing code, but certainly it would add a bit more complexity to the API (having multiple variants for this). Anyway, I agree you can work around this issue my making a special "sentinel" value and dealing with all of this is in a chained flatMap() operator. I imagine that's exactly the approach that people are already using. > DeserializationSchema should handle zero or more outputs for every input > > > Key: FLINK-3679 > URL: https://issues.apache.org/jira/browse/FLINK-3679 > Project: Flink > Issue Type: Bug > Components: DataStream API >Reporter: Jamie Grier > > There are a couple of issues with the DeserializationSchema API that I think > should be improved. This request has come to me via an existing Flink user. 
> The main issue is simply that the API assumes that there is a one-to-one > mapping between input and outputs. In reality there are scenarios where one > input message (say from Kafka) might actually map to zero or more logical > elements in the pipeline. > Particularly important here is the case where you receive a message from a > source (such as Kafka) and say the raw bytes don't deserialize properly. > Right now the only recourse is to throw IOException and therefore fail the > job. > This is definitely not good since bad data is a reality and failing the job > is not the right option. If the job fails we'll just end up replaying the > bad data and the whole thing will start again. > Instead in this case it would be best if the user could just return the empty > set. > The other case is where one input message should logically be multiple output > messages. This case is probably less important since there are other ways to > do this but in general it might be good to make the > DeserializationSchema.deserialize() method return a collection rather than a > single element. > Maybe we need to support a DeserializationSchema variant that has semantics > more like that of FlatMap.
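A flatMap-style schema variant like the one proposed above could look roughly like the following. This is a hypothetical sketch, not Flink API: the interface and class names are invented for illustration. The key idea is that each input message may yield zero, one, or many elements, so a bad record can simply produce an empty collection instead of failing the job.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.Collections;

// Hypothetical flatMap-style deserialization schema: zero-or-more outputs
// per input message. Names are illustrative, not Flink's actual API.
interface CollectingDeserializationSchema<T> {
    Collection<T> deserialize(byte[] message);
}

// Example implementation: parses UTF-8 integers, drops unparseable records.
public class IntSchema implements CollectingDeserializationSchema<Integer> {
    @Override
    public Collection<Integer> deserialize(byte[] message) {
        try {
            return Collections.singletonList(
                    Integer.parseInt(new String(message, StandardCharsets.UTF_8)));
        } catch (NumberFormatException e) {
            // Bad data: return the empty set instead of throwing IOException,
            // so the job keeps running and the record is skipped.
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        IntSchema schema = new IntSchema();
        System.out.println(schema.deserialize("42".getBytes(StandardCharsets.UTF_8)));
        System.out.println(schema.deserialize("not-a-number".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Returning a collection also covers the one-to-many case mentioned in the ticket, since a single message can expand into several elements.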
[jira] [Commented] (FLINK-3679) DeserializationSchema should handle zero or more outputs for every input
[ https://issues.apache.org/jira/browse/FLINK-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224692#comment-15224692 ] Jamie Grier commented on FLINK-3679: I'm not sure about the locking and operator chaining issues so I would say if that's unduly complicated because of this change maybe it's not worth it. However, a DeserializationSchema with more flatMap() like semantics would certainly be the better API given that bad data issues are a reality. It also seems we could provide this without breaking existing code, but certainly it would add a bit more complexity to the API (having multiple variants for this). Anyway, I agree you can work around this issue by making a special "sentinel" value and dealing with all of this in a chained flatMap() operator. I imagine that's exactly the approach that people are already using. > DeserializationSchema should handle zero or more outputs for every input > > > Key: FLINK-3679 > URL: https://issues.apache.org/jira/browse/FLINK-3679 > Project: Flink > Issue Type: Bug > Components: DataStream API >Reporter: Jamie Grier > > There are a couple of issues with the DeserializationSchema API that I think > should be improved. This request has come to me via an existing Flink user. > The main issue is simply that the API assumes that there is a one-to-one > mapping between input and outputs. In reality there are scenarios where one > input message (say from Kafka) might actually map to zero or more logical > elements in the pipeline. > Particularly important here is the case where you receive a message from a > source (such as Kafka) and say the raw bytes don't deserialize properly. > Right now the only recourse is to throw IOException and therefore fail the > job. > This is definitely not good since bad data is a reality and failing the job > is not the right option. If the job fails we'll just end up replaying the > bad data and the whole thing will start again. 
> Instead in this case it would be best if the user could just return the empty > set. > The other case is where one input message should logically be multiple output > messages. This case is probably less important since there are other ways to > do this but in general it might be good to make the > DeserializationSchema.deserialize() method return a collection rather than a > single element. > Maybe we need to support a DeserializationSchema variant that has semantics > more like that of FlatMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
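The flatMap()-like semantics discussed above can be sketched in plain Java (this is a hypothetical shape, not Flink's actual DeserializationSchema API; the class and method names are illustrative): deserialize() returns a collection, so a bad record maps to zero elements and a compound record can map to several.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical collection-returning deserialization schema: one raw message
 * can yield zero, one, or many pipeline elements instead of exactly one.
 */
public class FlatMapStyleSchema {
    public static List<Integer> deserialize(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        List<Integer> out = new ArrayList<>();
        for (String token : text.split(",")) {
            try {
                out.add(Integer.parseInt(token.trim()));
            } catch (NumberFormatException e) {
                // Bad data: skip the token instead of throwing and failing the job.
            }
        }
        return out;
    }
}
```

With this shape, undecodable input simply produces an empty list, which is exactly the "return the empty set" behavior the issue asks for.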
[jira] [Commented] (FLINK-3343) Exception while using Kafka 0.9 connector
[ https://issues.apache.org/jira/browse/FLINK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224633#comment-15224633 ] Farouk Salem commented on FLINK-3343: - I use the default values. > Exception while using Kafka 0.9 connector > -- > > Key: FLINK-3343 > URL: https://issues.apache.org/jira/browse/FLINK-3343 > Project: Flink > Issue Type: Improvement > Components: flink-contrib, Kafka Connector >Affects Versions: 1.0.0 >Reporter: Farouk Salem > > While running a job, without fault tolerance, producing data to Kafka, the > job failed due to a "Batch Expired" exception. I tried to increase > "request.timeout.ms" and "max.block.ms" to 6 instead of 3 but still > the same problem. The only way to get around this problem is using snapshotting. > 09:58:11,036 WARN org.apache.kafka.clients.producer.internals.Sender >- Got error produce response with correlation id 48106 on topic-partition > flinkWordCountNoFaultToleranceSmall > -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION > 09:58:11,036 WARN org.apache.kafka.clients.producer.internals.Sender >- Got error produce response with correlation id 48105 on topic-partition > flinkWordCountNoFaultToleranceSmall > -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION > 09:58:11,036 WARN org.apache.kafka.clients.producer.internals.Sender >- Got error produce response with correlation id 48104 on topic-partition > flinkWordCountNoFaultToleranceSmall > -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION > 09:58:11,068 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask >- Caught exception while processing timer. 
> java.lang.RuntimeException: Could not forward element to next operator > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300) > at > org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48) > at > org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29) > at > org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51) > at > org.apache.flink.streaming.runtime.operators.windowing.AggregatingKeyedTimePanes.evaluateWindow(AggregatingKeyedTimePanes.java:59) > at > org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.computeWindow(AbstractAlignedProcessingTimeWindowOperator.java:242) > at > org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.trigger(AbstractAlignedProcessingTimeWindowOperator.java:223) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:606) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Could not forward element to next > operator > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319) > at > 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300) > at > org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48) > at > org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29) > at > org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:37) > at > org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:316) > ... 15 more > Caused by: java.lang.Exception: Failed to send data to Kafka: Batch Expired > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.checkErroneous(FlinkKafkaProducerBase.java:282) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.invoke(FlinkKafkaProducerBase.java:249) > at >
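The producer settings discussed in this thread can be collected into a Kafka producer Properties object like this. The exact values from the original report were truncated in this archive, so the numbers below are illustrative assumptions, not recommendations; the broker address is a placeholder.

```java
import java.util.Properties;

/**
 * Illustrative Kafka producer settings for the "Batch Expired" symptom:
 * longer request/block timeouts and, as suggested later in the thread,
 * batching disabled. Values are assumptions, not the reporter's exact ones.
 */
public class ProducerTimeoutConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("request.timeout.ms", "60000");         // raised from the default
        props.setProperty("max.block.ms", "60000");               // raised from the default
        props.setProperty("batch.size", "0");                     // 0 disables producer batching
        return props;
    }
}
```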
[jira] [Commented] (FLINK-3343) Exception while using Kafka 0.9 connector
[ https://issues.apache.org/jira/browse/FLINK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224620#comment-15224620 ] Robert Metzger commented on FLINK-3343: --- How much heap size do you allocate per task manager?
[jira] [Commented] (FLINK-2909) Gelly Graph Generators
[ https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224605#comment-15224605 ] ASF GitHub Bot commented on FLINK-2909: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1807#issuecomment-205412831 Do you prefer the look of hand-crafted graphs or should we consider something that will draw graphs in the browser like [JSNetworkX](http://jsnetworkx.org/index.html)? This is BSD licensed. They have yet to fully implement NetworkX but these graph generators map to generators in NetworkX. > Gelly Graph Generators > -- > > Key: FLINK-2909 > URL: https://issues.apache.org/jira/browse/FLINK-2909 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.0.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > > Include a selection of graph generators in Gelly. Generated graphs will be > useful for performing scalability, stress, and regression testing as well as > benchmarking and comparing algorithms, for both Flink users and developers. > Generated data is infinitely scalable yet described by a few simple > parameters and can often substitute for user data or sharing large files when > reporting issues. > There are multiple categories of graphs as documented by > [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html] > and elsewhere. > Graphs may be well-defined, e.g. the [Chvátal > graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be > sufficiently small to populate locally. > Graphs may be scalable, e.g. complete and star graphs. These should use > Flink's distributed parallelism. > Graphs may be stochastic, e.g. [RMat > graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf] > . A key consideration is that the graphs should source randomness from a > seedable PRNG and generate the same graph regardless of parallelism. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
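The determinism requirement above (seedable PRNG, same graph regardless of parallelism) can be sketched by deriving the randomness for each potential edge from the seed and the edge's global index, so any partitioning of the index range produces the same edges. This is a minimal illustration, not Gelly's actual generator code; the class and seeding scheme are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Sketch: a stochastic edge generator whose output is independent of how
 * the index range [from, to) is split across parallel workers, because
 * each index gets its own PRNG seeded from (seed, index).
 */
public class SeededEdgeGenerator {
    /** Emits each candidate edge of an n-vertex graph with probability p. */
    public static List<long[]> edges(long seed, int n, double p, int from, int to) {
        List<long[]> result = new ArrayList<>();
        for (int i = from; i < to; i++) {
            int src = i / n, dst = i % n;
            // Per-index PRNG: independent of partitioning and iteration order.
            Random rnd = new Random(seed * 31 + i);
            if (src != dst && rnd.nextDouble() < p) {
                result.add(new long[] {src, dst});
            }
        }
        return result;
    }
}
```

Generating indices 0..99 in one pass or as two halves yields the same edge set, which is the property the issue asks generators to guarantee.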
[jira] [Commented] (FLINK-3343) Exception while using Kafka 0.9 connector
[ https://issues.apache.org/jira/browse/FLINK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224606#comment-15224606 ] Farouk Salem commented on FLINK-3343: - yeah, it works most of the time. But, sometimes it throws "GC overhead limit exceeded" exception.
[jira] [Commented] (FLINK-3343) Exception while using Kafka 0.9 connector
[ https://issues.apache.org/jira/browse/FLINK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224598#comment-15224598 ] Robert Metzger commented on FLINK-3343: --- Have you tried setting a batch size of 0?
[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State
[ https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224566#comment-15224566 ] ASF GitHub Bot commented on FLINK-3654: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1850#issuecomment-205406051 done :smile: > Disable Write-Ahead-Log in RocksDB State > > > Key: FLINK-3654 > URL: https://issues.apache.org/jira/browse/FLINK-3654 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > We do our own checkpointing of the RocksDB database so the WAL is useless to > us. Disabling writes to the WAL should give us a very large performance boost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State
[ https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224550#comment-15224550 ] ASF GitHub Bot commented on FLINK-3654: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/1850#issuecomment-205402626 Would you mind triggering a few builds over night?
[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State
[ https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224535#comment-15224535 ] ASF GitHub Bot commented on FLINK-3654: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1850#issuecomment-205398515 I'm seeing this failure on Travis: https://travis-ci.org/aljoscha/flink/jobs/120638971 Could be a bug introduced by the change. I'm investigating...
[jira] [Commented] (FLINK-3174) Add merging WindowAssigner
[ https://issues.apache.org/jira/browse/FLINK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224521#comment-15224521 ] ASF GitHub Bot commented on FLINK-3174: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1802#issuecomment-205394604 No, we introduced the `InternalWindowFunction` before 1.0.0 to decouple the user `WindowFunction` from the window operator implementation. There are now special `InternalWindowFunctions` that don't forward the key to the user function. The API for the user does not change, there an `AllWindowFunction` is used, as before. And that function does not get a key. > Add merging WindowAssigner > -- > > Key: FLINK-3174 > URL: https://issues.apache.org/jira/browse/FLINK-3174 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > Fix For: 1.0.0 > > > We should add the possibility for WindowAssigners to merge windows. This will > enable Session windowing support, similar to how Google Cloud Dataflow > supports. > For session windows, each element would initially be assigned to its own > window. When triggering we check the windows and see if any can be merged. > This way, elements with overlapping session windows merge into one session. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
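The session-merging behavior described in the issue (each element starts its own window, overlapping windows merge into one session) can be sketched as a plain interval-merge. This is a conceptual illustration, not Flink's MergingWindowAssigner API; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of session windowing by merging: every timestamp opens a window
 * [t, t + gap), and any windows that overlap are merged into one session.
 */
public class SessionMerge {
    public static List<long[]> sessions(List<Long> timestamps, long gap) {
        List<long[]> windows = new ArrayList<>();
        for (long t : timestamps) {
            windows.add(new long[] {t, t + gap}); // one window per element
        }
        windows.sort((a, b) -> Long.compare(a[0], b[0]));
        List<long[]> merged = new ArrayList<>();
        for (long[] w : windows) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && w[0] < last[1]) {
                last[1] = Math.max(last[1], w[1]); // overlap: extend the session
            } else {
                merged.add(w);
            }
        }
        return merged;
    }
}
```

For timestamps 1, 2, 10 with gap 3, the elements at 1 and 2 collapse into one session while 10 stays separate, mirroring how elements with overlapping session windows merge into a single session.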
[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment
[ https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224518#comment-15224518 ] ASF GitHub Bot commented on FLINK-2609: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205393404 I'm not sure if we can register the KeySelector types that way. > Automatic type registration is only called from the batch execution > environment > --- > > Key: FLINK-2609 > URL: https://issues.apache.org/jira/browse/FLINK-2609 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 0.10.0 >Reporter: Robert Metzger >Assignee: Chesnay Schepler > > Kryo types in the streaming API are quite expensive to serialize because they > are not automatically registered at Kryo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request:
Github user aljoscha commented on the pull request: https://github.com/apache/flink/commit/76968c6360c17d5deb4e42727c16bc1b9a891b26#commitcomment-16956214 Done
[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State
[ https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224507#comment-15224507 ] ASF GitHub Bot commented on FLINK-3654: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1850#issuecomment-205391579 Can do, yes. But we shouldn't block, as you said.
[jira] [Commented] (FLINK-3174) Add merging WindowAssigner
[ https://issues.apache.org/jira/browse/FLINK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224479#comment-15224479 ] ASF GitHub Bot commented on FLINK-3174: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1802#issuecomment-205384005 Concerning the removal of the non-keyed window operator: Does that mean that the user functions now see the dummy key? > Add merging WindowAssigner > -- > > Key: FLINK-3174 > URL: https://issues.apache.org/jira/browse/FLINK-3174 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > Fix For: 1.0.0 > > > We should add the possibility for WindowAssigners to merge windows. This will > enable Session windowing support, similar to how Google Cloud Dataflow > supports it. > For session windows, each element would initially be assigned to its own > window. When triggering we check the windows and see if any can be merged. > This way, elements with overlapping session windows merge into one session.
[GitHub] flink pull request: [FLINK-3174] Add MergingWindowAssigner and Ses...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1802#issuecomment-205384005 Concerning the removal of the non-keyed window operator: Does that mean that the user functions now see the dummy key?
[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment
[ https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224468#comment-15224468 ] ASF GitHub Bot commented on FLINK-2609: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205380553 I think static variable is even more problematic. Other possibility is to traverse the graph at the very end, and register at that point what is encountered. > Automatic type registration is only called from the batch execution > environment > --- > > Key: FLINK-2609 > URL: https://issues.apache.org/jira/browse/FLINK-2609 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 0.10.0 >Reporter: Robert Metzger >Assignee: Chesnay Schepler > > Kryo types in the streaming API are quite expensive to serialize because they > are not automatically registered at Kryo.
[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205380553 I think static variable is even more problematic. Other possibility is to traverse the graph at the very end, and register at that point what is encountered.
[GitHub] flink pull request:
Github user uce commented on the pull request: https://github.com/apache/flink/commit/76968c6360c17d5deb4e42727c16bc1b9a891b26#commitcomment-16955170 Can you please cherrypick this to `release-1.0` as well?
[jira] [Commented] (FLINK-3589) Allow setting Operator parallelism to default value
[ https://issues.apache.org/jira/browse/FLINK-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224388#comment-15224388 ] ASF GitHub Bot commented on FLINK-3589: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1778#issuecomment-205368211 Perhaps there should be two flags, PARALLELISM_DEFAULT = -1 to reset the parallelism to the system default, and PARALLELISM_UNKNOWN = -2 which leaves the parallelism unchanged. > Allow setting Operator parallelism to default value > --- > > Key: FLINK-3589 > URL: https://issues.apache.org/jira/browse/FLINK-3589 > Project: Flink > Issue Type: Improvement > Components: Java API >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > Users can override the parallelism for a single operator by calling > {{Operator.setParallelism}}, which accepts a positive value. {{Operator}} > uses {{-1}} to indicate default parallelism. It would be nice to name and > accept this default value. > This would enable user algorithms to allow configurable parallelism while > still chaining operator methods. > For example, currently: > {code} > private int parallelism; > ... > public void setParallelism(int parallelism) { > this.parallelism = parallelism; > } > ... > MapOperator, Edge > newEdges = > edges > .map(new MyMapFunction()) > .name("My map function"); > if (parallelism > 0) { > newEdges.setParallelism(parallelism); > } > {code} > Could be simplified to: > {code} > private int parallelism = Operator.DEFAULT_PARALLELISM; > ... > public void setParallelism(int parallelism) { > this.parallelism = parallelism; > } > ... > DataSet > newEdges = edges > .map(new MyMapFunction()) > .setParallelism(parallelism) > .name("My map function"); > {code}
[GitHub] flink pull request: [FLINK-3589] Allow setting Operator parallelis...
Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1778#issuecomment-205368211 Perhaps there should be two flags, PARALLELISM_DEFAULT = -1 to reset the parallelism to the system default, and PARALLELISM_UNKNOWN = -2 which leaves the parallelism unchanged.
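The two-flag scheme proposed in the comment above can be sketched as follows. This is illustrative code only: the constant names follow the comment, but the class here is a stand-in, not Flink's actual `Operator` or `ExecutionConfig` implementation.

```java
// Sketch of the proposed two-flag parallelism scheme (hypothetical class,
// not Flink's implementation). PARALLELISM_DEFAULT resets the operator to
// the system default; PARALLELISM_UNKNOWN is a no-op sentinel that leaves
// whatever was configured before unchanged, enabling unconditional chaining.
public class ParallelismSketch {
    public static final int PARALLELISM_DEFAULT = -1;
    public static final int PARALLELISM_UNKNOWN = -2;

    private int parallelism = PARALLELISM_DEFAULT;

    public ParallelismSketch setParallelism(int parallelism) {
        if (parallelism == PARALLELISM_UNKNOWN) {
            return this; // leave the current setting untouched
        }
        if (parallelism != PARALLELISM_DEFAULT && parallelism < 1) {
            throw new IllegalArgumentException(
                "parallelism must be positive, PARALLELISM_DEFAULT or PARALLELISM_UNKNOWN");
        }
        this.parallelism = parallelism;
        return this;
    }

    public int getParallelism() {
        return parallelism;
    }
}
```

With this split, a user algorithm can always chain `.setParallelism(parallelism)` and simply pass `PARALLELISM_UNKNOWN` when the caller configured nothing, which is exactly the simplification the FLINK-3589 description asks for.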
[jira] [Created] (FLINK-3696) Some Union tests fail for TableConfigMode.EFFICIENT
Vasia Kalavri created FLINK-3696: Summary: Some Union tests fail for TableConfigMode.EFFICIENT Key: FLINK-3696 URL: https://issues.apache.org/jira/browse/FLINK-3696 Project: Flink Issue Type: Bug Components: Table API Affects Versions: 1.1.0 Reporter: Vasia Kalavri e.g. testUnionWithFilter gives the following exception: {code} org.apache.flink.api.common.InvalidProgramException: Cannot union inputs of different types. Input1=scala.Tuple3(_1: Integer, _2: Long, _3: String), input2=Java Tuple3at org.apache.flink.api.java.operators.UnionOperator.(UnionOperator.java:47) at org.apache.flink.api.java.DataSet.union(DataSet.java:1208) at org.apache.flink.api.table.plan.nodes.dataset.DataSetUnion.translateToPlan(DataSetUnion.scala:81) at org.apache.flink.api.table.plan.nodes.dataset.DataSetCalc.translateToPlan(DataSetCalc.scala:95) at org.apache.flink.api.java.table.JavaBatchTranslator.translate(JavaBatchTranslator.scala:91) at org.apache.flink.api.scala.table.ScalaBatchTranslator.translate(ScalaBatchTranslator.scala:51) at org.apache.flink.api.scala.table.TableConversions.toDataSet(TableConversions.scala:43) at org.apache.flink.api.scala.table.test.UnionITCase.testUnionWithFilter(UnionITCase.scala:77) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212) at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) {code}
[jira] [Closed] (FLINK-3619) SavepointCoordinator test failure
[ https://issues.apache.org/jira/browse/FLINK-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-3619. -- Resolution: Fixed Fix Version/s: 1.1.0 Fixed in 138dc8f. > SavepointCoordinator test failure > - > > Key: FLINK-3619 > URL: https://issues.apache.org/jira/browse/FLINK-3619 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Gyula Fora >Assignee: Ufuk Celebi > Labels: test-stability > Fix For: 1.1.0 > > > I encountered the following test failure in a recent build on travis: > SavepointCoordinatorTest.testAbortOnCheckpointTimeout:517 expected:<0> but > was:<1> > Full log: > https://api.travis-ci.org/jobs/116334023/log.txt?deansi=true
[jira] [Resolved] (FLINK-3576) Upgrade Snappy Java to 1.1.2.1
[ https://issues.apache.org/jira/browse/FLINK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved FLINK-3576. --- Resolution: Not A Problem > Upgrade Snappy Java to 1.1.2.1 > -- > > Key: FLINK-3576 > URL: https://issues.apache.org/jira/browse/FLINK-3576 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > > There was a JVM memory leak problem reported in > https://github.com/xerial/snappy-java/issues/131 > The above issue has been resolved. > 1.1.2.1 was released on Jan 22nd. > We should upgrade to this release.
[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State
[ https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224362#comment-15224362 ] ASF GitHub Bot commented on FLINK-3654: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/1850#issuecomment-205359909 Should we add this for `1.0.2` (or `1.0.1` if the RC gets cancelled)? > Disable Write-Ahead-Log in RocksDB State > > > Key: FLINK-3654 > URL: https://issues.apache.org/jira/browse/FLINK-3654 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > We do our own checkpointing of the RocksDB database so the WAL is useless to > us. Disabling writes to the WAL should give us a very large performance boost.
[GitHub] flink pull request: [FLINK-3654] Disable Write-Ahead-Log in RocksD...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1850#issuecomment-205359909 Should we add this for `1.0.2` (or `1.0.1` if the RC gets cancelled)?
[jira] [Comment Edited] (FLINK-3655) Allow comma-separated or multiple directories to be specified for FileInputFormat
[ https://issues.apache.org/jira/browse/FLINK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224354#comment-15224354 ] Tian, Li edited comment on FLINK-3655 at 4/4/16 3:44 PM: - I think we may need to use "List filePaths" instead of "Path filePath" in FileInputFormat. In this way, we should also 1. modify current implementations to support multiple input paths 2. add functions like setFilePaths, getFilePaths to FileInputFormat, and support comma-separated Path string in ExecutionEnvironment 3. for backward compatibility, let FileInputFormat.setFilePath set the inputPaths to a one-element list was (Author: tianli): I think we may need to use List instead of a single Path in FileInputFormat. In this way, we should also 1. modify current implementations to support multiple input paths 2. add functions like setFilePaths, getFilePaths to FileInputFormat, and support comma-separated Path string in ExecutionEnvironment 3. for backward compatibility, let FileInputFormat.setFilePath set the inputPaths to a one-element list > Allow comma-separated or multiple directories to be specified for > FileInputFormat > - > > Key: FLINK-3655 > URL: https://issues.apache.org/jira/browse/FLINK-3655 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.0.0 >Reporter: Gna Phetsarath >Priority: Minor > Labels: starter > > Allow comma-separated or multiple directories to be specified for > FileInputFormat so that a DataSource will process the directories > sequentially. > > env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*") > in Scala >env.readFile(paths: Seq[String]) > or > env.readFile(path: String, otherPaths: String*) > Wildcard support would be a bonus.
[jira] [Commented] (FLINK-3655) Allow comma-separated or multiple directories to be specified for FileInputFormat
[ https://issues.apache.org/jira/browse/FLINK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224354#comment-15224354 ] Tian, Li commented on FLINK-3655: - I think we may need to use List instead of a single Path in FileInputFormat. In this way, we should also 1. modify current implementations to support multiple input paths 2. add functions like setFilePaths, getFilePaths to FileInputFormat, and support comma-separated Path string in ExecutionEnvironment 3. for backward compatibility, let FileInputFormat.setFilePath set the inputPaths to a one-element list > Allow comma-separated or multiple directories to be specified for > FileInputFormat > - > > Key: FLINK-3655 > URL: https://issues.apache.org/jira/browse/FLINK-3655 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.0.0 >Reporter: Gna Phetsarath >Priority: Minor > Labels: starter > > Allow comma-separated or multiple directories to be specified for > FileInputFormat so that a DataSource will process the directories > sequentially. > > env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*") > in Scala >env.readFile(paths: Seq[String]) > or > env.readFile(path: String, otherPaths: String*) > Wildcard support would be a bonus.
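The three steps proposed in the comment above can be sketched as follows. This is a hypothetical helper, not Flink's `FileInputFormat`; paths are modeled as plain strings instead of `org.apache.flink.core.fs.Path` for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of the FLINK-3655 proposal: store a list of input paths instead of
// a single one, accept comma-separated path strings, and keep the old
// single-path setter as a one-element special case. Illustrative only.
public class MultiPathInputSketch {
    private List<String> filePaths = new ArrayList<>();

    // Step 2: new plural setter, with comma-separated string support.
    public void setFilePaths(String commaSeparatedPaths) {
        filePaths = new ArrayList<>();
        for (String p : commaSeparatedPaths.split(",")) {
            if (!p.trim().isEmpty()) {
                filePaths.add(p.trim());
            }
        }
    }

    public List<String> getFilePaths() {
        return Collections.unmodifiableList(filePaths);
    }

    // Step 3: backward compatibility — the old single-path setter
    // simply becomes a one-element list.
    public void setFilePath(String filePath) {
        filePaths = new ArrayList<>(Arrays.asList(filePath));
    }
}
```

Step 1 (making the read path iterate over all configured directories) is omitted here, since it depends on the actual input-format internals.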
[jira] [Commented] (FLINK-3589) Allow setting Operator parallelism to default value
[ https://issues.apache.org/jira/browse/FLINK-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224328#comment-15224328 ] ASF GitHub Bot commented on FLINK-3589: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1778#discussion_r58394697 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java --- @@ -64,10 +64,15 @@ /** * The constant to use for the parallelism, if the system should use the number -* of currently available slots. +* of currently available slots. */ public static final int PARALLELISM_AUTO_MAX = Integer.MAX_VALUE; + /** +* The flag value indicating an unknown or unset parallelism. +*/ + public static final int PARALLELISM_UNKNOWN = -1; --- End diff -- The parallelism may be overridden elsewhere and we are not resetting it back to the default. This flag simply indicates a value for which the parallelism will not be overridden. > Allow setting Operator parallelism to default value > --- > > Key: FLINK-3589 > URL: https://issues.apache.org/jira/browse/FLINK-3589 > Project: Flink > Issue Type: Improvement > Components: Java API >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > User's can override the parallelism for a single operator by calling > {{Operator.setParallelism}}, which accepts a positive value. {{Operator}} > uses {{-1}} to indicate default parallelism. It would be nice to name and > accept this default value. > This would enable user algorithms to allow configurable parallelism while > still chaining operator methods. > For example, currently: > {code} > private int parallelism; > ... > public void setParallelism(int parallelism) { > this.parallelism = parallelism; > } > ... 
> MapOperator, Edge > newEdges = > edges > .map(new MyMapFunction()) > .name("My map function"); > if (parallelism > 0) { > newEdges.setParallelism(parallelism); > } > {code} > Could be simplified to: > {code} > private int parallelism = Operator.DEFAULT_PARALLELISM; > ... > public void setParallelism(int parallelism) { > this.parallelism = parallelism; > } > ... > DataSet > newEdges = edges > .map(new MyMapFunction()) > .setParallelism(parallelism) > .name("My map function"); > {code}
[GitHub] flink pull request: [FLINK-3589] Allow setting Operator parallelis...
Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/1778#discussion_r58394697 --- Diff: flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java --- @@ -64,10 +64,15 @@ /** * The constant to use for the parallelism, if the system should use the number -* of currently available slots. +* of currently available slots. */ public static final int PARALLELISM_AUTO_MAX = Integer.MAX_VALUE; + /** +* The flag value indicating an unknown or unset parallelism. +*/ + public static final int PARALLELISM_UNKNOWN = -1; --- End diff -- The parallelism may be overridden elsewhere and we are not resetting it back to the default. This flag simply indicates a value for which the parallelism will not be overridden.
[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment
[ https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224320#comment-15224320 ] ASF GitHub Bot commented on FLINK-2609: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205346744 i decided against the ExecutionEnvironment because i do not want to add a method (to access the deduplicator) in a user-facing class, when a user would never use it. > Automatic type registration is only called from the batch execution > environment > --- > > Key: FLINK-2609 > URL: https://issues.apache.org/jira/browse/FLINK-2609 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 0.10.0 >Reporter: Robert Metzger >Assignee: Chesnay Schepler > > Kryo types in the streaming API are quite expensive to serialize because they > are not automatically registered at Kryo.
[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205346744 i decided against the ExecutionEnvironment because i do not want to add a method (to access the deduplicator) in a user-facing class, when a user would never use it.
[jira] [Commented] (FLINK-3576) Upgrade Snappy Java to 1.1.2.1
[ https://issues.apache.org/jira/browse/FLINK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224304#comment-15224304 ] Stephan Ewen commented on FLINK-3576: - Do we actually have an explicit snappy dependency somewhere? I can find only transitive dependencies at the first glance... > Upgrade Snappy Java to 1.1.2.1 > -- > > Key: FLINK-3576 > URL: https://issues.apache.org/jira/browse/FLINK-3576 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > > There was a JVM memory leak problem reported in > https://github.com/xerial/snappy-java/issues/131 > The above issue has been resolved. > 1.1.2.1 was released on Jan 22nd. > We should upgrade to this release.
[jira] [Created] (FLINK-3695) ValueArray types
Greg Hogan created FLINK-3695: - Summary: ValueArray types Key: FLINK-3695 URL: https://issues.apache.org/jira/browse/FLINK-3695 Project: Flink Issue Type: New Feature Components: Core Affects Versions: 1.1.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Flink provides mutable {{Value}} type implementations of Java primitives along with efficient serializers and comparators. It would be useful to have corresponding {{ValueArray}} implementations backed by primitive rather than object arrays, along with an {{ArrayableValue}} interface tying a {{Value}} to its {{ValueArray}}.
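The motivation — primitive-backed arrays avoid one boxed object per element — can be illustrated with a minimal sketch. The class below is hypothetical and is not the `ValueArray` API this issue proposes; it only shows the storage idea for the `long` case.

```java
import java.util.Arrays;

// Minimal sketch of a value array backed by a primitive long[]: appending
// never allocates a boxed Long, unlike ArrayList<Long>. Hypothetical class,
// not the FLINK-3695 API.
public class LongValueArraySketch {
    private long[] data = new long[16];
    private int size = 0;

    public void add(long value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // amortized O(1) growth
        }
        data[size++] = value;
    }

    public long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    public int size() {
        return size;
    }
}
```

A real implementation would additionally provide the serializers and comparators the issue mentions, so the array can be shipped and sorted as efficiently as the scalar `Value` types.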
[GitHub] flink pull request: Updates the AssignerWithPunctuatedWatermarks a...
Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/1811#issuecomment-205341434 No problem @StephanEwen ! Thanks for merging this.
[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State
[ https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224295#comment-15224295 ] ASF GitHub Bot commented on FLINK-3654: --- GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/1850 [FLINK-3654] Disable Write-Ahead-Log in RocksDB State You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink rocksdb-no-wal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1850.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1850 commit 0580de64ac36bd73f505084ffac44e512f127da2 Author: Aljoscha Krettek Date: 2016-03-22T17:08:15Z [FLINK-3654] Disable Write-Ahead-Log in RocksDB State > Disable Write-Ahead-Log in RocksDB State > > > Key: FLINK-3654 > URL: https://issues.apache.org/jira/browse/FLINK-3654 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > We do our own checkpointing of the RocksDB database so the WAL is useless to > us. Disabling writes to the WAL should give us a very large performance boost.
[GitHub] flink pull request: [FLINK-3654] Disable Write-Ahead-Log in RocksD...
GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/1850 [FLINK-3654] Disable Write-Ahead-Log in RocksDB State You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink rocksdb-no-wal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1850.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1850 commit 0580de64ac36bd73f505084ffac44e512f127da2 Author: Aljoscha Krettek Date: 2016-03-22T17:08:15Z [FLINK-3654] Disable Write-Ahead-Log in RocksDB State
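For context on the mechanism discussed in this thread: in the RocksDB Java API (rocksdbjni) the write-ahead log is switched off per write via `WriteOptions`. The fragment below is a configuration sketch of that underlying knob, not Flink's actual patch (which lives in the RocksDB state backend); the database path is a placeholder, and it requires the rocksdbjni dependency to compile.

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.WriteOptions;

// Configuration sketch: puts issued with these WriteOptions skip the
// write-ahead log. Safe when durability is guaranteed externally, as with
// Flink's own checkpointing of the RocksDB files. Illustrative only.
public class DisableWalSketch {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-sketch"); // placeholder path
             WriteOptions writeOptions = new WriteOptions().setDisableWAL(true)) {
            db.put(writeOptions, "key".getBytes(), "value".getBytes());
        }
    }
}
```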
[jira] [Resolved] (FLINK-2674) Rework windowing logic
[ https://issues.apache.org/jira/browse/FLINK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek resolved FLINK-2674. - Resolution: Fixed All sub-issues are done. > Rework windowing logic > -- > > Key: FLINK-2674 > URL: https://issues.apache.org/jira/browse/FLINK-2674 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 0.10.0 >Reporter: Stephan Ewen >Priority: Critical > Fix For: 1.0.0 > > > The windowing logic needs a major overhaul. This follows the design > documents: > - > https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams > - https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=60624830 > Specifically, the following shortcomings need to be addressed: > - Global parallel windows should be dropped >-> for time, local windows are aligned and serve the same purpose >-> there is currently no known robust and efficient parallel > implementation of custom strategies > - Event time and out of order arrival needs to be supported > - Eviction of not accessed keys does not work. Non-accessed keys linger > infinitely > - Performance is currently bad for time windows, due to an overly general > implementation > - Resources are leaking, threads are not shut down > - Elements are stored multiple times (discretizers, window buffers) > - Finally, many implementations are buggy, produce wrong results
[jira] [Closed] (FLINK-2870) Add support for accumulating/discarding for Event-Time Windows
[ https://issues.apache.org/jira/browse/FLINK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek closed FLINK-2870. --- Resolution: Duplicate This will be subsumed by the work in FLINK-3643 > Add support for accumulating/discarding for Event-Time Windows > -- > > Key: FLINK-2870 > URL: https://issues.apache.org/jira/browse/FLINK-2870 > Project: Flink > Issue Type: Sub-task > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > This would allow to specify whether windows should be discarded after the > trigger fires or kept in the operator if late elements arrive. > When keeping elements, the user would also have to specify an allowed > lateness time after which the window contents are discarded without emitting > any further window evaluation result. > If elements arrive after the allowed lateness they would trigger the window > immediately with only the one single element.
[jira] [Commented] (FLINK-2909) Gelly Graph Generators
[ https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224282#comment-15224282 ] ASF GitHub Bot commented on FLINK-2909: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1807#issuecomment-205332950 Hi @greghogan, it's google drawings. > Gelly Graph Generators > -- > > Key: FLINK-2909 > URL: https://issues.apache.org/jira/browse/FLINK-2909 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.0.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > > Include a selection of graph generators in Gelly. Generated graphs will be > useful for performing scalability, stress, and regression testing as well as > benchmarking and comparing algorithms, for both Flink users and developers. > Generated data is infinitely scalable yet described by a few simple > parameters and can often substitute for user data or sharing large files when > reporting issues. > There are multiple categories of graphs as documented by > [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html] > and elsewhere. > Graphs may be well-defined, e.g. the [Chvátal > graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be > sufficiently small to populate locally. > Graphs may be scalable, e.g. complete and star graphs. These should use > Flink's distributed parallelism. > Graphs may be stochastic, e.g. [RMat > graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf] > . A key consideration is that the graphs should source randomness from a > seedable PRNG and generate the same Graph regardless of parallelism.
[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1807#issuecomment-205332950 Hi @greghogan, it's google drawings.
[jira] [Updated] (FLINK-3576) Upgrade Snappy Java to 1.1.2.1
[ https://issues.apache.org/jira/browse/FLINK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated FLINK-3576: -- Description: There was a JVM memory leak problem reported in https://github.com/xerial/snappy-java/issues/131 The above issue has been resolved. 1.1.2.1 was released on Jan 22nd. We should upgrade to this release. was: There was a JVM memory leak problem reported in https://github.com/xerial/snappy-java/issues/131 The above issue has been resolved. 1.1.2.1 was released on Jan 22nd. We should upgrade to this release. > Upgrade Snappy Java to 1.1.2.1 > -- > > Key: FLINK-3576 > URL: https://issues.apache.org/jira/browse/FLINK-3576 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > > There was a JVM memory leak problem reported in > https://github.com/xerial/snappy-java/issues/131 > The above issue has been resolved. > 1.1.2.1 was released on Jan 22nd. > We should upgrade to this release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2909) Gelly Graph Generators
[ https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224267#comment-15224267 ] ASF GitHub Bot commented on FLINK-2909: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1807#issuecomment-205328244 @vasia do you know what software was used to create the graphs in the existing Gelly documentation? > Gelly Graph Generators > -- > > Key: FLINK-2909 > URL: https://issues.apache.org/jira/browse/FLINK-2909 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.0.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > > Include a selection of graph generators in Gelly. Generated graphs will be > useful for performing scalability, stress, and regression testing as well as > benchmarking and comparing algorithms, for both Flink users and developers. > Generated data is infinitely scalable yet described by a few simple > parameters and can often substitute for user data or sharing large files when > reporting issues. > There are multiple categories of graphs as documented by > [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html] > and elsewhere. > Graphs may be well-defined, e.g. the [Chvátal > graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be > sufficiently small to populate locally. > Graphs may be scalable, e.g. complete and star graphs. These should use > Flink's distributed parallelism. > Graphs may be stochastic, e.g. [RMat > graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. > A key consideration is that the graphs should source randomness from a > seedable PRNG and generate the same Graph regardless of parallelism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3685) Logical error in code for DateSerializer deserialize with reuse
[ https://issues.apache.org/jira/browse/FLINK-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224269#comment-15224269 ] Stephan Ewen commented on FLINK-3685: - Most serializers do not support null, and the runtime does not work with null values either. This is an "in case support will be added" logic that should be removed in my opinion. Also, because {{-1}} is actually a valid date. > Logical error in code for DateSerializer deserialize with reuse > --- > > Key: FLINK-3685 > URL: https://issues.apache.org/jira/browse/FLINK-3685 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.0.0 >Reporter: ZhengBowen > > There is a logical error in the following function in DateSerializer.java > when the source reads '-1'. > The function is: > ``` > public Date deserialize(Date reuse, DataInputView source) throws IOException { > long v = source.readLong(); > if (v == -1L) { > return null; > } > reuse.setTime(v); > return reuse; > } > ``` > When this function is called for the first time and returns null, 'reuse' will be > set to null by the caller; > when it is called a second time, if 'v != -1', reuse.setTime(v) will > throw an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
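For reference, a null-safe variant of the method under discussion could look like the sketch below. `java.io.DataInput` stands in for Flink's `DataInputView` so the example is self-contained, and the fix shown (allocating a fresh Date when 'reuse' is null) is only one of the options debated in this thread, not the change the project settled on:

```java
import java.io.DataInput;
import java.io.IOException;
import java.util.Date;

public class DateDeserialize {

    public static Date deserialize(Date reuse, DataInput source) throws IOException {
        long v = source.readLong();
        if (v == -1L) {
            // -1 is the null marker (note the comment above: -1 is also a valid date)
            return null;
        }
        if (reuse == null) {
            // Guard against a null 'reuse' handed back after an earlier -1 record.
            return new Date(v);
        }
        reuse.setTime(v);
        return reuse;
    }
}
```

The guard makes the second call in the reporter's scenario return a fresh Date instead of throwing an NPE.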
[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators
Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1807#issuecomment-205328244 @vasia do you know what software was used to create the graphs in the existing Gelly documentation?
[jira] [Commented] (FLINK-3613) Add standard deviation, mean, variance to list of Aggregations
[ https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224259#comment-15224259 ] Stephan Ewen commented on FLINK-3613: - The design of the extended aggregators makes a lot of sense. I agree with Fabian that we should discuss two things first, however: 1. Do we want such extended aggregations in the DataSet API, or basically push people to use the Table API instead? My gut feeling is that it makes sense to have this in the DataSet API if we answer (2) with "yes" and have a good design for (3). 2. I assume it should allow to use multiple aggregation functions, such that one could create something like {{(a, b) --> (max(a), min(a), avg(b))}} 3. How do we want the signatures for this to look? Ideally making this typesafe via a builder (similar to the CSV input on ExecutionEnvironment). > Add standard deviation, mean, variance to list of Aggregations > -- > > Key: FLINK-3613 > URL: https://issues.apache.org/jira/browse/FLINK-3613 > Project: Flink > Issue Type: Improvement >Reporter: Todd Lisonbee >Priority: Minor > Attachments: DataSet-Aggregation-Design-March2016-v1.txt > > > Implement standard deviation, mean, variance for > org.apache.flink.api.java.aggregation.Aggregations > Ideally implementation should be single pass and numerically stable. > References: > "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et > al, International Conference on Data Engineering 2012 > http://dl.acm.org/citation.cfm?id=2310392 > "The Kahan summation algorithm (also known as compensated summation) reduces > the numerical errors that occur when adding a sequence of finite precision > floating point numbers. Numerical errors arise due to truncation and > rounding. These errors can lead to numerical instability when calculating > variance." > https://en.wikipedia.org/wiki/Kahan_summation_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-3613) Add standard deviation, mean, variance to list of Aggregations
[ https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224259#comment-15224259 ] Stephan Ewen edited comment on FLINK-3613 at 4/4/16 2:42 PM: - The design of the extended aggregators makes a lot of sense. I agree with Fabian that we should discuss two things first, however: 1. Do we want such extended aggregations in the DataSet API, or basically push people to use the Table API instead? My gut feeling is that it makes sense to have this in the DataSet API if we answer (2) with "yes" have a good design for (3). 2. I assume it should allow to use multiple aggregation functions, such that one could create something like {{(a, b) --> (max(a), min(a), avg(b))}} 3. How do we want the signatures for this to look? Ideally making this typesafe via a builder (similar to the CSV input on ExecutionEnvironment). was (Author: stephanewen): The design of the extended aggregators makes a lot of sense. I agree with Fabian that we should discuss two things first, however: 1. Do we want such extended aggregations in the DataSet API, or basically push people to use the Table API instead? My gut feeling is that it makes sense to have this in the DataSet API if we answer (2) with "yes" have a good design for (3). 2. I assume it should allow to use multiple aggregation functions, such that one could create something {{like (a, b) --> (max(a), min(a), avg(b))}} 3. How do we want the signatures for this to look? Ideally making this typesafe via a builder (similar to the CSV input on ExecutionEnvironment). 
> Add standard deviation, mean, variance to list of Aggregations > -- > > Key: FLINK-3613 > URL: https://issues.apache.org/jira/browse/FLINK-3613 > Project: Flink > Issue Type: Improvement >Reporter: Todd Lisonbee >Priority: Minor > Attachments: DataSet-Aggregation-Design-March2016-v1.txt > > > Implement standard deviation, mean, variance for > org.apache.flink.api.java.aggregation.Aggregations > Ideally implementation should be single pass and numerically stable. > References: > "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et > al, International Conference on Data Engineering 2012 > http://dl.acm.org/citation.cfm?id=2310392 > "The Kahan summation algorithm (also known as compensated summation) reduces > the numerical errors that occur when adding a sequence of finite precision > floating point numbers. Numerical errors arise due to truncation and > rounding. These errors can lead to numerical instability when calculating > variance." > https://en.wikipedia.org/wiki/Kahan_summation_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
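The "single pass and numerically stable" requirement the issue states is commonly met with Welford's online algorithm, a close relative of the compensated summation it cites. A self-contained sketch, not the Aggregations API proposed in the attached design doc:

```java
// Welford's single-pass algorithm for mean and variance; numerically stable
// because it accumulates deviations from the running mean instead of raw
// sums of squares.
public class StreamingStats {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    public void update(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // note: uses the *updated* mean
    }

    public double mean() { return mean; }

    /** Sample variance (n - 1 denominator); 0 for fewer than two values. */
    public double variance() { return n > 1 ? m2 / (n - 1) : 0.0; }

    public double stddev() { return Math.sqrt(variance()); }
}
```

Because each update touches only three scalars, an accumulator like this also combines naturally with per-partition pre-aggregation.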
[jira] [Commented] (FLINK-3634) Fix documentation for DataSetUtils.zipWithUniqueId()
[ https://issues.apache.org/jira/browse/FLINK-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224250#comment-15224250 ] ASF GitHub Bot commented on FLINK-3634: --- Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1817#issuecomment-205325957 The documentation for `zipWithIndex` now references requiring two passes and notes `zipWithUniqueId` as an alternative. > Fix documentation for DataSetUtils.zipWithUniqueId() > > > Key: FLINK-3634 > URL: https://issues.apache.org/jira/browse/FLINK-3634 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.1.0, 1.0.1 > > > Under FLINK-2590 the assignment and testing of unique IDs was improved but > the documentation looks to still reference the old implementation. > With {{parallelism=1}} there is no difference between zipWithUniqueID and > zipWithIndex. With greater parallelism the results of zipWithUniqueID are > dependent on the partitioning. > The documentation should demonstrate a possible result that is different from > the incremental sequence of zipWithIndex while noting that results are > dependent on the parallelism and partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
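To illustrate why the results of zipWithUniqueId depend on partitioning: unique ids can be assigned without any coordination by interleaving per-subtask counters, as in the hypothetical scheme below (this is not necessarily Flink's exact bit layout):

```java
public class UniqueIds {

    // Subtask k of P emits ids k, k + P, k + 2P, ... Uniqueness needs no
    // coordination between subtasks, but the id a given element receives
    // depends on which partition it landed in.
    public static long uniqueId(int subtask, long localCounter, int parallelism) {
        return subtask + localCounter * parallelism;
    }
}
```

This is exactly why the documentation should show that the output is unique but partition-dependent, rather than a fixed incremental sequence.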
[jira] [Commented] (FLINK-3685) Logical error in code for DateSerializer deserialize with reuse
[ https://issues.apache.org/jira/browse/FLINK-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224247#comment-15224247 ] Robert Metzger commented on FLINK-3685: --- Did you see any failures related to this? > Logical error in code for DateSerializer deserialize with reuse > --- > > Key: FLINK-3685 > URL: https://issues.apache.org/jira/browse/FLINK-3685 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.0.0 >Reporter: ZhengBowen > > There is a logical error in the following function in DateSerializer.java > when the source reads '-1'. > The function is: > ``` > public Date deserialize(Date reuse, DataInputView source) throws IOException { > long v = source.readLong(); > if (v == -1L) { > return null; > } > reuse.setTime(v); > return reuse; > } > ``` > When this function is called for the first time and returns null, 'reuse' will be > set to null by the caller; > when it is called a second time, if 'v != -1', reuse.setTime(v) will > throw an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3634] [docs] Fix documentation for Data...
Github user greghogan commented on the pull request: https://github.com/apache/flink/pull/1817#issuecomment-205325957 The documentation for `zipWithIndex` now references requiring two passes and notes `zipWithUniqueId` as an alternative.
[jira] [Closed] (FLINK-3693) JobManagerHAJobGraphRecoveryITCase.testClientNonDetachedListeningBehaviour is unstable
[ https://issues.apache.org/jira/browse/FLINK-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-3693. -- Resolution: Fixed Fix Version/s: 1.1.0 Fixed in 9e7c664. > JobManagerHAJobGraphRecoveryITCase.testClientNonDetachedListeningBehaviour is > unstable > -- > > Key: FLINK-3693 > URL: https://issues.apache.org/jira/browse/FLINK-3693 > Project: Flink > Issue Type: Bug >Reporter: Robert Metzger >Assignee: Ufuk Celebi > Labels: test-stability > Fix For: 1.1.0 > > > {code} > testClientNonDetachedListeningBehaviour(org.apache.flink.test.recovery.JobManagerHAJobGraphRecoveryITCase) > Time elapsed: 13.834 sec <<< ERROR! > java.lang.IllegalStateException: Job is in terminal state FAILED, but was > waiting for RUNNING. > at > org.apache.flink.runtime.testutils.JobManagerActorTestUtils.waitForJobStatus(JobManagerActorTestUtils.java:83) > at > org.apache.flink.test.recovery.JobManagerHAJobGraphRecoveryITCase.testClientNonDetachedListeningBehaviour(JobManagerHAJobGraphRecoveryITCase.java:305) > {code} > https://s3.amazonaws.com/archive.travis-ci.org/jobs/120072682/log.txt > https://s3.amazonaws.com/archive.travis-ci.org/jobs/120105002/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3664) Create a method to easily Summarize a DataSet
[ https://issues.apache.org/jira/browse/FLINK-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224239#comment-15224239 ] Stephan Ewen commented on FLINK-3664: - The summarize design code looks good so far. Since it is part of the {{DataSetUtils}}, not breaking / altering any existing behavior, and it seems lightweight, it should be good to add this. Feel free to open a pull request with this... > Create a method to easily Summarize a DataSet > - > > Key: FLINK-3664 > URL: https://issues.apache.org/jira/browse/FLINK-3664 > Project: Flink > Issue Type: Improvement >Reporter: Todd Lisonbee > Attachments: DataSet-Summary-Design-March2016-v1.txt > > > Here is an example: > {code} > /** > * Summarize a DataSet of Tuples by collecting single pass statistics for all > columns > */ > public Tuple summarize() > Dataset> input = // [...] > Tuple3 summary > = input.summarize() > summary.getField(0).stddev() > summary.getField(1).maxStringLength() > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3595) Kafka09 consumer thread does not interrupt when stuck in record emission
[ https://issues.apache.org/jira/browse/FLINK-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224225#comment-15224225 ] ASF GitHub Bot commented on FLINK-3595: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/1780#issuecomment-205319812 Just rebased. Waiting for Travis and then merging to `master` and `release-1.0`. > Kafka09 consumer thread does not interrupt when stuck in record emission > > > Key: FLINK-3595 > URL: https://issues.apache.org/jira/browse/FLINK-3595 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Affects Versions: 1.0.0 >Reporter: Stephan Ewen >Assignee: Ufuk Celebi >Priority: Critical > Fix For: 1.1.0, 1.0.1 > > > When canceling a job, the Kafka 0.9 Consumer Thread may be stuck in a > blocking method (output emitting) and never wakes up. > The thread as a whole cannot be simply interrupted, because of a bug in Kafka > that makes the consumer freeze/hang up on interrupt. > There are two possible solutions: > - allow and call interrupt when the consumer thread is emitting elements > - destroy the output network buffer pools eagerly on canceling. The Kafka > thread will then throw an exception if it is stuck in emitting elements and > it will terminate, which is accepted in case the status is canceled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3595] [runtime] Eagerly destroy buffer ...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1780#issuecomment-205319812 Just rebased. Waiting for Travis and then merging to `master` and `release-1.0`.
[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method
[ https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224218#comment-15224218 ] Josep Rubió commented on FLINK-1707: Hi Vasia, Yes, I'm working with the scatter-gather iteration. I've seen the vertex-centric iteration and, being almost the same as Giraph's, it could make migrating from the Okapi implementation easier. It's enough now. Thanks! > Add an Affinity Propagation Library Method > -- > > Key: FLINK-1707 > URL: https://issues.apache.org/jira/browse/FLINK-1707 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Josep Rubió >Priority: Minor > Labels: requires-design-doc > Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf > > > This issue proposes adding an implementation of the Affinity Propagation > algorithm as a Gelly library method and a corresponding example. > The algorithm is described in paper [1] and a description of a vertex-centric > implementation can be found in [2]. > [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf > [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf > Design doc: > https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-3694) Document stoppable function interface for custom sources
Till Rohrmann created FLINK-3694: Summary: Document stoppable function interface for custom sources Key: FLINK-3694 URL: https://issues.apache.org/jira/browse/FLINK-3694 Project: Flink Issue Type: Improvement Components: DataStream API Affects Versions: 1.0.0, 1.1.0 Reporter: Till Rohrmann Priority: Minor With Flink 1.0 we introduced the {{StoppableFunction}} which allows users to define stoppable sources for the {{DataStream}} API. This feature is not documented yet. I think we should add a section about custom sources which also explains the usage of the {{StoppableFunction}} interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
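A documentation section like the one proposed might center on a sketch along these lines: a source whose run() loop exits cleanly when stop() flips a volatile flag. SimpleSource and its method shapes are illustrative stand-ins, not Flink's actual SourceFunction/StoppableFunction signatures:

```java
import java.util.function.Consumer;

public class SimpleSource {
    private volatile boolean running = true;

    /** Emits a bounded sequence; a real source would typically loop until stopped. */
    public void run(Consumer<Long> out) {
        long i = 0;
        while (running && i < 1000) { // bounded here so the sketch always terminates
            out.accept(i++);
        }
    }

    /** Graceful stop (as opposed to cancel()): let run() leave its loop cleanly. */
    public void stop() {
        running = false;
    }
}
```

The volatile flag matters because stop() is called from a different thread than run(); without it, the loop might never observe the update.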
[jira] [Assigned] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.
[ https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler reassigned FLINK-1502: --- Assignee: Chesnay Schepler > Expose metrics to graphite, ganglia and JMX. > > > Key: FLINK-1502 > URL: https://issues.apache.org/jira/browse/FLINK-1502 > Project: Flink > Issue Type: Sub-task > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Robert Metzger >Assignee: Chesnay Schepler >Priority: Minor > Fix For: pre-apache > > > The metrics library allows to expose collected metrics easily to other > systems such as graphite, ganglia or Java's JVM (VisualVM). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-3443) JobManager cancel and clear everything fails jobs instead of cancelling
[ https://issues.apache.org/jira/browse/FLINK-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-3443. -- Resolution: Won't Fix > JobManager cancel and clear everything fails jobs instead of cancelling > --- > > Key: FLINK-3443 > URL: https://issues.apache.org/jira/browse/FLINK-3443 > Project: Flink > Issue Type: Bug > Components: Distributed Runtime >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > > When the job manager is shut down, it calls {{cancelAndClearEverything}}. > This method does not {{cancel}} the {{ExecutionGraph}} instances, but > {{fail}}s them, which can lead to {{ExecutionGraph}} restart. > I've noticed this in tests, where old graph got into a loop of restarts. > What I don't understand is why the futures etc. are not cancelled when the > executor service is shut down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3679) DeserializationSchema should handle zero or more outputs for every input
[ https://issues.apache.org/jira/browse/FLINK-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224182#comment-15224182 ] Robert Metzger commented on FLINK-3679: --- I had a quick offline chat about this with [~StephanEwen]. Changing the semantics of the DeserializationSchema to use an OutputCollector would be possible, but it would break existing code, introduce a new class and make the locking / operator chaining of the Kafka consumer code more complicated. I wonder if the problems you've mentioned can't be solved with a flatMap() operator. When the Kafka consumer and the flatMap() are executed with the same parallelism, they'll be chained together and then executed in the same thread with almost no overhead. If one Kafka message results in two or more logical messages, that "splitting" can be done in the flatMap() as well. For invalid records, this can also be reflected in the returned record (with a failure flag, such as an id set to -1 or a boolean set to false, or with a special field in a JSON record) and then treated accordingly in the flatMap() call. If you want, we can keep the JIRA issue open and see if more users run into this. If so, we can reconsider fixing it (I'm not saying I've decided against fixing it). > DeserializationSchema should handle zero or more outputs for every input > > > Key: FLINK-3679 > URL: https://issues.apache.org/jira/browse/FLINK-3679 > Project: Flink > Issue Type: Bug > Components: DataStream API >Reporter: Jamie Grier > > There are a couple of issues with the DeserializationSchema API that I think > should be improved. This request has come to me via an existing Flink user. > The main issue is simply that the API assumes that there is a one-to-one > mapping between input and outputs. In reality there are scenarios where one > input message (say from Kafka) might actually map to zero or more logical > elements in the pipeline. 
> Particularly important here is the case where you receive a message from a > source (such as Kafka) and say the raw bytes don't deserialize properly. > Right now the only recourse is to throw IOException and therefore fail the > job. > This is definitely not good since bad data is a reality and failing the job > is not the right option. If the job fails we'll just end up replaying the > bad data and the whole thing will start again. > Instead in this case it would be best if the user could just return the empty > set. > The other case is where one input message should logically be multiple output > messages. This case is probably less important since there are other ways to > do this but in general it might be good to make the > DeserializationSchema.deserialize() method return a collection rather than a > single element. > Maybe we need to support a DeserializationSchema variant that has semantics > more like that of FlatMap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
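Robert's flatMap() workaround from the comment above can be sketched with plain Java standing in for the DataStream API: the deserializer never throws, it tags records that failed to parse, and a chained flatMap emits zero outputs for them. The Record type and its 'valid' flag are illustrative, not part of any Flink interface:

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class BadRecordFilter {

    static final class Record {
        final boolean valid;
        final String payload;
        Record(boolean valid, String payload) { this.valid = valid; this.payload = payload; }
    }

    /** Never fails the job: parse errors are tagged instead of thrown. */
    static Record deserialize(byte[] bytes) {
        try {
            String s = new String(bytes, StandardCharsets.UTF_8);
            if (s.isEmpty()) throw new IllegalArgumentException("empty message");
            return new Record(true, s);
        } catch (RuntimeException e) {
            return new Record(false, null);
        }
    }

    /** flatMap semantics: zero outputs for invalid records, one (or more) otherwise. */
    static void flatMap(Record r, Consumer<String> out) {
        if (r.valid) out.accept(r.payload);
    }
}
```

Splitting one input into several logical messages fits the same shape: the flatMap just calls out.accept() more than once.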
[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment
[ https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224184#comment-15224184 ] ASF GitHub Bot commented on FLINK-2609: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205307922 Let's not merge this as is; it again uses a static de-duplicator, like the DataSet API used to (we recently eliminated that). Static utils that work around putting a variable in the right place are an antipattern. Why not have the registration de-duplicator in the ExecutionEnvironment? Since the ExecutionConfig is also one-per-environment, and is always accessed through the environment, that seems like the way to go... > Automatic type registration is only called from the batch execution > environment > --- > > Key: FLINK-2609 > URL: https://issues.apache.org/jira/browse/FLINK-2609 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 0.10.0 >Reporter: Robert Metzger >Assignee: Chesnay Schepler > > Kryo types in the streaming API are quite expensive to serialize because they > are not automatically registered at Kryo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2119) Add ExecutionGraph support for leg-wise scheduling
[ https://issues.apache.org/jira/browse/FLINK-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2119: --- Assignee: (was: Ufuk Celebi) > Add ExecutionGraph support for leg-wise scheduling > -- > > Key: FLINK-2119 > URL: https://issues.apache.org/jira/browse/FLINK-2119 > Project: Flink > Issue Type: Improvement > Components: Scheduler >Affects Versions: 0.10.0 >Reporter: Ufuk Celebi > > Scheduling currently happens by lazily unrolling the ExecutionGraph from the > sources. > 1. All sources are scheduled for execution. > 2. Their results trigger scheduling and deployment of the receiving tasks > (either on the first available buffer or when all are produced [pipelined vs. > blocking exchange]). > For certain batch jobs this can be problematic as many tasks will be running > at the same time and consume task manager resources like execution slots and > memory. For these jobs, it is desirable to schedule the ExecutionGraph > with different strategies. > With respect to the ExecutionGraph, the current limitation is that data > availability for a result always triggers scheduling of the consuming tasks. > This needs to be more general to allow different scheduling strategies. > Consider the following example: > {code} > [ union ] > / \ > / \ > [ source 1 ] [ source 2 ] > {code} > Currently, both sources are scheduled concurrently and the "faster" one > triggers scheduling of the union. It is desirable to first allow source 1 to > completely produce its result, then trigger scheduling of source 2, and only > then schedule the union. > The required changes in the ExecutionGraph are conceptually straight-forward: > instead of going through the list of result consumers and scheduling them, we > need to be able to run a more general action. For normal operation, this will > still schedule the consumer task, but we can also configure it to kick off the > next source etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1833#issuecomment-205307922 Let's not merge this as is; it again uses a static de-duplicator, like the DataSet API used to (we recently eliminated that). Static utils that work around putting a variable in the right place are an antipattern. Why not have the registration de-duplicator in the ExecutionEnvironment? Since the ExecutionConfig is also one-per-environment, and is always accessed through the environment, that seems like the way to go...
[jira] [Commented] (FLINK-3571) SavepointITCase.testRestoreFailure fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224180#comment-15224180 ] Ufuk Celebi commented on FLINK-3571: It's impossible to fully debug this w/o logs :( Added better output in 21480e2. > SavepointITCase.testRestoreFailure fails on Travis > -- > > Key: FLINK-3571 > URL: https://issues.apache.org/jira/browse/FLINK-3571 > Project: Flink > Issue Type: Bug >Reporter: Till Rohrmann >Assignee: Ufuk Celebi >Priority: Critical > Labels: test-stability > > The test case {{SavepointITCase.testRestoreFailure}} failed on Travis [1] > [1] https://s3.amazonaws.com/archive.travis-ci.org/jobs/113084521/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3492) Allow users to define a min pause between checkpoints
[ https://issues.apache.org/jira/browse/FLINK-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224178#comment-15224178 ] ASF GitHub Bot commented on FLINK-3492: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1794#issuecomment-205306447 How about adjusting the test such that it works the following way: - it polls the number of checkpoint calls (from the atomic integer), say every 100 ms - it starts measuring time from when the count goes to `1` - it validates that `(numCalls - 2) * pause` is always smaller than the elapsed time since starting the time measurement > Allow users to define a min pause between checkpoints > - > > Key: FLINK-3492 > URL: https://issues.apache.org/jira/browse/FLINK-3492 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Robert Metzger >Assignee: Chesnay Schepler > > FLINK-3051 already introduced a field in the {{CheckpointConfig}} to specify > a min pause between checkpoints. > In high-load situations (big state), jobs might spend their entire time > creating snapshots, not processing data. With a min pause between > checkpoints, users can guarantee that there is a certain time-span the system > can use for doing some actual data processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-3492] Configurable interval between Che...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1794#issuecomment-205306447 How about adjusting the test such that it works the following way: - it polls the number of checkpoint calls (from the atomic integer), say every 100 ms - it starts measuring time from when the count goes to `1` - it validates that `(numCalls - 2) * pause` is always smaller than the elapsed time since starting the time measurement
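The inequality in Stephan's suggestion can be captured as a pure function, so the test only has to sample the call counter and a clock instead of trusting sleep() durations. A sketch with hypothetical names:

```java
public class MinPauseInvariant {

    // If checkpoints respect a min pause, then observing numCalls completed
    // checkpoints implies at least (numCalls - 2) * pause time has elapsed
    // since the first one; the -2 leaves slack for the checkpoints in flight
    // at each boundary. Checking this inequality avoids the flakiness of
    // assuming sleeps sleep exactly as long as requested.
    public static boolean holds(int numCalls, long pauseMs, long elapsedSinceFirstMs) {
        return (long) (numCalls - 2) * pauseMs < elapsedSinceFirstMs;
    }
}
```

The polling loop in the actual test would simply read the atomic counter, compute elapsed time since the count first reached 1, and assert that holds(...) stays true.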
[jira] [Commented] (FLINK-3492) Allow users to define a min pause between checkpoints
[ https://issues.apache.org/jira/browse/FLINK-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224158#comment-15224158 ]

ASF GitHub Bot commented on FLINK-3492:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1794#issuecomment-205303755

    The code looks good, but the tests are bound to be unstable. The assumption that sleeps actually sleep as long as they should is often violated (especially on Travis, it seems). I think we need to find a better way to test this...

> Allow users to define a min pause between checkpoints
> -----------------------------------------------------
>
>                 Key: FLINK-3492
>                 URL: https://issues.apache.org/jira/browse/FLINK-3492
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Robert Metzger
>            Assignee: Chesnay Schepler
>
> FLINK-3051 already introduced a field in the {{CheckpointConfig}} to specify
> a min pause between checkpoints.
> In high-load situations (big state), jobs might spend their entire time
> creating snapshots, not processing data. With a min pause between
> checkpoints, users can guarantee that there is a certain time-span the system
> can use for doing some actual data processing.
[GitHub] flink pull request: [FLINK-3492] Configurable interval between Che...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1794#issuecomment-205303755

    The code looks good, but the tests are bound to be unstable. The assumption that sleeps actually sleep as long as they should is often violated (especially on Travis, it seems). I think we need to find a better way to test this...
[jira] [Resolved] (FLINK-3524) Provide a JSONDeserialisationSchema in the kafka connector package
[ https://issues.apache.org/jira/browse/FLINK-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger resolved FLINK-3524.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/c7595840

> Provide a JSONDeserialisationSchema in the kafka connector package
> ------------------------------------------------------------------
>
>                 Key: FLINK-3524
>                 URL: https://issues.apache.org/jira/browse/FLINK-3524
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>            Reporter: Robert Metzger
>            Assignee: Chesnay Schepler
>              Labels: starter
>             Fix For: 1.1.0
>
> (I don't want to include this into 1.0.0)
> Currently, there is no standardized way of parsing JSON data from a Kafka
> stream. I see a lot of users using JSON in their topics. It would make things
> easier for our users to provide a serializer for them.
> I suggest to use the jackson library because we already have it as a
> dependency in Flink and it can parse from a byte[].
> I would suggest to provide the following classes:
> - JSONDeserializationSchema()
> - JSONDeKeyValueSerializationSchema(bool includeMetadata)
> The second variant should produce a record like this:
> {code}
> {"key": "keydata",
> "value": "valuedata",
> "metadata": {"offset": 123, "topic": "", "partition": 2 }}
> {code}
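The record envelope described in the issue can be illustrated with a small stdlib-only sketch. The class and the helper `wrapAsJson` are hypothetical names introduced here for illustration; the actual schema would deserialize the Kafka `byte[]` with Jackson (already a Flink dependency), while this sketch only assembles the key/value/metadata wrapper from fragments that are assumed to already be JSON:

```java
// Hypothetical helper showing the shape of the record the key/value schema
// with includeMetadata=true is supposed to produce.
class JsonEnvelopeSketch {

    static String wrapAsJson(String keyJson, String valueJson,
                             long offset, String topic, int partition) {
        // Assemble {"key": ..., "value": ..., "metadata": {...}} verbatim;
        // keyJson and valueJson must already be valid JSON fragments.
        return String.format(
                "{\"key\": %s, \"value\": %s, "
                        + "\"metadata\": {\"offset\": %d, \"topic\": \"%s\", \"partition\": %d}}",
                keyJson, valueJson, offset, topic, partition);
    }

    public static void main(String[] args) {
        System.out.println(wrapAsJson("\"keydata\"", "\"valuedata\"", 123, "", 2));
    }
}
```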
[GitHub] flink pull request: [FLINK-3524] [kafka] Add JSONDeserializationSc...
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1834
[jira] [Commented] (FLINK-3524) Provide a JSONDeserialisationSchema in the kafka connector package
[ https://issues.apache.org/jira/browse/FLINK-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224155#comment-15224155 ]

ASF GitHub Bot commented on FLINK-3524:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1834

> Provide a JSONDeserialisationSchema in the kafka connector package
> ------------------------------------------------------------------
>
>                 Key: FLINK-3524
>                 URL: https://issues.apache.org/jira/browse/FLINK-3524
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>            Reporter: Robert Metzger
>            Assignee: Chesnay Schepler
>              Labels: starter
>             Fix For: 1.1.0
>
> (I don't want to include this into 1.0.0)
> Currently, there is no standardized way of parsing JSON data from a Kafka
> stream. I see a lot of users using JSON in their topics. It would make things
> easier for our users to provide a serializer for them.
> I suggest to use the jackson library because we already have it as a
> dependency in Flink and it can parse from a byte[].
> I would suggest to provide the following classes:
> - JSONDeserializationSchema()
> - JSONDeKeyValueSerializationSchema(bool includeMetadata)
> The second variant should produce a record like this:
> {code}
> {"key": "keydata",
> "value": "valuedata",
> "metadata": {"offset": 123, "topic": "", "partition": 2 }}
> {code}
[jira] [Commented] (FLINK-3524) Provide a JSONDeserialisationSchema in the kafka connector package
[ https://issues.apache.org/jira/browse/FLINK-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224149#comment-15224149 ]

ASF GitHub Bot commented on FLINK-3524:
---------------------------------------

Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1834#issuecomment-205303136

    Merging ...

> Provide a JSONDeserialisationSchema in the kafka connector package
> ------------------------------------------------------------------
>
>                 Key: FLINK-3524
>                 URL: https://issues.apache.org/jira/browse/FLINK-3524
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>            Reporter: Robert Metzger
>            Assignee: Chesnay Schepler
>              Labels: starter
>
> (I don't want to include this into 1.0.0)
> Currently, there is no standardized way of parsing JSON data from a Kafka
> stream. I see a lot of users using JSON in their topics. It would make things
> easier for our users to provide a serializer for them.
> I suggest to use the jackson library because we already have it as a
> dependency in Flink and it can parse from a byte[].
> I would suggest to provide the following classes:
> - JSONDeserializationSchema()
> - JSONDeKeyValueSerializationSchema(bool includeMetadata)
> The second variant should produce a record like this:
> {code}
> {"key": "keydata",
> "value": "valuedata",
> "metadata": {"offset": 123, "topic": "", "partition": 2 }}
> {code}
[GitHub] flink pull request: [FLINK-3524] [kafka] Add JSONDeserializationSc...
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1834#issuecomment-205303136

    Merging ...
[jira] [Commented] (FLINK-3174) Add merging WindowAssigner
[ https://issues.apache.org/jira/browse/FLINK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224143#comment-15224143 ]

ASF GitHub Bot commented on FLINK-3174:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1802#issuecomment-205302445

    I did the changes: I introduced `canMerge()` and added a default `onMerge()` in `Trigger` that throws a `RuntimeException`. This way we don't break the API for already existing user triggers.

    If there are no objections I will also merge the removal of the non-keyed window operator that this PR is based on.

> Add merging WindowAssigner
> --------------------------
>
>                 Key: FLINK-3174
>                 URL: https://issues.apache.org/jira/browse/FLINK-3174
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>             Fix For: 1.0.0
>
> We should add the possibility for WindowAssigners to merge windows. This will
> enable session windowing support, similar to what Google Cloud Dataflow
> supports.
> For session windows, each element would initially be assigned to its own
> window. When triggering, we check the windows and see if any can be merged.
> This way, elements with overlapping session windows merge into one session.
[GitHub] flink pull request: [FLINK-3174] Add MergingWindowAssigner and Ses...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1802#issuecomment-205302445

    I did the changes: I introduced `canMerge()` and added a default `onMerge()` in `Trigger` that throws a `RuntimeException`. This way we don't break the API for already existing user triggers.

    If there are no objections I will also merge the removal of the non-keyed window operator that this PR is based on.
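The backward-compatibility trick described in the comment above can be sketched with a simplified stand-in class (this is not Flink's real `Trigger`; only the `canMerge()`/`onMerge()` method names follow the comment, everything else is illustrative):

```java
// Simplified stand-in for a trigger base class gaining merge support.
// Existing user triggers compile unchanged: they inherit the defaults
// below and simply report that they cannot merge.
abstract class Trigger<W> {

    // New method; defaults to "no merging" so old triggers keep working.
    public boolean canMerge() {
        return false;
    }

    // New callback; only invoked when canMerge() returns true, so the
    // default can safely throw.
    public void onMerge(W mergedWindow) {
        throw new RuntimeException("This trigger does not support merging.");
    }
}

// A pre-existing user trigger: no changes needed after the API addition.
class LegacyTrigger<W> extends Trigger<W> {

    public static void main(String[] args) {
        Trigger<String> t = new LegacyTrigger<>();
        System.out.println("canMerge: " + t.canMerge());
    }
}
```

The design choice here is the usual one for evolving a public abstract class: new behavior is opt-in via an overridable query method, and the companion callback throws by default because the framework promises never to call it unless the query method was overridden to return true.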