[jira] [Commented] (FLINK-3688) ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is called and TimeCharacteristic = ProcessingTime

2016-04-04 Thread Konstantin Knauf (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225734#comment-15225734
 ] 

Konstantin Knauf commented on FLINK-3688:
-

I will go ahead and open the PR for 4. this evening. 

> ClassCastException in StreamRecordSerializer when WindowOperator.trigger() is 
> called and TimeCharacteristic = ProcessingTime
> 
>
> Key: FLINK-3688
> URL: https://issues.apache.org/jira/browse/FLINK-3688
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Konstantin Knauf
>Assignee: Konstantin Knauf
>Priority: Critical
>
> Hi,
> when using {{TimeCharacteristic.ProcessingTime}}, a ClassCastException is 
> thrown in {{StreamRecordSerializer}} when 
> {{WindowOperator.processWatermark()}} is called from 
> {{WindowOperator.trigger()}}, i.e. whenever a ProcessingTimeTimer is 
> triggered. 
> The problem seems to be that {{processWatermark()}} is also called in 
> {{trigger()}} when the time characteristic is ProcessingTime, but in 
> {{RecordWriterOutput}} {{enableWatermarkMultiplexing}} is {{false}} and the 
> {{TypeSerializer}} is a {{StreamRecordSerializer}}, which ultimately leads to 
> the ClassCastException. Do you agree?
> If this is indeed a bug, there are several possible solutions.
> # Only calling {{processWatermark()}} in {{trigger()}} when the 
> TimeCharacteristic is EventTime
> # Not calling {{processWatermark()}} in {{trigger()}} at all, and instead waiting 
> for the next watermark to trigger the EventTimeTimers with a timestamp behind 
> the current watermark. This is, of course, a trade-off. 
> # Using {{MultiplexingStreamRecordSerializer}} all the time, but I have no 
> idea what the side effects of this change would be. I assume there is a reason 
> for the existence of the StreamRecordSerializer ;)
> StackTrace: 
> {quote}
> TimerException\{java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord\}
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:716)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:93)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.emitWatermark(OperatorChain.java:370)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processWatermark(WindowOperator.java:293)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.trigger(WindowOperator.java:323)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:710)
>   ... 7 more
> Caused by: java.lang.ClassCastException: 
> org.apache.flink.streaming.api.watermark.Watermark cannot be cast to 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecord
>   at 
> org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer.serialize(StreamRecordSerializer.java:42)
>   at 
> org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:56)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.addRecord(SpanningRecordSerializer.java:79)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.broadcastEmit(RecordWriter.java:109)
>   at 
> org.apache.flink.streaming.runtime.io.StreamRecordWriter.broadcastEmit(StreamRecordWriter.java:95)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.emitWatermark(RecordWriterOutput.java:90)
>   ... 11 more
> {quote}
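
For reference, option 1 in the list above amounts to guarding the watermark 
emission on the time characteristic. A minimal sketch with illustrative names, 
not the actual WindowOperator internals:
{code}
import org.apache.flink.streaming.api.TimeCharacteristic;

// Sketch of option 1: only forward a watermark from trigger() under
// event time, where the output path uses the multiplexing serializer.
class TriggerSketch {
    private final TimeCharacteristic timeCharacteristic;

    TriggerSketch(TimeCharacteristic timeCharacteristic) {
        this.timeCharacteristic = timeCharacteristic;
    }

    void trigger(long time) {
        // ... fire the timers registered for 'time' ...
        if (timeCharacteristic == TimeCharacteristic.EventTime) {
            processWatermark(time);
        }
    }

    void processWatermark(long timestamp) {
        System.out.println("watermark @ " + timestamp);
    }
}
{code}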



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3541] [Kafka Connector] Clean up workar...

2016-04-04 Thread skyahead
Github user skyahead commented on the pull request:

https://github.com/apache/flink/pull/1846#issuecomment-205604300
  
@StephanEwen I changed the Kafka version from 0.9.0.1 to 0.9.0.0 and verified 
that the NullPointerException does get caught, and that the code retries 
connecting 10 times. 

Using 0.9.0.1, however, the NullPointerException no longer occurs; instead a 
TimeoutException is thrown and caught as expected.

All test cases in Kafka08ITCase (19 cases) and Kafka09ITCase (15 cases) 
pass in my local environment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3698) Scala Option can't be used as a key

2016-04-04 Thread Timur Fayruzov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timur Fayruzov updated FLINK-3698:
--
Description: 
Discussion: 
http://mail-archives.apache.org/mod_mbox/flink-user/201603.mbox/%3CCAO0MGUjWQaovvUvAB%3DBYrwQzA0ocFtMy4%3DV%3DP--343Sy1V5BSg%40mail.gmail.com%3E

Option should be treated the same way as other generic objects: it can be 
used as a key if the generic argument implements Comparable.
Here's a scalatest in FunSpec format that illustrates the issue:
```
case class MyKey(x: Option[String])

it("can't use options inside classes used as keys") {
  val a = env.fromCollection(Seq(MyKey(Some("a")), MyKey(Some("c"))))
  val b = env.fromCollection(Seq(MyKey(Some("a")), MyKey(Some("z"))))
  intercept[InvalidProgramException] {
    a.coGroup(b)
      .where(e => e)
      .equalTo(e => e)
  }

  // workaround
  a.coGroup(b)
    .where(e => e.toString) // `e` should be translated to any object that
                            // implements the Comparable interface to be a valid key.
    .equalTo(e => e.toString)
}
```

  was:
Discussion: 
http://mail-archives.apache.org/mod_mbox/flink-user/201603.mbox/%3CCAO0MGUjWQaovvUvAB%3DBYrwQzA0ocFtMy4%3DV%3DP--343Sy1V5BSg%40mail.gmail.com%3E

Option should be treated the same way as other generic objects: it can be 
used as a key if the generic argument implements Comparable.


> Scala Option can't be used as a key
> ---
>
> Key: FLINK-3698
> URL: https://issues.apache.org/jira/browse/FLINK-3698
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Reporter: Timur Fayruzov
>Priority: Minor
>
> Discussion: 
> http://mail-archives.apache.org/mod_mbox/flink-user/201603.mbox/%3CCAO0MGUjWQaovvUvAB%3DBYrwQzA0ocFtMy4%3DV%3DP--343Sy1V5BSg%40mail.gmail.com%3E
> Option should be treated the same way as other generic objects: it can 
> be used as a key if the generic argument implements Comparable.
> Here's a scalatest in FunSpec format that illustrates the issue:
> ```
> case class MyKey(x: Option[String])
> it("can't use options inside classes used as keys") {
>   val a = env.fromCollection(Seq(MyKey(Some("a")), MyKey(Some("c"))))
>   val b = env.fromCollection(Seq(MyKey(Some("a")), MyKey(Some("z"))))
>   intercept[InvalidProgramException] {
>     a.coGroup(b)
>       .where(e => e)
>       .equalTo(e => e)
>   }
>   // workaround
>   a.coGroup(b)
>     .where(e => e.toString) // `e` should be translated to any object that
>                             // implements the Comparable interface to be a valid key.
>     .equalTo(e => e.toString)
> }
> ```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3698) Scala Option can't be used as a key

2016-04-04 Thread Timur Fayruzov (JIRA)
Timur Fayruzov created FLINK-3698:
-

 Summary: Scala Option can't be used as a key
 Key: FLINK-3698
 URL: https://issues.apache.org/jira/browse/FLINK-3698
 Project: Flink
  Issue Type: Bug
  Components: Core
Reporter: Timur Fayruzov
Priority: Minor


Discussion: 
http://mail-archives.apache.org/mod_mbox/flink-user/201603.mbox/%3CCAO0MGUjWQaovvUvAB%3DBYrwQzA0ocFtMy4%3DV%3DP--343Sy1V5BSg%40mail.gmail.com%3E

Option should be treated the same way as other generic objects: it can be 
used as a key if the generic argument implements Comparable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3697) keyBy() with nested POJO computes invalid field position indexes

2016-04-04 Thread Ron Crocker (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ron Crocker updated FLINK-3697:
---
Description: 
Using named keys in keyBy() for nested POJO types results in failure. The 
indexes for named key fields are used inconsistently with nested POJO types. 
In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position 
after (apparently) flattening the structure, but that position is then resolved 
against the unflattened version of the POJO type by {{PojoTypeInfo.getTypeAt()}}.

In the example below, getFlatFields() returns positions 0, 1, and 14. These 
positions appear correct in the flattened structure of the Data class. However, 
in {{KeySelector<X, K> getSelectorForKeys(Keys<X> keys, TypeInformation<X> 
typeInfo, ExecutionConfig executionConfig)}}, a call to 
{{compositeType.getTypeAt(logicalKeyPositions[i])}} for the third key results in 
{{PojoTypeInfo.getTypeAt()}} declaring it out of range, because it compares the 
position against the number of directly named fields of the object rather than 
the length of the flattened version of that type.

Concrete Example:
Consider this graph:
{code}
DataStream dataStream = see.addSource(new 
FlinkKafkaConsumer08<>(timesliceConstants.topic, new DataDeserialzer(), 
kafkaConsumerProperties));

dataStream
  .flatMap(new DataMapper())
  .keyBy("aaa", "abc", "wxyz")
{code}

{{DataDeserialzer}} returns a "NativeDataFormat" object; {{DataMapper}} takes 
this NativeDataFormat object and extracts individual Data objects:
{code}
public class Data {
public int aaa;
public int abc;
public long wxyz;
public int t1;
public int t2;
public Policy policy;
public Stats stats;

public Data() {}
}
{code}

A {{Policy}} object is an instance of this class:
{code}
public class Policy {
public short a;
public short b;
public boolean c;
public boolean d;

public Policy() {}
}
{code}

A {{Stats}} object is an instance of this class:
{code}
public class Stats {
public long count;
public float a;
public float b;
public float c;
public float d;
public float e;

public Stats() {}
}
{code}
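
To make the position arithmetic concrete: assuming POJO fields are ordered 
alphabetically before flattening (an illustration of the reported numbers, not 
actual Flink code), the flattened view of {{Data}} has 15 fields while the 
unflattened type has only 7, which is why {{getTypeAt(14)}} rejects a position 
that {{getFlatFields()}} produced:
{code}
// Illustrative sketch only: field order is assumed alphabetical,
// with nested types expanded in place.
public class FlatPositionSketch {
    public static void main(String[] args) {
        String[] flat = {
            "aaa", "abc",                                    // 0, 1
            "policy.a", "policy.b", "policy.c", "policy.d",  // 2..5
            "stats.a", "stats.b", "stats.c", "stats.d",
            "stats.e", "stats.count",                        // 6..11
            "t1", "t2", "wxyz"                               // 12, 13, 14
        };
        int directFields = 7; // aaa, abc, policy, stats, t1, t2, wxyz

        System.out.println(flat[14]);          // "wxyz": valid in the flat view
        System.out.println(14 < directFields); // false: getTypeAt(14) calls it out of range
    }
}
{code}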

  was:
Using named keys in keyBy() for nested POJO types results in failure. The 
indexes for named key fields are used inconsistently with nested POJO types. 
In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position 
after (apparently) flattening the structure, but that position is then resolved 
against the unflattened version of the POJO type by {{PojoTypeInfo.getTypeAt()}}.

In the example below, getFlatFields() returns positions 0, 1, and 14. These 
positions appear correct in the flattened structure of the Data class. However, 
in {{KeySelector<X, K> getSelectorForKeys(Keys<X> keys, TypeInformation<X> 
typeInfo, ExecutionConfig executionConfig)}}, a call to 
{{compositeType.getTypeAt(logicalKeyPositions[i])}} for the third key results in 
{{PojoTypeInfo.getTypeAt()}} declaring it out of range, because it compares the 
position against the number of directly named fields of the object rather than 
the length of the flattened version of that type.

Concrete Example:
Consider this graph:
{code}
DataStream dataStream = see.addSource(new 
FlinkKafkaConsumer08<>(timesliceConstants.topic, new DataDeserialzer(), 
kafkaConsumerProperties));

dataStream
  .flatMap(new DataMapper())
  .keyBy("aaa", "abc", "wxyz")
{code}

{{DataDeserialzer}} returns a "NativeDataFormat" object; {{DataMapper}} takes 
this NativeDataFormat object and extracts individual Data objects:
{code}
public class Data {
public int aaa;
public int abc;
public long wxyz;
public int t1;
public int t2;
public Policy policy;
public Stats stats;

public Data() {}
}
{code}

A {{Policy}} object is an instance of this class:
{code}
public class AggregatableMetricStoragePolicy implements MetricStoragePolicy {
public short a;
public short b;
public boolean c;
public boolean d;

public Policy() {}
}
{code}

A {{Stats}} object is an instance of this class:
{code}
public class Stats {
public long count;
public float a;
public float b;
public float c;
public float d;
public float e;

public Stats() {}
}
{code}


> keyBy() with nested POJO computes invalid field position indexes
> 
>
> Key: FLINK-3697
> URL: https://issues.apache.org/jira/browse/FLINK-3697
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Affects Versions: 1.0.0
> Environment: MacOS X 10.10
>Reporter: Ron Crocker
>Priority: Minor
>  Labels: pojo
>
> Using named keys in keyBy() for nested POJO types results in failure. The 
> indexes for named key fields are used inconsistently with nested POJO types. 
> In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position 
> after (apparently) flattening 

[jira] [Closed] (FLINK-3623) Adjust MurmurHash algorithm

2016-04-04 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-3623.
---

> Adjust MurmurHash algorithm
> ---
>
> Key: FLINK-3623
> URL: https://issues.apache.org/jira/browse/FLINK-3623
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Trivial
> Fix For: 1.1.0
>
>
> Flink's MurmurHash implementation differs from the published algorithm.
> From Flink's MathUtils.java:
> {code}
> code *= 0xe6546b64;
> {code}
> The Murmur3_32 algorithm as described by 
> [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]:
> {code}
> m ← 5
> n ← 0xe6546b64
> hash ← hash × m + n
> {code}
> and in Guava's Murmur3_32HashFunction.java:
> {code}
> h1 = h1 * 5 + 0xe6546b64;
> {code}
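
For context, the complete per-word mixing step of the published Murmur3_32 
looks like this (a sketch transcribed from the public algorithm description, 
not Flink's MathUtils code):
{code}
// Murmur3_32 per-word mix; the last line is the step FLINK-3623 adjusts.
static int mixMurmur3(int h1, int k1) {
    k1 *= 0xcc9e2d51;
    k1 = Integer.rotateLeft(k1, 15);
    k1 *= 0x1b873593;

    h1 ^= k1;
    h1 = Integer.rotateLeft(h1, 13);
    h1 = h1 * 5 + 0xe6546b64;
    return h1;
}
{code}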



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3623) Adjust MurmurHash algorithm

2016-04-04 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-3623.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed via 641a0d436c9b7a34ff33ceb370cf29962cac4dee


> Adjust MurmurHash algorithm
> ---
>
> Key: FLINK-3623
> URL: https://issues.apache.org/jira/browse/FLINK-3623
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Trivial
> Fix For: 1.1.0
>
>
> Flink's MurmurHash implementation differs from the published algorithm.
> From Flink's MathUtils.java:
> {code}
> code *= 0xe6546b64;
> {code}
> The Murmur3_32 algorithm as described by 
> [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]:
> {code}
> m ← 5
> n ← 0xe6546b64
> hash ← hash × m + n
> {code}
> and in Guava's Murmur3_32HashFunction.java:
> {code}
> h1 = h1 * 5 + 0xe6546b64;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1159) Case style anonymous functions not supported by Scala API

2016-04-04 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1159.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed via 5cb84f185963fa89be5d0c4e83bad66bac44d84d

Thank you for the contribution!

> Case style anonymous functions not supported by Scala API
> -
>
> Key: FLINK-1159
> URL: https://issues.apache.org/jira/browse/FLINK-1159
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Reporter: Till Rohrmann
>Assignee: Stefano Baghino
> Fix For: 1.1.0
>
>
> In Scala it is very common to define anonymous functions of the following form
> {code}
> {
> case foo: Bar => foobar(foo)
> case _ => throw new RuntimeException()
> }
> {code}
> These case style anonymous functions are not supported yet by the Scala API. 
> Thus, one has to write redundant code to name the function parameter.
> What works is the following pattern, but it is not intuitive for someone 
> coming from Scala:
> {code}
> dataset.map{
>   _ match{
> case foo:Bar => ...
>   }
> }
> {code}
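
For reference, the merged change adds extension methods that accept partial 
functions directly. A minimal usage sketch (import and method names as provided 
by the new flink-scala extensions, quoted here from memory):
{code}
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.extensions._

object CaseStyleExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // With the extensions imported, a case-style anonymous function
    // can be passed directly, with no `_ match` wrapper needed.
    env.fromElements((1, "a"), (2, "b"))
      .mapWith { case (id, name) => s"$id-$name" }
      .print()
  }
}
{code}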



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1159) Case style anonymous functions not supported by Scala API

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224884#comment-15224884
 ] 

ASF GitHub Bot commented on FLINK-1159:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1704


> Case style anonymous functions not supported by Scala API
> -
>
> Key: FLINK-1159
> URL: https://issues.apache.org/jira/browse/FLINK-1159
> Project: Flink
>  Issue Type: Bug
>  Components: Scala API
>Reporter: Till Rohrmann
>Assignee: Stefano Baghino
> Fix For: 1.1.0
>
>
> In Scala it is very common to define anonymous functions of the following form
> {code}
> {
> case foo: Bar => foobar(foo)
> case _ => throw new RuntimeException()
> }
> {code}
> These case style anonymous functions are not supported yet by the Scala API. 
> Thus, one has to write redundant code to name the function parameter.
> What works is the following pattern, but it is not intuitive for someone 
> coming from Scala:
> {code}
> dataset.map{
>   _ match{
> case foo:Bar => ...
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3623) Adjust MurmurHash algorithm

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224883#comment-15224883
 ] 

ASF GitHub Bot commented on FLINK-3623:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1806


> Adjust MurmurHash algorithm
> ---
>
> Key: FLINK-3623
> URL: https://issues.apache.org/jira/browse/FLINK-3623
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Trivial
>
> Flink's MurmurHash implementation differs from the published algorithm.
> From Flink's MathUtils.java:
> {code}
> code *= 0xe6546b64;
> {code}
> The Murmur3_32 algorithm as described by 
> [Wikipedia|https://en.wikipedia.org/wiki/MurmurHash]:
> {code}
> m ← 5
> n ← 0xe6546b64
> hash ← hash × m + n
> {code}
> and in Guava's Murmur3_32HashFunction.java:
> {code}
> h1 = h1 * 5 + 0xe6546b64;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3623] [runtime] Adjust MurmurHash Algor...

2016-04-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1806


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1159] Case style anonymous functions no...

2016-04-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1704


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Updates the AssignerWithPunctuatedWatermarks a...

2016-04-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1811


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-3595) Kafka09 consumer thread does not interrupt when stuck in record emission

2016-04-04 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3595.
--
Resolution: Fixed

Fixed in 3465580 (master), e0dc5c1 (release-1.0).

> Kafka09 consumer thread does not interrupt when stuck in record emission
> 
>
> Key: FLINK-3595
> URL: https://issues.apache.org/jira/browse/FLINK-3595
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 1.1.0, 1.0.1
>
>
> When canceling a job, the Kafka 0.9 Consumer Thread may be stuck in a 
> blocking method (output emitting) and never wakes up.
> The thread as a whole cannot be simply interrupted, because of a bug in Kafka 
> that makes the consumer freeze/hang up on interrupt.
> There are two possible solutions:
>   - allow and call interrupt when the consumer thread is emitting elements
>   - destroy the output network buffer pools eagerly on canceling. The Kafka 
> thread will then throw an exception if it is stuck in emitting elements and 
> it will terminate, which is acceptable given that the job is being canceled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3595) Kafka09 consumer thread does not interrupt when stuck in record emission

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224872#comment-15224872
 ] 

ASF GitHub Bot commented on FLINK-3595:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1780


> Kafka09 consumer thread does not interrupt when stuck in record emission
> 
>
> Key: FLINK-3595
> URL: https://issues.apache.org/jira/browse/FLINK-3595
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 1.1.0, 1.0.1
>
>
> When canceling a job, the Kafka 0.9 Consumer Thread may be stuck in a 
> blocking method (output emitting) and never wakes up.
> The thread as a whole cannot be simply interrupted, because of a bug in Kafka 
> that makes the consumer freeze/hang up on interrupt.
> There are two possible solutions:
>   - allow and call interrupt when the consumer thread is emitting elements
>   - destroy the output network buffer pools eagerly on canceling. The Kafka 
> thread will then throw an exception if it is stuck in emitting elements and 
> it will terminate, which is acceptable given that the job is being canceled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3595] [runtime] Eagerly destroy buffer ...

2016-04-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1780


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3697) keyBy() with nested POJO computes invalid field position indexes

2016-04-04 Thread Ron Crocker (JIRA)
Ron Crocker created FLINK-3697:
--

 Summary: keyBy() with nested POJO computes invalid field position 
indexes
 Key: FLINK-3697
 URL: https://issues.apache.org/jira/browse/FLINK-3697
 Project: Flink
  Issue Type: Bug
  Components: DataStream API
Affects Versions: 1.0.0
 Environment: MacOS X 10.10
Reporter: Ron Crocker
Priority: Minor


Using named keys in keyBy() for nested POJO types results in failure. The 
indexes for named key fields are used inconsistently with nested POJO types. 
In particular, {{PojoTypeInfo.getFlatFields()}} returns the field's position 
after (apparently) flattening the structure, but that position is then resolved 
against the unflattened version of the POJO type by {{PojoTypeInfo.getTypeAt()}}.

In the example below, getFlatFields() returns positions 0, 1, and 14. These 
positions appear correct in the flattened structure of the Data class. However, 
in {{KeySelector<X, K> getSelectorForKeys(Keys<X> keys, TypeInformation<X> 
typeInfo, ExecutionConfig executionConfig)}}, a call to 
{{compositeType.getTypeAt(logicalKeyPositions[i])}} for the third key results in 
{{PojoTypeInfo.getTypeAt()}} declaring it out of range, because it compares the 
position against the number of directly named fields of the object rather than 
the length of the flattened version of that type.

Concrete Example:
Consider this graph:
{code}
DataStream dataStream = see.addSource(new 
FlinkKafkaConsumer08<>(timesliceConstants.topic, new DataDeserialzer(), 
kafkaConsumerProperties));

dataStream
  .flatMap(new DataMapper())
  .keyBy("aaa", "abc", "wxyz")
{code}

{{DataDeserialzer}} returns a "NativeDataFormat" object; {{DataMapper}} takes 
this NativeDataFormat object and extracts individual Data objects:
{code}
public class Data {
public int aaa;
public int abc;
public long wxyz;
public int t1;
public int t2;
public Policy policy;
public Stats stats;

public Data() {}
}
{code}

A {{Policy}} object is an instance of this class:
{code}
public class AggregatableMetricStoragePolicy implements MetricStoragePolicy {
public short a;
public short b;
public boolean c;
public boolean d;

public Policy() {}
}
{code}

A {{Stats}} object is an instance of this class:
{code}
public class Stats {
public long count;
public float a;
public float b;
public float c;
public float d;
public float e;

public Stats() {}
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224812#comment-15224812
 ] 

ASF GitHub Bot commented on FLINK-2609:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205446161
  
Let's find a way then :-)


> Automatic type registration is only called from the batch execution 
> environment
> ---
>
> Key: FLINK-2609
> URL: https://issues.apache.org/jira/browse/FLINK-2609
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>
> Kryo types in the streaming API are quite expensive to serialize because they 
> are not automatically registered with Kryo.
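
Until automatic registration reaches the streaming environment, registering 
types by hand avoids the cost. A minimal sketch against the existing config 
API, where MyEvent stands in for any user type that falls back to Kryo:
{code}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RegisterTypesSketch {
    // Placeholder for a user type that Flink serializes via Kryo.
    public static class MyEvent {
        public long id;
        public String payload;
    }

    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Registered types are written with a small Kryo tag instead of
        // the fully qualified class name on every record.
        env.getConfig().registerKryoType(MyEvent.class);
    }
}
{code}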



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types

2016-04-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205446161
  
Let's find a way then :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (FLINK-3679) DeserializationSchema should handle zero or more outputs for every input

2016-04-04 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224692#comment-15224692
 ] 

Jamie Grier edited comment on FLINK-3679 at 4/4/16 6:12 PM:


I'm not sure about the locking and operator chaining issues, so if this change 
makes those unduly complicated, maybe it's not worth it.  
However, a DeserializationSchema with more flatMap()-like semantics would 
certainly be the better API, given that bad-data issues are a reality.  It also 
seems we could provide this without breaking existing code, but it certainly 
would add a bit more complexity to the API (having multiple variants for this).

Anyway, I agree you can work around this issue by making a special "sentinel" 
value and dealing with all of this in a chained flatMap() operator.  I 
imagine that's exactly the approach that people are already using.




was (Author: jgrier):
I'm not sure about the locking and operator chaining issues so I would say if 
that's unduly complicated because of this change maybe it's not worth it.  
However, a DeserializationSchema with more flatMap() like semantics would 
certainly the better API given that bad data issues are a reality.  It also 
seems we could provide this without breaking existing code, but certainly it 
would add a bit more complexity to the API (having multiple variants for this).

Anyway, I agree you can work around this issue my making a special "sentinel" 
value and dealing with all of this is in a chained flatMap() operator.  I 
imagine that's exactly the approach that people are already using.



> DeserializationSchema should handle zero or more outputs for every input
> 
>
> Key: FLINK-3679
> URL: https://issues.apache.org/jira/browse/FLINK-3679
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Reporter: Jamie Grier
>
> There are a couple of issues with the DeserializationSchema API that I think 
> should be improved.  This request has come to me via an existing Flink user.
> The main issue is simply that the API assumes that there is a one-to-one 
> mapping between inputs and outputs.  In reality there are scenarios where one 
> input message (say from Kafka) might actually map to zero or more logical 
> elements in the pipeline.
> Particularly important here is the case where you receive a message from a 
> source (such as Kafka) and say the raw bytes don't deserialize properly.  
> Right now the only recourse is to throw IOException and therefore fail the 
> job.  
> This is definitely not good since bad data is a reality and failing the job 
> is not the right option.  If the job fails we'll just end up replaying the 
> bad data and the whole thing will start again.
> Instead in this case it would be best if the user could just return the empty 
> set.
> The other case is where one input message should logically be multiple output 
> messages.  This case is probably less important since there are other ways to 
> do this but in general it might be good to make the 
> DeserializationSchema.deserialize() method return a collection rather than a 
> single element.
> Maybe we need to support a DeserializationSchema variant that has semantics 
> more like that of FlatMap.
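
A variant with flatMap-like semantics could look roughly like this (a 
hypothetical interface sketched from the discussion above, not an actual Flink 
API):
{code}
import java.io.IOException;
import java.io.Serializable;
import java.util.Collection;

// Hypothetical: deserialize() may return zero elements for bad records
// instead of throwing and failing the job, or several for batched input.
public interface FlatDeserializationSchema<T> extends Serializable {
    Collection<T> deserialize(byte[] message) throws IOException;
}
{code}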



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3679) DeserializationSchema should handle zero or more outputs for every input

2016-04-04 Thread Jamie Grier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224692#comment-15224692
 ] 

Jamie Grier commented on FLINK-3679:


I'm not sure about the locking and operator chaining issues, so if this change 
makes those unduly complicated, maybe it's not worth it.  
However, a DeserializationSchema with more flatMap()-like semantics would 
certainly be the better API, given that bad-data issues are a reality.  It also 
seems we could provide this without breaking existing code, but it certainly 
would add a bit more complexity to the API (having multiple variants for this).

Anyway, I agree you can work around this issue by making a special "sentinel" 
value and dealing with all of this in a chained flatMap() operator.  I 
imagine that's exactly the approach that people are already using.



> DeserializationSchema should handle zero or more outputs for every input
> 
>
> Key: FLINK-3679
> URL: https://issues.apache.org/jira/browse/FLINK-3679
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Reporter: Jamie Grier
>
> There are a couple of issues with the DeserializationSchema API that I think 
> should be improved.  This request has come to me via an existing Flink user.
> The main issue is simply that the API assumes that there is a one-to-one 
> mapping between inputs and outputs.  In reality there are scenarios where one 
> input message (say from Kafka) might actually map to zero or more logical 
> elements in the pipeline.
> Particularly important here is the case where you receive a message from a 
> source (such as Kafka) and say the raw bytes don't deserialize properly.  
> Right now the only recourse is to throw IOException and therefore fail the 
> job.  
> This is definitely not good since bad data is a reality and failing the job 
> is not the right option.  If the job fails we'll just end up replaying the 
> bad data and the whole thing will start again.
> Instead in this case it would be best if the user could just return the empty 
> set.
> The other case is where one input message should logically be multiple output 
> messages.  This case is probably less important since there are other ways to 
> do this but in general it might be good to make the 
> DeserializationSchema.deserialize() method return a collection rather than a 
> single element.
> Maybe we need to support a DeserializationSchema variant that has semantics 
> more like that of FlatMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3343) Exception while using Kafka 0.9 connector

2016-04-04 Thread Farouk Salem (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224633#comment-15224633
 ] 

Farouk Salem commented on FLINK-3343:
-

I use the default values.

> Exception while using Kafka 0.9 connector 
> --
>
> Key: FLINK-3343
> URL: https://issues.apache.org/jira/browse/FLINK-3343
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib, Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Farouk Salem
>
> While running a job without fault tolerance, producing data to Kafka, the 
> job failed due to a "Batch Expired" exception. I tried to increase 
> "request.timeout.ms" and "max.block.ms" to 60000 instead of 30000, but still hit 
> the same problem. The only way to get around this problem is to enable snapshotting.
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48106 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48105 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48104 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,068 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask
>- Caught exception while processing timer.
> java.lang.RuntimeException: Could not forward element to next operator
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29)
> at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AggregatingKeyedTimePanes.evaluateWindow(AggregatingKeyedTimePanes.java:59)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.computeWindow(AbstractAlignedProcessingTimeWindowOperator.java:242)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.trigger(AbstractAlignedProcessingTimeWindowOperator.java:223)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:606)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Could not forward element to next 
> operator
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29)
> at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:37)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:316)
> ... 15 more
> Caused by: java.lang.Exception: Failed to send data to Kafka: Batch Expired
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.checkErroneous(FlinkKafkaProducerBase.java:282)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.invoke(FlinkKafkaProducerBase.java:249)
> at 
> 

[jira] [Commented] (FLINK-3343) Exception while using Kafka 0.9 connector

2016-04-04 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224620#comment-15224620
 ] 

Robert Metzger commented on FLINK-3343:
---

How much heap size do you allocate per task manager?

> Exception while using Kafka 0.9 connector 
> --
>
> Key: FLINK-3343
> URL: https://issues.apache.org/jira/browse/FLINK-3343
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib, Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Farouk Salem
>
> While running a job without fault tolerance, producing data to Kafka, the 
> job failed due to a "Batch Expired" exception. I tried to increase 
> "request.timeout.ms" and "max.block.ms" to 60000 instead of 30000, but still hit 
> the same problem. The only way to get around this problem is to enable snapshotting.
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48106 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48105 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48104 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,068 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask
>- Caught exception while processing timer.
> java.lang.RuntimeException: Could not forward element to next operator
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29)
> at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AggregatingKeyedTimePanes.evaluateWindow(AggregatingKeyedTimePanes.java:59)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.computeWindow(AbstractAlignedProcessingTimeWindowOperator.java:242)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.trigger(AbstractAlignedProcessingTimeWindowOperator.java:223)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:606)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Could not forward element to next 
> operator
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29)
> at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:37)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:316)
> ... 15 more
> Caused by: java.lang.Exception: Failed to send data to Kafka: Batch Expired
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.checkErroneous(FlinkKafkaProducerBase.java:282)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.invoke(FlinkKafkaProducerBase.java:249)
> at 
> 

[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224605#comment-15224605
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-205412831
  
Do you prefer the look of hand-crafted graphs or should we consider 
something that will draw graphs in the browser like 
[JSNetworkX](http://jsnetworkx.org/index.html)? This is BSD licensed. They have 
yet to fully implement NetworkX but these graph generators map to generators in 
NetworkX.


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs, as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3343) Exception while using Kafka 0.9 connector

2016-04-04 Thread Farouk Salem (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224606#comment-15224606
 ] 

Farouk Salem commented on FLINK-3343:
-

Yeah, it works most of the time, but sometimes it throws a "GC overhead limit 
exceeded" exception.

> Exception while using Kafka 0.9 connector 
> --
>
> Key: FLINK-3343
> URL: https://issues.apache.org/jira/browse/FLINK-3343
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib, Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Farouk Salem
>
> While running a job without fault tolerance, producing data to Kafka, the 
> job failed due to a "Batch Expired" exception. I tried to increase 
> "request.timeout.ms" and "max.block.ms" to 60000 instead of 30000, but still hit 
> the same problem. The only way to get around this problem is to enable snapshotting.
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48106 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48105 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48104 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,068 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask
>- Caught exception while processing timer.
> java.lang.RuntimeException: Could not forward element to next operator
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29)
> at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AggregatingKeyedTimePanes.evaluateWindow(AggregatingKeyedTimePanes.java:59)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.computeWindow(AbstractAlignedProcessingTimeWindowOperator.java:242)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.trigger(AbstractAlignedProcessingTimeWindowOperator.java:223)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:606)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Could not forward element to next 
> operator
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29)
> at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:37)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:316)
> ... 15 more
> Caused by: java.lang.Exception: Failed to send data to Kafka: Batch Expired
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.checkErroneous(FlinkKafkaProducerBase.java:282)
> at 
> 

[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-04 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-205412831
  
Do you prefer the look of hand-crafted graphs or should we consider 
something that will draw graphs in the browser like 
[JSNetworkX](http://jsnetworkx.org/index.html)? This is BSD licensed. They have 
yet to fully implement NetworkX but these graph generators map to generators in 
NetworkX.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3343) Exception while using Kafka 0.9 connector

2016-04-04 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224598#comment-15224598
 ] 

Robert Metzger commented on FLINK-3343:
---

Have you tried setting a batch size of 0?
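
For anyone trying the same, the suggestion amounts to producer settings along 
these lines (standard Kafka producer config keys; the values are only 
illustrative):
{code}
import java.util.Properties;

public class ProducerConfigSketch {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        // A batch size of 0 sends records unbatched, so nothing can expire
        // while sitting in an accumulator batch.
        props.setProperty("batch.size", "0");
        props.setProperty("request.timeout.ms", "60000");
        props.setProperty("max.block.ms", "60000");
        return props;
    }
}
{code}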

> Exception while using Kafka 0.9 connector 
> --
>
> Key: FLINK-3343
> URL: https://issues.apache.org/jira/browse/FLINK-3343
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib, Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Farouk Salem
>
> While running a job without fault tolerance, producing data to Kafka, the 
> job failed due to a "Batch Expired" exception. I tried to increase 
> "request.timeout.ms" and "max.block.ms" to 60000 instead of 30000, but still hit 
> the same problem. The only way to get around this problem is to enable snapshotting.
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48106 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48105 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,036 WARN  org.apache.kafka.clients.producer.internals.Sender 
>- Got error produce response with correlation id 48104 on topic-partition 
> flinkWordCountNoFaultToleranceSmall
> -2, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION
> 09:58:11,068 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask
>- Caught exception while processing timer.
> java.lang.RuntimeException: Could not forward element to next operator
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29)
> at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AggregatingKeyedTimePanes.evaluateWindow(AggregatingKeyedTimePanes.java:59)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.computeWindow(AbstractAlignedProcessingTimeWindowOperator.java:242)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.AbstractAlignedProcessingTimeWindowOperator.trigger(AbstractAlignedProcessingTimeWindowOperator.java:223)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$TriggerTask.run(StreamTask.java:606)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Could not forward element to next 
> operator
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:319)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:300)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:48)
> at 
> org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:29)
> at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:37)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:316)
> ... 15 more
> Caused by: java.lang.Exception: Failed to send data to Kafka: Batch Expired
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.checkErroneous(FlinkKafkaProducerBase.java:282)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.invoke(FlinkKafkaProducerBase.java:249)
> at 
> 

[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224566#comment-15224566
 ] 

ASF GitHub Bot commented on FLINK-3654:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205406051
  
done :smile: 


> Disable Write-Ahead-Log in RocksDB State
> 
>
> Key: FLINK-3654
> URL: https://issues.apache.org/jira/browse/FLINK-3654
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> We do our own checkpointing of the RocksDB database so the WAL is useless to 
> us. Disabling writes to the WAL should give us a very large performance boost.
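
For reference, the RocksDB side of this change is a single flag on the write
options. A minimal standalone sketch (assuming a recent `rocksdbjni` where the
handles are AutoCloseable; the class name and database path are invented):

{code}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteOptions;

public class NoWalWrite {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/no-wal-db");
             WriteOptions writeOptions = new WriteOptions().setDisableWAL(true)) {
            // With the WAL disabled, this write skips the redo log entirely;
            // durability must then come from external snapshots, which is
            // exactly what the backend's own checkpoints provide.
            db.put(writeOptions, "key".getBytes(), "value".getBytes());
        }
    }
}
{code}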



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3654] Disable Write-Ahead-Log in RocksD...

2016-04-04 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205406051
  
done :smile: 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224550#comment-15224550
 ] 

ASF GitHub Bot commented on FLINK-3654:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205402626
  
Would you mind triggering a few builds overnight?


> Disable Write-Ahead-Log in RocksDB State
> 
>
> Key: FLINK-3654
> URL: https://issues.apache.org/jira/browse/FLINK-3654
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> We do our own checkpointing of the RocksDB database so the WAL is useless to 
> us. Disabling writes to the WAL should give us a very large performance boost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3654] Disable Write-Ahead-Log in RocksD...

2016-04-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205402626
  
Would you mind triggering a few builds overnight?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224535#comment-15224535
 ] 

ASF GitHub Bot commented on FLINK-3654:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205398515
  
I'm seeing this failure on Travis: 
https://travis-ci.org/aljoscha/flink/jobs/120638971

Could be a bug introduced by the change. I'm investigating...


> Disable Write-Ahead-Log in RocksDB State
> 
>
> Key: FLINK-3654
> URL: https://issues.apache.org/jira/browse/FLINK-3654
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> We do our own checkpointing of the RocksDB database so the WAL is useless to 
> us. Disabling writes to the WAL should give us a very large performance boost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3654] Disable Write-Ahead-Log in RocksD...

2016-04-04 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205398515
  
I'm seeing this failure on Travis: 
https://travis-ci.org/aljoscha/flink/jobs/120638971

Could be a bug introduced by the change. I'm investigating...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3174) Add merging WindowAssigner

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224521#comment-15224521
 ] 

ASF GitHub Bot commented on FLINK-3174:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1802#issuecomment-205394604
  
No, we introduced the `InternalWindowFunction` before 1.0.0 to decouple the 
user `WindowFunction` from the window operator implementation. There are now 
special `InternalWindowFunctions` that don't forward the key to the user 
function. The API for the user does not change: an `AllWindowFunction` is 
used there, as before, and that function does not get a key.
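
To illustrate the decoupling described above, here is a conceptual sketch of
an internal function that receives a key but never forwards it to the user's
all-window function. The interfaces below are simplified stand-ins invented
for this example, not Flink's actual `InternalWindowFunction` classes.

{code}
import org.apache.flink.util.Collector;

// Simplified stand-in for the user-facing all-window function.
interface AllWindowFn<IN, OUT, W> {
    void apply(W window, Iterable<IN> values, Collector<OUT> out) throws Exception;
}

// Internal wrapper: the operator hands in the (dummy) key, but it is
// swallowed here, so the user's function never sees it.
class KeyDroppingInternalFn<K, IN, OUT, W> {
    private final AllWindowFn<IN, OUT, W> userFn;

    KeyDroppingInternalFn(AllWindowFn<IN, OUT, W> userFn) {
        this.userFn = userFn;
    }

    void process(K key, W window, Iterable<IN> values, Collector<OUT> out) throws Exception {
        userFn.apply(window, values, out);
    }
}
{code}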


> Add merging WindowAssigner
> --
>
> Key: FLINK-3174
> URL: https://issues.apache.org/jira/browse/FLINK-3174
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 1.0.0
>
>
> We should add the possibility for WindowAssigners to merge windows. This will 
> enable Session windowing support, similar to how Google Cloud Dataflow 
> supports.
> For session windows, each element would initially be assigned to its own 
> window. When triggering we check the windows and see if any can be merged. 
> This way, elements with overlapping session windows merge into one session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3174] Add MergingWindowAssigner and Ses...

2016-04-04 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1802#issuecomment-205394604
  
No, we introduced the `InternalWindowFunction` before 1.0.0 to decouple the 
user `WindowFunction` from the window operator implementation. There are now 
special `InternalWindowFunctions` that don't forward the key to the user 
function. The API for the user does not change: an `AllWindowFunction` is 
used there, as before, and that function does not get a key.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224518#comment-15224518
 ] 

ASF GitHub Bot commented on FLINK-2609:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205393404
  
I'm not sure if we can register the KeySelector types that way.


> Automatic type registration is only called from the batch execution 
> environment
> ---
>
> Key: FLINK-2609
> URL: https://issues.apache.org/jira/browse/FLINK-2609
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>
> Kryo types in the streaming API are quite expensive to serialize because they 
> are not automatically registered with Kryo.
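
Until registration happens automatically for streaming programs, types can be
registered by hand; a minimal sketch (`MyEvent` is a hypothetical user class,
`registerKryoType` is the existing `ExecutionConfig` method):

{code}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RegisterTypesExample {

    // Hypothetical user class standing in for any Kryo-serialized type.
    public static class MyEvent {
        public long timestamp;
        public String payload;
    }

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Registered classes are written as a compact integer tag instead of
        // their full class name, which is where the serialization speedup
        // comes from.
        env.getConfig().registerKryoType(MyEvent.class);
    }
}
{code}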



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types

2016-04-04 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205393404
  
I'm not sure if we can register the KeySelector types that way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request:

2016-04-04 Thread aljoscha
Github user aljoscha commented on the pull request:


https://github.com/apache/flink/commit/76968c6360c17d5deb4e42727c16bc1b9a891b26#commitcomment-16956214
  
Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224507#comment-15224507
 ] 

ASF GitHub Bot commented on FLINK-3654:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205391579
  
Can do, yes. But we shouldn't block, as you said.


> Disable Write-Ahead-Log in RocksDB State
> 
>
> Key: FLINK-3654
> URL: https://issues.apache.org/jira/browse/FLINK-3654
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> We do our own checkpointing of the RocksDB database so the WAL is useless to 
> us. Disabling writes to the WAL should give us a very large performance boost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3654] Disable Write-Ahead-Log in RocksD...

2016-04-04 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205391579
  
Can do, yes. But we shouldn't block, as you said.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3174) Add merging WindowAssigner

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224479#comment-15224479
 ] 

ASF GitHub Bot commented on FLINK-3174:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1802#issuecomment-205384005
  
Concerning the removal of the non-keyed window operator: Does that mean 
that the user functions now see the dummy key?


> Add merging WindowAssigner
> --
>
> Key: FLINK-3174
> URL: https://issues.apache.org/jira/browse/FLINK-3174
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 1.0.0
>
>
> We should add the possibility for WindowAssigners to merge windows. This will 
> enable Session windowing support, similar to how Google Cloud Dataflow 
> supports.
> For session windows, each element would initially be assigned to its own 
> window. When triggering we check the windows and see if any can be merged. 
> This way, elements with overlapping session windows merge into one session.
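
For intuition on the merge step sketched in the description, a self-contained
toy example of merging overlapping session windows (the `TimeWindow` class
below is a local stand-in, not Flink's):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SessionMergeSketch {

    static final class TimeWindow {
        final long start;
        final long end;
        TimeWindow(long start, long end) { this.start = start; this.end = end; }
    }

    // Each element starts in its own [start, start + gap) window; sorting by
    // start time and folding overlapping windows together yields the sessions.
    static List<TimeWindow> merge(List<TimeWindow> windows) {
        List<TimeWindow> sorted = new ArrayList<>(windows);
        Collections.sort(sorted, new Comparator<TimeWindow>() {
            @Override
            public int compare(TimeWindow a, TimeWindow b) {
                return Long.compare(a.start, b.start);
            }
        });
        List<TimeWindow> merged = new ArrayList<>();
        for (TimeWindow w : sorted) {
            if (!merged.isEmpty() && w.start <= merged.get(merged.size() - 1).end) {
                // Overlapping sessions collapse into one wider window.
                TimeWindow last = merged.remove(merged.size() - 1);
                merged.add(new TimeWindow(last.start, Math.max(last.end, w.end)));
            } else {
                merged.add(w);
            }
        }
        return merged;
    }
}
{code}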



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3174] Add MergingWindowAssigner and Ses...

2016-04-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1802#issuecomment-205384005
  
Concerning the removal of the non-keyed window operator: Does that mean 
that the user functions now see the dummy key?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224468#comment-15224468
 ] 

ASF GitHub Bot commented on FLINK-2609:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205380553
  
I think a static variable is even more problematic.
Another possibility is to traverse the graph at the very end, and register at 
that point what is encountered.


> Automatic type registration is only called from the batch execution 
> environment
> ---
>
> Key: FLINK-2609
> URL: https://issues.apache.org/jira/browse/FLINK-2609
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>
> Kryo types in the streaming API are quite expensive to serialize because they 
> are not automatically registered with Kryo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types

2016-04-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205380553
  
I think a static variable is even more problematic.
Another possibility is to traverse the graph at the very end, and register at 
that point what is encountered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request:

2016-04-04 Thread uce
Github user uce commented on the pull request:


https://github.com/apache/flink/commit/76968c6360c17d5deb4e42727c16bc1b9a891b26#commitcomment-16955170
  
Can you please cherrypick this to `release-1.0` as well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3589) Allow setting Operator parallelism to default value

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224388#comment-15224388
 ] 

ASF GitHub Bot commented on FLINK-3589:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1778#issuecomment-205368211
  
Perhaps there should be two flags, PARALLELISM_DEFAULT = -1 to reset the 
parallelism to the system default, and PARALLELISM_UNKNOWN = -2 which leaves 
the parallelism unchanged.
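
Read concretely, one hedged sketch of the proposed semantics (everything
beyond the two constants, including the surrounding class, is invented for
illustration):

{code}
public class ParallelismFlagsSketch {

    public static final int PARALLELISM_DEFAULT = -1;
    public static final int PARALLELISM_UNKNOWN = -2;

    private int parallelism = PARALLELISM_DEFAULT;

    public void setParallelism(int parallelism) {
        if (parallelism == PARALLELISM_UNKNOWN) {
            // Leave whatever is currently configured untouched.
            return;
        }
        if (parallelism != PARALLELISM_DEFAULT && parallelism < 1) {
            throw new IllegalArgumentException(
                "Parallelism must be at least 1, PARALLELISM_DEFAULT or PARALLELISM_UNKNOWN.");
        }
        // PARALLELISM_DEFAULT resets to the system default.
        this.parallelism = parallelism;
    }
}
{code}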


> Allow setting Operator parallelism to default value
> ---
>
> Key: FLINK-3589
> URL: https://issues.apache.org/jira/browse/FLINK-3589
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> Users can override the parallelism for a single operator by calling 
> {{Operator.setParallelism}}, which accepts a positive value. {{Operator}} 
> uses {{-1}} to indicate default parallelism. It would be nice to name and 
> accept this default value.
> This would enable user algorithms to allow configurable parallelism while 
> still chaining operator methods.
> For example, currently:
> {code}
>   private int parallelism;
>   ...
>   public void setParallelism(int parallelism) {
>   this.parallelism = parallelism;
>   }
>   ...
>   MapOperator, Edge> newEdges = 
> edges
>   .map(new MyMapFunction())
>   .name("My map function");
>   if (parallelism > 0) {
>   newEdges.setParallelism(parallelism);
>   }
> {code}
> Could be simplified to:
> {code}
>   private int parallelism = Operator.DEFAULT_PARALLELISM;
>   ...
>   public void setParallelism(int parallelism) {
>   this.parallelism = parallelism;
>   }
>   ...
>   DataSet> newEdges = edges
>   .map(new MyMapFunction())
>   .setParallelism(parallelism)
>   .name("My map function");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3589] Allow setting Operator parallelis...

2016-04-04 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1778#issuecomment-205368211
  
Perhaps there should be two flags, PARALLELISM_DEFAULT = -1 to reset the 
parallelism to the system default, and PARALLELISM_UNKNOWN = -2 which leaves 
the parallelism unchanged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3696) Some Union tests fail for TableConfigMode.EFFICIENT

2016-04-04 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-3696:


 Summary: Some Union tests fail for TableConfigMode.EFFICIENT
 Key: FLINK-3696
 URL: https://issues.apache.org/jira/browse/FLINK-3696
 Project: Flink
  Issue Type: Bug
  Components: Table API
Affects Versions: 1.1.0
Reporter: Vasia Kalavri


e.g. testUnionWithFilter gives the following exception:
{code}
org.apache.flink.api.common.InvalidProgramException: Cannot union inputs of different types. Input1=scala.Tuple3(_1: Integer, _2: Long, _3: String), input2=Java Tuple3
	at org.apache.flink.api.java.operators.UnionOperator.<init>(UnionOperator.java:47)
	at org.apache.flink.api.java.DataSet.union(DataSet.java:1208)
	at org.apache.flink.api.table.plan.nodes.dataset.DataSetUnion.translateToPlan(DataSetUnion.scala:81)
	at org.apache.flink.api.table.plan.nodes.dataset.DataSetCalc.translateToPlan(DataSetCalc.scala:95)
	at org.apache.flink.api.java.table.JavaBatchTranslator.translate(JavaBatchTranslator.scala:91)
	at org.apache.flink.api.scala.table.ScalaBatchTranslator.translate(ScalaBatchTranslator.scala:51)
	at org.apache.flink.api.scala.table.TableConversions.toDataSet(TableConversions.scala:43)
	at org.apache.flink.api.scala.table.test.UnionITCase.testUnionWithFilter(UnionITCase.scala:77)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runners.Suite.runChild(Suite.java:127)
	at org.junit.runners.Suite.runChild(Suite.java:26)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runners.Suite.runChild(Suite.java:127)
	at org.junit.runners.Suite.runChild(Suite.java:26)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3619) SavepointCoordinator test failure

2016-04-04 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3619.
--
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed in 138dc8f.

> SavepointCoordinator test failure
> -
>
> Key: FLINK-3619
> URL: https://issues.apache.org/jira/browse/FLINK-3619
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Gyula Fora
>Assignee: Ufuk Celebi
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> I encountered the following test failure in a recent build on Travis:
> SavepointCoordinatorTest.testAbortOnCheckpointTimeout:517 expected:<0> but 
> was:<1>
> Full log: 
> https://api.travis-ci.org/jobs/116334023/log.txt?deansi=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3576) Upgrade Snappy Java to 1.1.2.1

2016-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved FLINK-3576.
---
Resolution: Not A Problem

> Upgrade Snappy Java to 1.1.2.1
> --
>
> Key: FLINK-3576
> URL: https://issues.apache.org/jira/browse/FLINK-3576
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> There was a JVM memory leak problem reported in 
> https://github.com/xerial/snappy-java/issues/131
> The above issue has been resolved.
> 1.1.2.1 was released on Jan 22nd.
> We should upgrade to this release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224362#comment-15224362
 ] 

ASF GitHub Bot commented on FLINK-3654:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205359909
  
Should we add this for `1.0.2` (or `1.0.1` if the RC gets cancelled)?


> Disable Write-Ahead-Log in RocksDB State
> 
>
> Key: FLINK-3654
> URL: https://issues.apache.org/jira/browse/FLINK-3654
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> We do our own checkpointing of the RocksDB database so the WAL is useless to 
> us. Disabling writes to the WAL should give us a very large performance boost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3654] Disable Write-Ahead-Log in RocksD...

2016-04-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1850#issuecomment-205359909
  
Should we add this for `1.0.2` (or `1.0.1` if the RC gets cancelled)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (FLINK-3655) Allow comma-separated or multiple directories to be specified for FileInputFormat

2016-04-04 Thread Tian, Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224354#comment-15224354
 ] 

Tian, Li edited comment on FLINK-3655 at 4/4/16 3:44 PM:
-

I think we may need to use "List<Path> filePaths" instead of "Path filePath" in 
FileInputFormat.
In this way, we should also
1. modify current implementations to support multiple input paths
2. add functions like setFilePaths, getFilePaths to FileInputFormat, and 
support comma-separated Path strings in ExecutionEnvironment
3. for backward compatibility, let FileInputFormat.setFilePath set the 
inputPaths to a one-element list 


was (Author: tianli):
I think we may need to use List instead of a single Path in 
FileInputFormat.
In this way, we should also
1. modify current implementations to support multiple input paths
2. add functions like setFilePaths, getFilePaths to FileInputFormat, and 
support comma-seperated Path string in ExecutionEnvironment
3. for backward compatibility, let FileInputFormat.setFilePath set the 
inputPaths to a one-element list 

> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat
> -
>
> Key: FLINK-3655
> URL: https://issues.apache.org/jira/browse/FLINK-3655
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Gna Phetsarath
>Priority: Minor
>  Labels: starter
>
> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat so that a DataSource will process the directories 
> sequentially.
>
> env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*")
> in Scala
>env.readFile(paths: Seq[String])
> or 
>   env.readFile(path: String, otherPaths: String*)
> Wildcard support would be a bonus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3655) Allow comma-separated or multiple directories to be specified for FileInputFormat

2016-04-04 Thread Tian, Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224354#comment-15224354
 ] 

Tian, Li commented on FLINK-3655:
-

I think we may need to use List<Path> instead of a single Path in 
FileInputFormat.
In this way, we should also
1. modify current implementations to support multiple input paths
2. add functions like setFilePaths, getFilePaths to FileInputFormat, and 
support comma-separated Path strings in ExecutionEnvironment
3. for backward compatibility, let FileInputFormat.setFilePath set the 
inputPaths to a one-element list 
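
To make the proposed shape concrete, a hedged sketch (the method names follow
the comment; the `Path` stand-in and the class skeleton are invented):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MultiPathInputFormatSketch {

    // Local stand-in for org.apache.flink.core.fs.Path.
    static final class Path {
        final String uri;
        Path(String uri) { this.uri = uri; }
    }

    private List<Path> inputPaths = new ArrayList<>();

    public void setFilePaths(List<Path> paths) {
        this.inputPaths = new ArrayList<>(paths);
    }

    // Backward compatibility: a single path becomes a one-element list.
    public void setFilePath(Path path) {
        this.inputPaths = new ArrayList<>(Collections.singletonList(path));
    }

    public List<Path> getFilePaths() {
        return Collections.unmodifiableList(inputPaths);
    }
}
{code}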

> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat
> -
>
> Key: FLINK-3655
> URL: https://issues.apache.org/jira/browse/FLINK-3655
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Gna Phetsarath
>Priority: Minor
>  Labels: starter
>
> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat so that a DataSource will process the directories 
> sequentially.
>
> env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*")
> in Scala
>env.readFile(paths: Seq[String])
> or 
>   env.readFile(path: String, otherPaths: String*)
> Wildcard support would be a bonus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3589) Allow setting Operator parallelism to default value

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224328#comment-15224328
 ] 

ASF GitHub Bot commented on FLINK-3589:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1778#discussion_r58394697
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java ---
@@ -64,10 +64,15 @@
 
/**
 * The constant to use for the parallelism, if the system should use 
the number
-*  of currently available slots.
+* of currently available slots.
 */
public static final int PARALLELISM_AUTO_MAX = Integer.MAX_VALUE;
 
+   /**
+* The flag value indicating an unknown or unset parallelism.
+*/
+   public static final int PARALLELISM_UNKNOWN = -1;
--- End diff --

The parallelism may be overridden elsewhere and we are not resetting it 
back to the default. This flag simply indicates a value for which the 
parallelism will not be overridden.


> Allow setting Operator parallelism to default value
> ---
>
> Key: FLINK-3589
> URL: https://issues.apache.org/jira/browse/FLINK-3589
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
>
> Users can override the parallelism for a single operator by calling 
> {{Operator.setParallelism}}, which accepts a positive value. {{Operator}} 
> uses {{-1}} to indicate default parallelism. It would be nice to name and 
> accept this default value.
> This would enable user algorithms to allow configurable parallelism while 
> still chaining operator methods.
> For example, currently:
> {code}
>   private int parallelism;
>   ...
>   public void setParallelism(int parallelism) {
>   this.parallelism = parallelism;
>   }
>   ...
>   MapOperator, Edge> newEdges = 
> edges
>   .map(new MyMapFunction())
>   .name("My map function");
>   if (parallelism > 0) {
>   newEdges.setParallelism(parallelism);
>   }
> {code}
> Could be simplified to:
> {code}
>   private int parallelism = Operator.DEFAULT_PARALLELISM;
>   ...
>   public void setParallelism(int parallelism) {
>   this.parallelism = parallelism;
>   }
>   ...
>   DataSet> newEdges = edges
>   .map(new MyMapFunction())
>   .setParallelism(parallelism)
>   .name("My map function");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3589] Allow setting Operator parallelis...

2016-04-04 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1778#discussion_r58394697
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java ---
@@ -64,10 +64,15 @@
 
/**
 * The constant to use for the parallelism, if the system should use 
the number
-*  of currently available slots.
+* of currently available slots.
 */
public static final int PARALLELISM_AUTO_MAX = Integer.MAX_VALUE;
 
+   /**
+* The flag value indicating an unknown or unset parallelism.
+*/
+   public static final int PARALLELISM_UNKNOWN = -1;
--- End diff --

The parallelism may be overridden elsewhere and we are not resetting it 
back to the default. This flag simply indicates a value for which the 
parallelism will not be overridden.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224320#comment-15224320
 ] 

ASF GitHub Bot commented on FLINK-2609:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205346744
  
I decided against the ExecutionEnvironment because I do not want to add a 
method (to access the deduplicator) to a user-facing class when a user would 
never use it.


> Automatic type registration is only called from the batch execution 
> environment
> ---
>
> Key: FLINK-2609
> URL: https://issues.apache.org/jira/browse/FLINK-2609
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>
> Kryo types in the streaming API are quite expensive to serialize because they 
> are not automatically registered with Kryo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types

2016-04-04 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205346744
  
I decided against the ExecutionEnvironment because I do not want to add a 
method (to access the deduplicator) to a user-facing class when a user would 
never use it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3576) Upgrade Snappy Java to 1.1.2.1

2016-04-04 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224304#comment-15224304
 ] 

Stephan Ewen commented on FLINK-3576:
-

Do we actually have an explicit snappy dependency somewhere?
I can find only transitive dependencies at first glance...

> Upgrade Snappy Java to 1.1.2.1
> --
>
> Key: FLINK-3576
> URL: https://issues.apache.org/jira/browse/FLINK-3576
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> There was a JVM memory leak problem reported in 
> https://github.com/xerial/snappy-java/issues/131
> The above issue has been resolved.
> 1.1.2.1 was released on Jan 22nd.
> We should upgrade to this release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3695) ValueArray types

2016-04-04 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3695:
-

 Summary: ValueArray types
 Key: FLINK-3695
 URL: https://issues.apache.org/jira/browse/FLINK-3695
 Project: Flink
  Issue Type: New Feature
  Components: Core
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor


Flink provides mutable {{Value}} type implementations of Java primitives along 
with efficient serializers and comparators. It would be useful to have 
corresponding {{ValueArray}} implementations backed by primitive rather than 
object arrays, along with an {{ArrayableValue}} interface tying a {{Value}} to 
its {{ValueArray}}.
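
For a feel of the memory argument, a minimal sketch of a growable array backed
by a primitive {{long[]}} (all names are invented; the actual {{ValueArray}}
interface would look different):

{code}
import java.util.Arrays;

public class LongArraySketch {

    private long[] data = new long[16];
    private int size;

    public void add(long value) {
        if (size == data.length) {
            // Grow geometrically; the storage stays a primitive long[]
            // throughout, avoiding one boxed object per element.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    public long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }
}
{code}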



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Updates the AssignerWithPunctuatedWatermarks a...

2016-04-04 Thread kl0u
Github user kl0u commented on the pull request:

https://github.com/apache/flink/pull/1811#issuecomment-205341434
  
No problem @StephanEwen ! 
Thanks for merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3654) Disable Write-Ahead-Log in RocksDB State

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224295#comment-15224295
 ] 

ASF GitHub Bot commented on FLINK-3654:
---

GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/1850

[FLINK-3654] Disable Write-Ahead-Log in RocksDB State



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink rocksdb-no-wal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1850


commit 0580de64ac36bd73f505084ffac44e512f127da2
Author: Aljoscha Krettek 
Date:   2016-03-22T17:08:15Z

[FLINK-3654] Disable Write-Ahead-Log in RocksDB State




> Disable Write-Ahead-Log in RocksDB State
> 
>
> Key: FLINK-3654
> URL: https://issues.apache.org/jira/browse/FLINK-3654
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> We do our own checkpointing of the RocksDB database so the WAL is useless to 
> us. Disabling writes to the WAL should give us a very large performance boost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3654] Disable Write-Ahead-Log in RocksD...

2016-04-04 Thread aljoscha
GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/1850

[FLINK-3654] Disable Write-Ahead-Log in RocksDB State



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink rocksdb-no-wal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1850


commit 0580de64ac36bd73f505084ffac44e512f127da2
Author: Aljoscha Krettek 
Date:   2016-03-22T17:08:15Z

[FLINK-3654] Disable Write-Ahead-Log in RocksDB State




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-2674) Rework windowing logic

2016-04-04 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved FLINK-2674.
-
Resolution: Fixed

All sub-issues are done.

> Rework windowing logic
> --
>
> Key: FLINK-2674
> URL: https://issues.apache.org/jira/browse/FLINK-2674
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Stephan Ewen
>Priority: Critical
> Fix For: 1.0.0
>
>
> The windowing logic needs a major overhaul. This follows the design 
> documents: 
>   - 
> https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams
>   - https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=60624830
> Specifically, the following shortcomings need to be addressed:
>   - Global parallel windows should be dropped
>-> for time, local windows are aligned and serve the same purpose
>-> there is currently no known robust and efficient parallel 
> implementation of custom strategies 
>   - Event time and out of order arrival needs to be supported
>   - Eviction of not accessed keys does not work. Non-accessed keys linger 
> infinitely
>   - Performance is currently bad for time windows, due to an overly general 
> implementation
>   - Resources are leaking, threads are not shut down
>   - Elements are stored multiple times (discretizers, window buffers)
>   - Finally, many implementations are buggy and produce wrong results



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-2870) Add support for accumulating/discarding for Event-Time Windows

2016-04-04 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-2870.
---
Resolution: Duplicate

This will be subsumed by the work in FLINK-3643

> Add support for accumulating/discarding for Event-Time Windows
> --
>
> Key: FLINK-2870
> URL: https://issues.apache.org/jira/browse/FLINK-2870
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> This would allow to specify whether windows should be discarded after the 
> trigger fires or kept in the operator if late elements arrive.
> When keeping elements, the user would also have to specify an allowed 
> lateness time after which the window contents are discarded without emitting 
> any further window evaluation result.
> If elements arrive after the allowed lateness they would trigger the window 
> immediately with only the one single element.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224282#comment-15224282
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-205332950
  
Hi @greghogan, it's google drawings.


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or for sharing large files 
> when reporting issues.
> There are multiple categories of graphs, as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.
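
A sketch of the parallelism-independence idea in the last requirement: derive
a deterministic per-block seed from one global seed, so whichever worker
produces block {{i}} draws the same random sequence (all names are invented):

{code}
import java.util.Random;

public class BlockSeededRandom {

    // Golden-ratio increment, a common 64-bit mixing constant.
    private static final long MIX = 0x9E3779B97F4A7C15L;

    public static Random forBlock(long globalSeed, long blockIndex) {
        return new Random(globalSeed ^ ((blockIndex + 1) * MIX));
    }
}
{code}

Splitting the element space into fixed-size blocks and seeding each block this
way keeps the generated graph identical whether one worker produces all blocks
or a hundred workers produce one each.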



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-04 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-205332950
  
Hi @greghogan, it's google drawings.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3576) Upgrade Snappy Java to 1.1.2.1

2016-04-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-3576:
--
Description: 
There was a JVM memory leak problem reported in 
https://github.com/xerial/snappy-java/issues/131

The above issue has been resolved.
1.1.2.1 was released on Jan 22nd.

We should upgrade to this release.

  was:
There was a JVM memory leaky problem reported in 
https://github.com/xerial/snappy-java/issues/131

The above issue has been resolved.
1.1.2.1 was released on Jan 22nd.


We should upgrade to this release.


> Upgrade Snappy Java to 1.1.2.1
> --
>
> Key: FLINK-3576
> URL: https://issues.apache.org/jira/browse/FLINK-3576
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> There was a JVM memory leak problem reported in 
> https://github.com/xerial/snappy-java/issues/131
> The above issue has been resolved.
> 1.1.2.1 was released on Jan 22nd.
> We should upgrade to this release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224267#comment-15224267
 ] 

ASF GitHub Bot commented on FLINK-2909:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-205328244
  
@vasia do you know what software was used to create the graphs in the 
existing Gelly documentation?


> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or for sharing large files 
> when reporting issues.
> There are multiple categories of graphs, as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3685) Logical error in code for DateSerializer deserialize with reuse

2016-04-04 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224269#comment-15224269
 ] 

Stephan Ewen commented on FLINK-3685:
-

Most serializers do not support null, and the runtime does not work with null 
values either. This is "in case support will be added" logic that should be 
removed, in my opinion, especially since {{-1}} is actually a valid date.

> Logical error in code for DateSerializer deserialize with reuse
> ---
>
> Key: FLINK-3685
> URL: https://issues.apache.org/jira/browse/FLINK-3685
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: ZhengBowen
>
> There is a logical error in the following function in DateSerializer.java 
> when the source reads '-1'. The function is:
> ```
> public Date deserialize(Date reuse, DataInputView source) throws IOException {
>   long v = source.readLong();
>   if(v == -1L) {
>   return null;
>   }
>   reuse.setTime(v);
>   return reuse;
> }
> ```
> When this function is called for the first time and returns null, the caller 
> will set 'reuse' to null; when it is called a second time with 'v != -1', 
> reuse.setTime(v) will throw an NPE.
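
One possible null-safe variant, sketched only to show the failure mode; this
is not necessarily the fix the project will take, and Stephan's comment above
questions whether {{-1}} should be treated as null at all:

{code}
public Date deserialize(Date reuse, DataInputView source) throws IOException {
    long v = source.readLong();
    if (v == -1L) {
        return null;
    }
    if (reuse == null) {
        // 'reuse' can be null after a previous call returned null.
        return new Date(v);
    }
    reuse.setTime(v);
    return reuse;
}
{code}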



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2909] [Gelly] Graph generators

2016-04-04 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1807#issuecomment-205328244
  
@vasia do you know what software was used to create the graphs in the 
existing Gelly documentation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3613) Add standard deviation, mean, variance to list of Aggregations

2016-04-04 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224259#comment-15224259
 ] 

Stephan Ewen commented on FLINK-3613:
-

The design of the extended aggregators makes a lot of sense. I agree with 
Fabian that we should discuss two things first, however:

  1. Do we want such extended aggregations in the DataSet API, or basically 
push people to use the Table API instead? My gut feeling is that it makes sense 
to have this in the DataSet API if we answer (2) with "yes" and have a good design 
for (3).
  2. I assume it should allow to use multiple aggregation functions, such that 
one could create something {{like (a, b) --> (max(a), min(a), avg(b))}}
  3. How do we want the signatures for this to look? Ideally making this 
typesafe via a builder (similar to the CSV input on ExecutionEnvironment).


> Add standard deviation, mean, variance to list of Aggregations
> --
>
> Key: FLINK-3613
> URL: https://issues.apache.org/jira/browse/FLINK-3613
> Project: Flink
>  Issue Type: Improvement
>Reporter: Todd Lisonbee
>Priority: Minor
> Attachments: DataSet-Aggregation-Design-March2016-v1.txt
>
>
> Implement standard deviation, mean, variance for 
> org.apache.flink.api.java.aggregation.Aggregations
> Ideally implementation should be single pass and numerically stable.
> References:
> "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et 
> al, International Conference on Data Engineering 2012
> http://dl.acm.org/citation.cfm?id=2310392
> "The Kahan summation algorithm (also known as compensated summation) reduces 
> the numerical errors that occur when adding a sequence of finite precision 
> floating point numbers. Numerical errors arise due to truncation and 
> rounding. These errors can lead to numerical instability when calculating 
> variance."
> https://en.wikipedia.org/wiki/Kahan_summation_algorithm
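
To anchor the "single pass and numerically stable" requirement, a hedged
sketch of the standard approach (Welford's algorithm). All names are invented,
and a distributed version would additionally need a merge step for combining
per-partition accumulators:

{code}
public class RunningStats {

    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    public void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    public double mean() {
        return mean;
    }

    public double sampleVariance() {
        return count > 1 ? m2 / (count - 1) : 0.0;
    }

    public double stddev() {
        return Math.sqrt(sampleVariance());
    }
}
{code}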



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-3613) Add standard deviation, mean, variance to list of Aggregations

2016-04-04 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224259#comment-15224259
 ] 

Stephan Ewen edited comment on FLINK-3613 at 4/4/16 2:42 PM:
-

The design of the extended aggregators makes a lot of sense. I agree with 
Fabian that we should discuss two things first, however:

  1. Do we want such extended aggregations in the DataSet API, or basically 
push people to use the Table API instead? My gut feeling is that it makes sense 
to have this in the DataSet API if we answer (2) with "yes" and have a good design 
for (3).
  2. I assume it should allow to use multiple aggregation functions, such that 
one could create something like {{(a, b) --> (max(a), min(a), avg(b))}}
  3. How do we want the signatures for this to look? Ideally making this 
typesafe via a builder (similar to the CSV input on ExecutionEnvironment).



was (Author: stephanewen):
The design of the extended aggregators makes a lot of sense. I agree with 
Fabian that we should discuss two things first, however:

  1. Do we want such extended aggregations in the DataSet API, or basically 
push people to use the Table API instead? My gut feeling is that it makes sense 
to have this in the DataSet API if we answer (2) with "yes" have a good design 
for (3).
  2. I assume it should allow to use multiple aggregation functions, such that 
one could create something {{like (a, b) --> (max(a), min(a), avg(b))}}
  3. How do we want the signatures for this to look? Ideally making this 
typesafe via a builder (similar to the CSV input on ExecutionEnvironment).


> Add standard deviation, mean, variance to list of Aggregations
> --
>
> Key: FLINK-3613
> URL: https://issues.apache.org/jira/browse/FLINK-3613
> Project: Flink
>  Issue Type: Improvement
>Reporter: Todd Lisonbee
>Priority: Minor
> Attachments: DataSet-Aggregation-Design-March2016-v1.txt
>
>
> Implement standard deviation, mean, variance for 
> org.apache.flink.api.java.aggregation.Aggregations
> Ideally implementation should be single pass and numerically stable.
> References:
> "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et 
> al, International Conference on Data Engineering 2012
> http://dl.acm.org/citation.cfm?id=2310392
> "The Kahan summation algorithm (also known as compensated summation) reduces 
> the numerical errors that occur when adding a sequence of finite precision 
> floating point numbers. Numerical errors arise due to truncation and 
> rounding. These errors can lead to numerical instability when calculating 
> variance."
> https://en.wikipedia.org/wiki/Kahan_summation_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3634) Fix documentation for DataSetUtils.zipWithUniqueId()

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224250#comment-15224250
 ] 

ASF GitHub Bot commented on FLINK-3634:
---

Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1817#issuecomment-205325957
  
The documentation for `zipWithIndex` now notes that it requires two passes 
and points to `zipWithUniqueId` as an alternative.


> Fix documentation for DataSetUtils.zipWithUniqueId()
> 
>
> Key: FLINK-3634
> URL: https://issues.apache.org/jira/browse/FLINK-3634
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>Priority: Minor
> Fix For: 1.1.0, 1.0.1
>
>
> Under FLINK-2590 the assignment and testing of unique IDs was improved but 
> the documentation looks to still reference the old implementation.
> With {{parallelism=1}} there is no difference between zipWithUniqueID and 
> zipWithIndex. With greater parallelism the results of zipWithUniqueID are 
> dependent on the partitioning.
> The documentation should demonstrate a possible result that is different from 
> the incremental sequence of zipWithIndex while noting that results are 
> dependent on the parallelism and partitioning.
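
For readers comparing the two methods, a small usage sketch against the Java
API (the concrete ids produced by `zipWithUniqueId` will vary with parallelism
and partitioning, as the description notes):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.DataSetUtils;

public class ZipExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> in = env.fromElements("a", "b", "c");

        // Consecutive indexes 0..n-1; needs an extra pass to count the
        // elements per partition before assigning indexes.
        DataSet<Tuple2<Long, String>> indexed = DataSetUtils.zipWithIndex(in);

        // Unique but generally non-consecutive ids in a single pass.
        DataSet<Tuple2<Long, String>> tagged = DataSetUtils.zipWithUniqueId(in);

        indexed.print();
        tagged.print();
    }
}
{code}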



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3685) Logical error in code for DateSerializer deserialize with reuse

2016-04-04 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224247#comment-15224247
 ] 

Robert Metzger commented on FLINK-3685:
---

Did you see any failures related to this?

> Logical error in code for DateSerializer deserialize with reuse
> ---
>
> Key: FLINK-3685
> URL: https://issues.apache.org/jira/browse/FLINK-3685
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: ZhengBowen
>
> There is a logical error in the following function in DateSerializer.java 
> when the source reads '-1'. The function is:
> ```
> public Date deserialize(Date reuse, DataInputView source) throws IOException {
>   long v = source.readLong();
>   if(v == -1L) {
>   return null;
>   }
>   reuse.setTime(v);
>   return reuse;
> }
> ```
> when call this function for first time, if return null, then 'reuse' will be 
> set null by caller;
> when call this function for second time,if 'v!=-1' ,reuse.setTime(v) will 
> throw NPE.
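
One possible fix, sketched here under the assumption that a null 'reuse' 
should simply fall back to allocating a new object (the actual patch may 
differ):

{code}
public Date deserialize(Date reuse, DataInputView source) throws IOException {
    long v = source.readLong();
    if (v == -1L) {
        return null;
    }
    if (reuse == null) {
        // the caller may have nulled its reuse object after a previous null return
        return new Date(v);
    }
    reuse.setTime(v);
    return reuse;
}
{code}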



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3634] [docs] Fix documentation for Data...

2016-04-04 Thread greghogan
Github user greghogan commented on the pull request:

https://github.com/apache/flink/pull/1817#issuecomment-205325957
  
The documentation for `zipWithIndex` now notes that it requires two passes 
and points to `zipWithUniqueId` as an alternative.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-3693) JobManagerHAJobGraphRecoveryITCase.testClientNonDetachedListeningBehaviour is unstable

2016-04-04 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3693.
--
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed in 9e7c664.

> JobManagerHAJobGraphRecoveryITCase.testClientNonDetachedListeningBehaviour is 
> unstable
> --
>
> Key: FLINK-3693
> URL: https://issues.apache.org/jira/browse/FLINK-3693
> Project: Flink
>  Issue Type: Bug
>Reporter: Robert Metzger
>Assignee: Ufuk Celebi
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> {code}
> testClientNonDetachedListeningBehaviour(org.apache.flink.test.recovery.JobManagerHAJobGraphRecoveryITCase)
>   Time elapsed: 13.834 sec  <<< ERROR!
> java.lang.IllegalStateException: Job is in terminal state FAILED, but was 
> waiting for RUNNING.
>   at 
> org.apache.flink.runtime.testutils.JobManagerActorTestUtils.waitForJobStatus(JobManagerActorTestUtils.java:83)
>   at 
> org.apache.flink.test.recovery.JobManagerHAJobGraphRecoveryITCase.testClientNonDetachedListeningBehaviour(JobManagerHAJobGraphRecoveryITCase.java:305)
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/120072682/log.txt
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/120105002/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3664) Create a method to easily Summarize a DataSet

2016-04-04 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224239#comment-15224239
 ] 

Stephan Ewen commented on FLINK-3664:
-

The summarize design and code look good so far.

Since it is part of {{DataSetUtils}}, does not break or alter any existing 
behavior, and seems lightweight, it should be good to add.

Feel free to open a pull request with this...

> Create a method to easily Summarize a DataSet
> -
>
> Key: FLINK-3664
> URL: https://issues.apache.org/jira/browse/FLINK-3664
> Project: Flink
>  Issue Type: Improvement
>Reporter: Todd Lisonbee
> Attachments: DataSet-Summary-Design-March2016-v1.txt
>
>
> Here is an example:
> {code}
> /**
>  * Summarize a DataSet of Tuples by collecting single pass statistics for all 
> columns
>  */
> public Tuple summarize()
> DataSet<Tuple3<Double, String, Boolean>> input = // [...]
> Tuple3<NumericColumnSummary<Double>, StringColumnSummary, BooleanColumnSummary> summary 
> = input.summarize()
> summary.getField(0).stddev()
> summary.getField(1).maxStringLength()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3595) Kafka09 consumer thread does not interrupt when stuck in record emission

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224225#comment-15224225
 ] 

ASF GitHub Bot commented on FLINK-3595:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1780#issuecomment-205319812
  
Just rebased. Waiting for Travis and then merging to `master` and 
`release-1.0`.


> Kafka09 consumer thread does not interrupt when stuck in record emission
> 
>
> Key: FLINK-3595
> URL: https://issues.apache.org/jira/browse/FLINK-3595
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Ufuk Celebi
>Priority: Critical
> Fix For: 1.1.0, 1.0.1
>
>
> When canceling a job, the Kafka 0.9 Consumer Thread may be stuck in a 
> blocking method (output emitting) and never wakes up.
> The thread as a whole cannot be simply interrupted, because of a bug in Kafka 
> that makes the consumer freeze/hang up on interrupt.
> There are two possible solutions:
>   - allow and call interrupt when the consumer thread is emitting elements
>   - destroy the output network buffer pools eagerly on canceling. The Kafka 
> thread will then throw an exception if it is stuck emitting elements and will 
> terminate, which is acceptable when the job status is canceled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3595] [runtime] Eagerly destroy buffer ...

2016-04-04 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1780#issuecomment-205319812
  
Just rebased. Waiting for Travis and then merging to `master` and 
`release-1.0`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2016-04-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224218#comment-15224218
 ] 

Josep Rubió commented on FLINK-1707:


Hi Vasia,

Yes, I'm working with the scatter-gather iteration. I've looked at the 
vertex-centric iteration; since it is almost the same as Giraph's model, it 
could make migrating from the Okapi implementation easier.

That's enough for now.

Thanks!

> Add an Affinity Propagation Library Method
> --
>
> Key: FLINK-1707
> URL: https://issues.apache.org/jira/browse/FLINK-1707
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Josep Rubió
>Priority: Minor
>  Labels: requires-design-doc
> Attachments: Binary_Affinity_Propagation_in_Flink_design_doc.pdf
>
>
> This issue proposes adding an implementation of the Affinity Propagation 
> algorithm as a Gelly library method and a corresponding example.
> The algorithm is described in paper [1] and a description of a vertex-centric 
> implementation can be found in [2].
> [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
> [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf
> Design doc:
> https://docs.google.com/document/d/1QULalzPqMVICi8jRVs3S0n39pell2ZVc7RNemz_SGA4/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3694) Document stoppable function interface for custom sources

2016-04-04 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3694:


 Summary: Document stoppable function interface for custom sources
 Key: FLINK-3694
 URL: https://issues.apache.org/jira/browse/FLINK-3694
 Project: Flink
  Issue Type: Improvement
  Components: DataStream API
Affects Versions: 1.0.0, 1.1.0
Reporter: Till Rohrmann
Priority: Minor


With Flink 1.0 we have introduced the {{StoppableFunction}}, which allows users 
to define stoppable sources for the {{DataStream}} API. This feature is not 
documented yet. I think we should add a section about custom sources which also 
explains the usage of the {{StoppableFunction}} interface.
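
A minimal sketch of what such a documented example could look like (the class 
name and the counting logic are illustrative):

{code}
import org.apache.flink.api.common.functions.StoppableFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class CountingSource implements SourceFunction<Long>, StoppableFunction {

    private volatile boolean running = true;
    private long counter;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // hold the checkpoint lock while emitting, as for any custom source
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(counter++);
            }
        }
    }

    @Override
    public void cancel() {
        // hard cancel: the job is torn down
        running = false;
    }

    @Override
    public void stop() {
        // graceful stop: the source stops emitting and the job finishes regularly
        running = false;
    }
}
{code}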



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.

2016-04-04 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-1502:
---

Assignee: Chesnay Schepler

> Expose metrics to graphite, ganglia and JMX.
> 
>
> Key: FLINK-1502
> URL: https://issues.apache.org/jira/browse/FLINK-1502
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, TaskManager
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: pre-apache
>
>
> The metrics library makes it easy to expose collected metrics to other 
> systems such as Graphite, Ganglia, or JMX (e.g. viewed with VisualVM).
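
A minimal sketch with the Dropwizard/Codahale metrics library (assuming 
metrics-core 3.x is the library meant here; the registry and metric names are 
illustrative):

{code}
import com.codahale.metrics.JmxReporter;
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

public class MetricsSketch {
    public static void main(String[] args) throws Exception {
        MetricRegistry registry = new MetricRegistry();
        Meter recordsIn = registry.meter("taskmanager.records-in");

        // expose all metrics in the registry via JMX (visible in VisualVM/jconsole)
        JmxReporter reporter = JmxReporter.forRegistry(registry).build();
        reporter.start();

        recordsIn.mark();     // count one incoming record
        Thread.sleep(60_000); // keep the JVM alive so the MBeans can be inspected
    }
}
{code}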



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3443) JobManager cancel and clear everything fails jobs instead of cancelling

2016-04-04 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3443.
--
Resolution: Won't Fix

> JobManager cancel and clear everything fails jobs instead of cancelling
> ---
>
> Key: FLINK-3443
> URL: https://issues.apache.org/jira/browse/FLINK-3443
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> When the job manager is shut down, it calls {{cancelAndClearEverything}}. 
> This method does not {{cancel}} the {{ExecutionGraph}} instances, but 
> {{fail}}s them, which can lead to {{ExecutionGraph}} restart.
> I've noticed this in tests, where an old graph got into a loop of restarts.
> What I don't understand is why the futures etc. are not cancelled when the 
> executor service is shut down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3679) DeserializationSchema should handle zero or more outputs for every input

2016-04-04 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224182#comment-15224182
 ] 

Robert Metzger commented on FLINK-3679:
---

I had a quick offline chat about this with [~StephanEwen]. Changing the 
semantics of the DeserializationSchema to use an OutputCollector would be 
possible, but it would break existing code, introduce a new class and make the 
locking / operator chaining of the Kafka consumer code more complicated.
I wonder if the problems you've mentioned can't be solved with a flatMap() 
operator. When the Kafka consumer and the flatMap() are executed with the same 
parallelism, they'll be chained together and then executed in the same thread 
with almost no overhead.
If one Kafka message results in two or more logical messages, that "splitting" 
can be done in the flatMap() as well. Invalid records can also be reflected in 
the returned record (a failure flag such as an id set to -1 or a boolean set to 
false, or a special field in a JSON record) and then treated accordingly in the 
flatMap() call.

If you want, we can keep the JIRA issue open and see if more users run into 
this. If so, we can reconsider fixing it (I'm not saying I've decided against 
fixing it).
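
A rough sketch of that flatMap approach (the Event type, the RawSchema 
pass-through schema, and the tryParse helper are hypothetical placeholders):

{code}
DataStream<byte[]> raw = env.addSource(
        new FlinkKafkaConsumer09<>("topic", new RawSchema(), properties));

DataStream<Event> events = raw.flatMap(new FlatMapFunction<byte[], Event>() {
    @Override
    public void flatMap(byte[] bytes, Collector<Event> out) {
        Event parsed = tryParse(bytes);      // hypothetical: returns null for bad data
        if (parsed == null) {
            return;                          // drop invalid records instead of failing the job
        }
        for (Event e : parsed.split()) {     // one Kafka message -> zero or more elements
            out.collect(e);
        }
    }
});
{code}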

> DeserializationSchema should handle zero or more outputs for every input
> 
>
> Key: FLINK-3679
> URL: https://issues.apache.org/jira/browse/FLINK-3679
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Reporter: Jamie Grier
>
> There are a couple of issues with the DeserializationSchema API that I think 
> should be improved.  This request has come to me via an existing Flink user.
> The main issue is simply that the API assumes that there is a one-to-one 
> mapping between input and outputs.  In reality there are scenarios where one 
> input message (say from Kafka) might actually map to zero or more logical 
> elements in the pipeline.
> Particularly important here is the case where you receive a message from a 
> source (such as Kafka) and say the raw bytes don't deserialize properly.  
> Right now the only recourse is to throw IOException and therefore fail the 
> job.  
> This is definitely not good since bad data is a reality and failing the job 
> is not the right option.  If the job fails we'll just end up replaying the 
> bad data and the whole thing will start again.
> Instead in this case it would be best if the user could just return the empty 
> set.
> The other case is where one input message should logically be multiple output 
> messages.  This case is probably less important since there are other ways to 
> do this but in general it might be good to make the 
> DeserializationSchema.deserialize() method return a collection rather than a 
> single element.
> Maybe we need to support a DeserializationSchema variant that has semantics 
> more like that of FlatMap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224184#comment-15224184
 ] 

ASF GitHub Bot commented on FLINK-2609:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205307922
  
Let's not merge this as is; it again uses a static de-duplicator, like the 
DataSet API used to (we recently eliminated that). These static utils that work 
around putting a variable in the right place are an antipattern.

Why not have the registration de-duplicator in the ExecutionEnvironment? 
Since the ExecutionConfig is also one per environment, and is always accessed 
through the environment, that seems like the way to go...



> Automatic type registration is only called from the batch execution 
> environment
> ---
>
> Key: FLINK-2609
> URL: https://issues.apache.org/jira/browse/FLINK-2609
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>
> Kryo types in the streaming API are quite expensive to serialize because they 
> are not automatically registered with Kryo.
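
Until that happens automatically, a workaround sketch (the MyPojo type is 
illustrative):

{code}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// registering the type lets Kryo write a small integer tag instead of the full class name
env.getConfig().registerKryoType(MyPojo.class);
{code}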



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2119) Add ExecutionGraph support for leg-wise scheduling

2016-04-04 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-2119:
---
Assignee: (was: Ufuk Celebi)

> Add ExecutionGraph support for leg-wise scheduling
> --
>
> Key: FLINK-2119
> URL: https://issues.apache.org/jira/browse/FLINK-2119
> Project: Flink
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 0.10.0
>Reporter: Ufuk Celebi
>
> Scheduling currently happens by lazily unrolling the ExecutionGraph from the 
> sources.
> 1. All sources are scheduled for execution.
> 2. Their results trigger scheduling and deployment of the receiving tasks 
> (either on the first available buffer or when all are produced [pipelined vs. 
> blocking exchange]).
> For certain batch jobs this can be problematic, as many tasks will be running 
> at the same time and consuming task manager resources like execution slots and 
> memory. For these jobs, it is desirable to schedule the ExecutionGraph with 
> different strategies.
> With respect to the ExecutionGraph, the current limitation is that data 
> availability for a result always triggers scheduling of the consuming tasks. 
> This needs to be more general to allow different scheduling strategies.
> Consider the following example:
> {code}
>         [ union ]
>          /    \
>         /      \
> [ source 1 ]  [ source 2 ]
> {code}
> Currently, both sources are scheduled concurrently and the "faster" one 
> triggers scheduling of the union. It is desirable to first allow source 1 to 
> completely produce its result, then trigger scheduling of source 2, and only 
> then schedule the union.
> The required changes in the ExecutionGraph are conceptually straightforward: 
> instead of going through the list of result consumers and scheduling them, we 
> need to be able to run a more general action. For normal operation, this will 
> still schedule the consumer task, but we can also configure it to kick off the 
> next source etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types

2016-04-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-205307922
  
Let's not merge this as is; it again uses a static de-duplicator, like the 
DataSet API used to (we recently eliminated that). These static utils that work 
around putting a variable in the right place are an antipattern.

Why not have the registration de-duplicator in the ExecutionEnvironment? 
Since the ExecutionConfig is also one per environment, and is always accessed 
through the environment, that seems like the way to go...



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3571) SavepointITCase.testRestoreFailure fails on Travis

2016-04-04 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224180#comment-15224180
 ] 

Ufuk Celebi commented on FLINK-3571:


It's impossible to fully debug this w/o logs :( Added better output in 21480e2.

> SavepointITCase.testRestoreFailure fails on Travis
> --
>
> Key: FLINK-3571
> URL: https://issues.apache.org/jira/browse/FLINK-3571
> Project: Flink
>  Issue Type: Bug
>Reporter: Till Rohrmann
>Assignee: Ufuk Celebi
>Priority: Critical
>  Labels: test-stability
>
> The test case {{SavepointITCase.testRestoreFailure}} failed on Travis [1]
> [1] https://s3.amazonaws.com/archive.travis-ci.org/jobs/113084521/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3492) Allow users to define a min pause between checkpoints

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224178#comment-15224178
 ] 

ASF GitHub Bot commented on FLINK-3492:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1794#issuecomment-205306447
  
How about adjusting the test such that it works the following way:
  - it polls the number of checkpoint calls (from the atomic integer), say 
every 100 ms
  - it starts measuring time from when the count goes to `1`
  - it validates that `(numCalls - 2) * pause` is always smaller than the 
elapsed time since starting the time measurement
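
A rough sketch of that loop (`checkpointCounter`, `pause` and 
`TEST_DURATION_MS` are illustrative names):

```
long start = -1;
long deadline = System.currentTimeMillis() + TEST_DURATION_MS;
while (System.currentTimeMillis() < deadline) {
    int numCalls = checkpointCounter.get();   // incremented once per checkpoint
    if (start < 0 && numCalls >= 1) {
        start = System.currentTimeMillis();   // start measuring when the count reaches 1
    }
    if (start > 0) {
        long elapsed = System.currentTimeMillis() - start;
        // n checkpoints separated by the min pause need at least (n - 2) * pause ms
        assertTrue((numCalls - 2) * pause <= elapsed);
    }
    Thread.sleep(100);                        // poll every 100 ms
}
```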


> Allow users to define a min pause between checkpoints
> -
>
> Key: FLINK-3492
> URL: https://issues.apache.org/jira/browse/FLINK-3492
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>
> FLINK-3051 already introduced a field in the {{CheckpointConfig}} to specify 
> a min pause between checkpoints.
> In high-load situations (big state), jobs might spend their entire time 
> creating snapshots, not processing data. With a min pause between 
> checkpoints, users can guarantee that there is a certain time-span the system 
> can use for doing some actual data processing.
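
For reference, a usage sketch of the setting (assuming the field is exposed 
through a setter on the {{CheckpointConfig}}):

{code}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10000);  // trigger a checkpoint every 10 s
// guarantee at least 5 s of actual processing between two checkpoints
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5000);
{code}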



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3492] Configurable interval between Che...

2016-04-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1794#issuecomment-205306447
  
How about adjusting the test such that it works the following way:
  - it polls the number of checkpoint calls (from the atomic integer), say 
every 100 ms
  - it starts measuring time from when the count goes to `1`
  - it validates that `(numCalls - 2) * pause` is always smaller than the 
elapsed time since starting the time measurement


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3492) Allow users to define a min pause between checkpoints

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224158#comment-15224158
 ] 

ASF GitHub Bot commented on FLINK-3492:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1794#issuecomment-205303755
  
The code looks good, but the tests are bound to be unstable. The assumption 
that sleeps actually sleep as long as they should is often violated (especially 
on Travis, it seems).

I think we need to find a better way to test this...


> Allow users to define a min pause between checkpoints
> -
>
> Key: FLINK-3492
> URL: https://issues.apache.org/jira/browse/FLINK-3492
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>
> FLINK-3051 already introduced a field in the {{CheckpointConfig}} to specify 
> a min pause between checkpoints.
> In high-load situations (big state), jobs might spend their entire time 
> creating snapshots, not processing data. With a min pause between 
> checkpoints, users can guarantee that there is a certain time-span the system 
> can use for doing some actual data processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3492] Configurable interval between Che...

2016-04-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1794#issuecomment-205303755
  
The code looks good, but the tests are bound to be unstable. The assumption 
that sleeps actually sleep as long as they should is often violated (especially 
on Travis, it seems).

I think we need to find a better way to test this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-3524) Provide a JSONDeserialisationSchema in the kafka connector package

2016-04-04 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-3524.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/c7595840

> Provide a JSONDeserialisationSchema in the kafka connector package
> --
>
> Key: FLINK-3524
> URL: https://issues.apache.org/jira/browse/FLINK-3524
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>  Labels: starter
> Fix For: 1.1.0
>
>
> (I don't want to include this in 1.0.0.)
> Currently, there is no standardized way of parsing JSON data from a Kafka 
> stream. I see a lot of users using JSON in their topics. It would make things 
> easier for our users if we provided a deserializer for them.
> I suggest using the jackson library because we already have it as a 
> dependency in Flink and it allows parsing from a byte[].
> I suggest providing the following classes:
>  - JSONDeserializationSchema()
>  - JSONKeyValueDeserializationSchema(boolean includeMetadata)
> The second variant should produce a record like this:
> {code}
> {"key": "keydata",
>  "value": "valuedata",
>  "metadata": {"offset": 123, "topic": "", "partition": 2 }
> }
> {code}
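
A sketch of what the first variant could look like with jackson; the committed 
implementation (c7595840) may differ in details:

{code}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema;

import java.io.IOException;

public class JSONDeserializationSchema extends AbstractDeserializationSchema<ObjectNode> {

    private transient ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper(); // created lazily: ObjectMapper is not serializable
        }
        return mapper.readValue(message, ObjectNode.class);
    }
}
{code}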



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3524] [kafka] Add JSONDeserializationSc...

2016-04-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1834


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3524) Provide a JSONDeserialisationSchema in the kafka connector package

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224155#comment-15224155
 ] 

ASF GitHub Bot commented on FLINK-3524:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1834


> Provide a JSONDeserialisationSchema in the kafka connector package
> --
>
> Key: FLINK-3524
> URL: https://issues.apache.org/jira/browse/FLINK-3524
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>  Labels: starter
> Fix For: 1.1.0
>
>
> (I don't want to include this in 1.0.0.)
> Currently, there is no standardized way of parsing JSON data from a Kafka 
> stream. I see a lot of users using JSON in their topics. It would make things 
> easier for our users if we provided a deserializer for them.
> I suggest using the jackson library because we already have it as a 
> dependency in Flink and it allows parsing from a byte[].
> I suggest providing the following classes:
>  - JSONDeserializationSchema()
>  - JSONKeyValueDeserializationSchema(boolean includeMetadata)
> The second variant should produce a record like this:
> {code}
> {"key": "keydata",
>  "value": "valuedata",
>  "metadata": {"offset": 123, "topic": "", "partition": 2 }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3524) Provide a JSONDeserialisationSchema in the kafka connector package

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224149#comment-15224149
 ] 

ASF GitHub Bot commented on FLINK-3524:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1834#issuecomment-205303136
  
Merging ...


> Provide a JSONDeserialisationSchema in the kafka connector package
> --
>
> Key: FLINK-3524
> URL: https://issues.apache.org/jira/browse/FLINK-3524
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>  Labels: starter
>
> (I don't want to include this in 1.0.0.)
> Currently, there is no standardized way of parsing JSON data from a Kafka 
> stream. I see a lot of users using JSON in their topics. It would make things 
> easier for our users if we provided a deserializer for them.
> I suggest using the jackson library because we already have it as a 
> dependency in Flink and it allows parsing from a byte[].
> I suggest providing the following classes:
>  - JSONDeserializationSchema()
>  - JSONKeyValueDeserializationSchema(boolean includeMetadata)
> The second variant should produce a record like this:
> {code}
> {"key": "keydata",
>  "value": "valuedata",
>  "metadata": {"offset": 123, "topic": "", "partition": 2 }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3524] [kafka] Add JSONDeserializationSc...

2016-04-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1834#issuecomment-205303136
  
Merging ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3174) Add merging WindowAssigner

2016-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224143#comment-15224143
 ] 

ASF GitHub Bot commented on FLINK-3174:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1802#issuecomment-205302445
  
I did the changes: I introduced `canMerge()` and added a default 
`onMerge()` that throws a `RuntimeException` in `Trigger`. This way we don't 
break the API for existing user triggers.

If there are no objections I will also merge the removal of the non-keyed 
window operator that this PR is based on.
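
Roughly, the additions look like this (a sketch; the exact signatures in the 
PR may differ):

```
public abstract class Trigger<T, W extends Window> implements Serializable {

    // existing callbacks (onElement, onProcessingTime, onEventTime, ...) unchanged

    // returns true if the trigger supports merging and can be used with a merging WindowAssigner
    public boolean canMerge() {
        return false;
    }

    // called when several windows have been merged into one; the default keeps old triggers compiling
    public void onMerge(W window, OnMergeContext ctx) throws Exception {
        throw new RuntimeException("This trigger does not support merging.");
    }
}
```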


> Add merging WindowAssigner
> --
>
> Key: FLINK-3174
> URL: https://issues.apache.org/jira/browse/FLINK-3174
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
> Fix For: 1.0.0
>
>
> We should add the possibility for WindowAssigners to merge windows. This will 
> enable session windowing support, similar to what Google Cloud Dataflow 
> supports.
> For session windows, each element would initially be assigned to its own 
> window. When triggering we check the windows and see if any can be merged. 
> This way, elements with overlapping session windows merge into one session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3174] Add MergingWindowAssigner and Ses...

2016-04-04 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1802#issuecomment-205302445
  
I did the changes: I introduced `canMerge()` and added a default 
`onMerge()` that throws a `RuntimeException` in `Trigger`. This way we don't 
break the API for existing user triggers.

If there are no objections I will also merge the removal of the non-keyed 
window operator that this PR is based on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

