[jira] [Created] (FLINK-4438) FlinkML Quickstart Guide implies incorrect type for test data

2016-08-21 Thread Ahmad Ragab (JIRA)
Ahmad Ragab created FLINK-4438:
--

 Summary: FlinkML Quickstart Guide implies incorrect type for test 
data
 Key: FLINK-4438
 URL: https://issues.apache.org/jira/browse/FLINK-4438
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.2.0
Reporter: Ahmad Ragab
Priority: Minor
 Fix For: 1.2.0


https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/ml/quickstart.html

The documentation under the *LibSVM* section says:

We can simply import the dataset then using:

{code:java}
import org.apache.flink.ml.MLUtils

val astroTrain: DataSet[LabeledVector] = MLUtils.readLibSVM("/path/to/svmguide1")
val astroTest: DataSet[LabeledVector] = MLUtils.readLibSVM("/path/to/svmguide1.t")
{code}

This gives us two {{DataSet\[LabeledVector\]}} objects that we will use in the 
following section to create a classifier.

Test data generally wouldn't be of type {{LabeledVector}}; as described in 
other examples, it would be {{DataSet\[Vector\]}}, since prediction is supposed 
to generate the labels. Thus, after reading the file with {{MLUtils}}, it 
should be mapped to a vector.
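For illustration, here is a self-contained sketch in plain Java (hypothetical {{LibSvmLine}} helper, not the FlinkML API) of how a libSVM line couples a label with sparse features, and how dropping the label yields the features-only vector that test data consists of:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LibSvmLine {
    // A libSVM line looks like "1 1:2.617 2:58.0": label first, then
    // sparse index:value feature pairs.
    static double parseLabel(String line) {
        return Double.parseDouble(line.trim().split("\\s+")[0]);
    }

    // Dropping the label leaves the features-only vector that test data
    // and prediction input consist of.
    static Map<Integer, Double> parseFeatures(String line) {
        String[] tokens = line.trim().split("\\s+");
        Map<Integer, Double> features = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            String[] kv = tokens[i].split(":");
            features.put(Integer.parseInt(kv[0]), Double.parseDouble(kv[1]));
        }
        return features;
    }
}
```

In FlinkML terms, mapping each {{LabeledVector}} to its feature vector corresponds to keeping only {{parseFeatures}} output and discarding the label.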

Also, the preceding *Loading Data* section should include an example of using 
the {{Splitter}} to prepare the {{survivalLV}} data for use with a learner. 
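As a rough illustration of what such a {{Splitter}} example would demonstrate, here is a self-contained deterministic train/test split over a plain list (hypothetical {{TrainTestSplit}} helper, not FlinkML's {{Splitter}} API, which operates on a {{DataSet}}):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TrainTestSplit {
    // Shuffle deterministically with the given seed, then cut at the
    // requested fraction: index 0 of the result is the training part,
    // index 1 the test part.
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        return Arrays.asList(
                shuffled.subList(0, cut),
                shuffled.subList(cut, shuffled.size()));
    }
}
```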



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4438) FlinkML Quickstart Guide implies incorrect type for test data

2016-08-21 Thread Ahmad Ragab (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429637#comment-15429637
 ] 

Ahmad Ragab commented on FLINK-4438:


Perhaps I opened this issue incorrectly: rereading the documentation, there is 
expected behavior suggesting that it is possible to run prediction on a 
{{DataSet\[LabeledVector\]}}. However, when I run the example I get:  

{code:java}
java.lang.RuntimeException: There is no PredictOperation defined for 
org.apache.flink.ml.classification.SVM which takes a 
DataSet[org.apache.flink.ml.common.LabeledVector] as input.
java.lang.RuntimeException: There is no PredictOperation defined for 
org.apache.flink.ml.classification.SVM which takes a 
DataSet[org.apache.flink.ml.common.LabeledVector] as input.
at 
org.apache.flink.ml.pipeline.Estimator$$anon$1.predictDataSet(Estimator.scala:113)
at 
org.apache.flink.ml.pipeline.Predictor$class.predict(Predictor.scala:59)
at org.apache.flink.ml.classification.SVM.predict(SVM.scala:133)
at 
org.apache.flink.ml.pipeline.ChainedPredictor$$anon$1.predictDataSet(ChainedPredictor.scala:78)
at 
org.apache.flink.ml.pipeline.ChainedPredictor$$anon$1.predictDataSet(ChainedPredictor.scala:70)
at 
org.apache.flink.ml.pipeline.Predictor$class.predict(Predictor.scala:59)
at 
org.apache.flink.ml.pipeline.ChainedPredictor.predict(ChainedPredictor.scala:39)
at 
org.appdev12Hart.Job$.delayedEndpoint$org$appdev12Hart$Job$1(Job.scala:45)
at org.appdev12Hart.Job$delayedInit$body.apply(Job.scala:13)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at org.appdev12Hart.Job$.main(Job.scala:13)
at org.appdev12Hart.Job.main(Job.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
{code}
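The error arises because FlinkML resolves a predict operation keyed by the input type (via Scala implicits) and finds none for {{DataSet\[LabeledVector\]}}. A minimal Java sketch of such type-keyed dispatch (hypothetical {{PredictRegistry}}, not the FlinkML implementation) shows how an unregistered input type produces exactly this kind of failure:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PredictRegistry {
    // Operations keyed by input type; FlinkML achieves the same effect
    // with Scala implicits resolved at compile time.
    private final Map<Class<?>, Function<Object, Object>> ops = new HashMap<>();

    void register(Class<?> inputType, Function<Object, Object> op) {
        ops.put(inputType, op);
    }

    Object predict(Object input) {
        Function<Object, Object> op = ops.get(input.getClass());
        if (op == null) {
            // Mirrors the shape of the reported error message.
            throw new RuntimeException("There is no PredictOperation defined for "
                    + input.getClass().getName() + " as input.");
        }
        return op.apply(input);
    }
}
```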

> FlinkML Quickstart Guide implies incorrect type for test data
> -
>
> Key: FLINK-4438
> URL: https://issues.apache.org/jira/browse/FLINK-4438
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: Ahmad Ragab
>Priority: Minor
> Fix For: 1.2.0
>
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/ml/quickstart.html
> Documentation under *LibSVM* section says that:
> 
> We can simply import the dataset then using:
> {code:java}
> import org.apache.flink.ml.MLUtils
> val astroTrain: DataSet[LabeledVector] = MLUtils.readLibSVM("/path/to/svmguide1")
> val astroTest: DataSet[LabeledVector] = MLUtils.readLibSVM("/path/to/svmguide1.t")
> {code}
> This gives us two {{DataSet\[LabeledVector\]}} objects that we will use in 
> the following section to create a classifier.
> 
> Test data generally wouldn't be of type {{LabeledVector}}; as described in 
> other examples, it would be {{DataSet\[Vector\]}}, since prediction is 
> supposed to generate the labels. Thus, after reading the file with 
> {{MLUtils}}, it should be mapped to a vector.
> Also, the preceding *Loading Data* section should include an example of 
> using the {{Splitter}} to prepare the {{survivalLV}} data for use with a 
> learner. 





[GitHub] flink pull request #2396: [FLINK-4395][cep] Eager processing of late arrival...

2016-08-21 Thread mushketyk
GitHub user mushketyk opened a pull request:

https://github.com/apache/flink/pull/2396

[FLINK-4395][cep] Eager processing of late arrivals in CEP operator

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mushketyk/flink eager-cep

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2396.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2396


commit a2a150dde32c1fc54f8f7fa2e124c40b764c016f
Author: Ivan Mushketyk 
Date:   2016-08-20T22:44:10Z

[FLINK-4395][cep] Replace isProcessingTime field with more expressive 
ProcessingType enum

commit 79f1754655869bd18440260e7f52c4769fc690fe
Author: Ivan Mushketyk 
Date:   2016-08-20T23:48:48Z

[FLINK-4395][cep] Eager processing of late arrivals in CEP operator

commit 8ca5b16919bec76bb936b3d8f33018159306bc5b
Author: Ivan Mushketyk 
Date:   2016-08-21T00:12:15Z

[FLINK-4395][cep] Minor refactoring




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4395) Eager processing of late arrivals in CEP operator

2016-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429641#comment-15429641
 ] 

ASF GitHub Bot commented on FLINK-4395:
---

GitHub user mushketyk opened a pull request:

https://github.com/apache/flink/pull/2396

[FLINK-4395][cep] Eager processing of late arrivals in CEP operator

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mushketyk/flink eager-cep

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2396.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2396


commit a2a150dde32c1fc54f8f7fa2e124c40b764c016f
Author: Ivan Mushketyk 
Date:   2016-08-20T22:44:10Z

[FLINK-4395][cep] Replace isProcessingTime field with more expressive 
ProcessingType enum

commit 79f1754655869bd18440260e7f52c4769fc690fe
Author: Ivan Mushketyk 
Date:   2016-08-20T23:48:48Z

[FLINK-4395][cep] Eager processing of late arrivals in CEP operator

commit 8ca5b16919bec76bb936b3d8f33018159306bc5b
Author: Ivan Mushketyk 
Date:   2016-08-21T00:12:15Z

[FLINK-4395][cep] Minor refactoring




> Eager processing of late arrivals in CEP operator
> -
>
> Key: FLINK-4395
> URL: https://issues.apache.org/jira/browse/FLINK-4395
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP
>Reporter: Till Rohrmann
>Assignee: Ivan Mushketyk
>Priority: Minor
>
> At the moment elements are only processed after the CEP operator has received 
> a watermark larger than the elements (in EventTime mode). In case of late 
> arrivals this means that the late elements are not processed until the next 
> watermark has arrived.
> In order to decrease the latency for this scenario, I propose to eagerly 
> process late arrivals in the CEP operator.
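A toy sketch of the proposed behavior (assumed semantics, not the actual CEP operator code): on-time elements wait in a buffer until a covering watermark arrives, while late elements, whose timestamps are at or below the last watermark, are processed immediately:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class EagerLateProcessing {
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();
    private long lastWatermark = Long.MIN_VALUE;
    final List<Long> processed = new ArrayList<>();

    // An element at or below the last watermark is late: process it
    // immediately instead of parking it until the next watermark.
    void element(long timestamp) {
        if (timestamp <= lastWatermark) {
            processed.add(timestamp);
        } else {
            buffer.add(timestamp);
        }
    }

    // A watermark releases every buffered element it covers, in order.
    void watermark(long watermark) {
        lastWatermark = watermark;
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            processed.add(buffer.poll());
        }
    }
}
```

The latency gain is visible in the late path: without it, a late element would sit in the buffer until the next watermark arrived.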





[jira] [Assigned] (FLINK-2254) Add Bipartite Graph Support for Gelly

2016-08-21 Thread Ivan Mushketyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mushketyk reassigned FLINK-2254:
-

Assignee: Ivan Mushketyk

> Add Bipartite Graph Support for Gelly
> -
>
> Key: FLINK-2254
> URL: https://issues.apache.org/jira/browse/FLINK-2254
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>Assignee: Ivan Mushketyk
>  Labels: requires-design-doc
>
> A bipartite graph is a graph whose vertices can be divided into two disjoint 
> sets such that every edge has its source vertex in the first set and its 
> target vertex in the second set. We would like to support efficient 
> operations for this type of graph, along with a set of metrics 
> (http://jponnela.com/web_documents/twomode.pdf). 
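For context, bipartiteness is equivalent to 2-colorability; a small self-contained check by BFS coloring (illustration only, not the Gelly API):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Bipartite {
    // BFS 2-coloring over an undirected adjacency map (list both
    // directions of each edge); an edge whose endpoints receive the same
    // color reveals an odd cycle, so the graph is not bipartite.
    static boolean isBipartite(Map<Integer, List<Integer>> adj) {
        Map<Integer, Integer> color = new HashMap<>();
        for (Integer start : adj.keySet()) {
            if (color.containsKey(start)) {
                continue;
            }
            Deque<Integer> queue = new ArrayDeque<>();
            color.put(start, 0);
            queue.add(start);
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : adj.getOrDefault(u, Collections.emptyList())) {
                    if (!color.containsKey(v)) {
                        color.put(v, 1 - color.get(u));
                        queue.add(v);
                    } else if (color.get(v).equals(color.get(u))) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```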





[jira] [Created] (FLINK-4439) Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid

2016-08-21 Thread Gheorghe Gheorghe (JIRA)
Gheorghe Gheorghe created FLINK-4439:


 Summary: Error message KafkaConsumer08 when all 
'bootstrap.servers' are invalid
 Key: FLINK-4439
 URL: https://issues.apache.org/jira/browse/FLINK-4439
 Project: Flink
  Issue Type: Improvement
  Components: Streaming Connectors
Affects Versions: 1.0.3
Reporter: Gheorghe Gheorghe
Priority: Minor


The "flink-connector-kafka-0.8_2" connector logs the following error when all 
'bootstrap.servers' passed to the FlinkKafkaConsumer08 are invalid. 

See stacktrace: 
{code:title=stacktrace|borderStyle=solid}
2016-08-21 15:22:30 WARN  FlinkKafkaConsumerBase:290 - Error communicating with 
broker inexistentKafkHost:9092 to find partitions for [testTopic].class 
java.nio.channels.ClosedChannelException. Message: null
2016-08-21 15:22:30 DEBUG FlinkKafkaConsumerBase:292 - Detailed trace
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78)
at 
kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.getPartitionsForTopic(FlinkKafkaConsumer08.java:264)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:193)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:164)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:131)
at MetricsFromKafka$.main(MetricsFromKafka.scala:38)
at MetricsFromKafka.main(MetricsFromKafka.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sbt.Run.invokeMain(Run.scala:67)
at sbt.Run.run0(Run.scala:61)
at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
at sbt.Logger$$anon$4.apply(Logger.scala:84)
at sbt.TrapExit$App.run(TrapExit.scala:248)
at java.lang.Thread.run(Thread.java:745)
{code}
From the above stack trace it is hard to figure out that the servers provided 
in the config cannot be resolved to a valid IP address. Moreover, the Flink 
Kafka consumer tries each of those servers one by one, failing to get 
partition information.

The suggested improvement is to fail fast and inform the user that the servers 
provided in the 'bootstrap.servers' config are invalid. If at least one server 
is valid, the exception should not be thrown. 
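A hedged sketch of what such validation could look like (hypothetical {{BootstrapServersValidator}}, not the code in the actual PR): parse the config, then fail only when no configured host resolves:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class BootstrapServersValidator {
    // Parse "host1:port1,host2:port2" into host names, rejecting
    // malformed entries early.
    static List<String> hosts(String bootstrapServers) {
        List<String> result = new ArrayList<>();
        for (String entry : bootstrapServers.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed server entry: " + entry);
            }
            Integer.parseInt(parts[1]);  // the port must be numeric
            result.add(parts[0]);
        }
        return result;
    }

    // Matching the proposed behaviour: one resolvable server is enough,
    // so fail fast only when every configured host is unresolvable.
    static boolean atLeastOneResolvable(String bootstrapServers) {
        for (String host : hosts(bootstrapServers)) {
            try {
                InetAddress.getByName(host);
                return true;
            } catch (UnknownHostException e) {
                // keep trying the remaining servers
            }
        }
        return false;
    }
}
```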





[jira] [Commented] (FLINK-4439) Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid

2016-08-21 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429735#comment-15429735
 ] 

Robert Metzger commented on FLINK-4439:
---

In my opinion, the logging is pretty good.
There's a log message at WARN level:
{code}
2016-08-21 15:22:30 WARN  FlinkKafkaConsumerBase:290 - Error communicating with 
broker inexistentKafkHost:9092 to find partitions for [testTopic].class 
java.nio.channels.ClosedChannelException. Message: null
{code}

and the stack trace is at debug level.

I'm not sure failing fast is a good solution: maybe it's just a temporary 
issue with the broker, or the client temporarily cannot contact the broker.
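One alternative to failing fast, in the spirit of tolerating transient broker issues, would be to retry partition discovery with backoff; a generic sketch (hypothetical helper, not Flink code):

```java
import java.util.concurrent.Callable;

public class Retry {
    // Retry an action with linearly growing backoff, rethrowing the last
    // failure only after all attempts are exhausted.
    static <T> T withRetries(Callable<T> action, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt);
                }
            }
        }
        throw last;
    }
}
```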

> Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid
> --
>
> Key: FLINK-4439
> URL: https://issues.apache.org/jira/browse/FLINK-4439
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.0.3
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> The "flink-connector-kafka-0.8_2" connector logs the following error when 
> all 'bootstrap.servers' passed to the FlinkKafkaConsumer08 are invalid. 
> See stacktrace: 
> {code:title=stacktrace|borderStyle=solid}
> 2016-08-21 15:22:30 WARN  FlinkKafkaConsumerBase:290 - Error communicating 
> with broker inexistentKafkHost:9092 to find partitions for [testTopic].class 
> java.nio.channels.ClosedChannelException. Message: null
> 2016-08-21 15:22:30 DEBUG FlinkKafkaConsumerBase:292 - Detailed trace
> java.nio.channels.ClosedChannelException
>   at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
>   at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78)
>   at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
>   at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
>   at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.getPartitionsForTopic(FlinkKafkaConsumer08.java:264)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:193)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:164)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:131)
>   at MetricsFromKafka$.main(MetricsFromKafka.scala:38)
>   at MetricsFromKafka.main(MetricsFromKafka.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at sbt.Run.invokeMain(Run.scala:67)
>   at sbt.Run.run0(Run.scala:61)
>   at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
>   at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
>   at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
>   at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
>   at sbt.Logger$$anon$4.apply(Logger.scala:84)
>   at sbt.TrapExit$App.run(TrapExit.scala:248)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> From the above stack trace it is hard to figure out that the servers 
> provided in the config cannot be resolved to a valid IP address. Moreover, 
> the Flink Kafka consumer tries each of those servers one by one, failing to 
> get partition information.
> The suggested improvement is to fail fast and inform the user that the 
> servers provided in the 'bootstrap.servers' config are invalid. If at least 
> one server is valid, the exception should not be thrown. 





[GitHub] flink pull request #2397: [FLINK-4439] Validate 'bootstrap.servers' config i...

2016-08-21 Thread gheo21
GitHub user gheo21 opened a pull request:

https://github.com/apache/flink/pull/2397

[FLINK-4439] Validate 'bootstrap.servers' config in flink kafka consu…

Hello everybody, 

I would like to contribute a small improvement to Flink. 
Lately, I was using the FlinkKafkaConsumer08 to write a streaming topology 
in Flink, and I mistakenly configured the 'bootstrap.servers' entry of the 
Kafka config with invalid hosts. 

The message that Flink provided did not clearly state what the problem 
was. Hence, my improvement adds validation of the servers provided in 
'bootstrap.servers'. 

If none of the configured servers is valid, we fail fast and throw a 
validation exception. If at least one server is valid, no exception is 
thrown.

See for more info: https://issues.apache.org/jira/browse/FLINK-4439

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gheo21/flink 
flink-4439-kafka-consumer-conf-validation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2397


commit 81bfe72f0de2e21941143cd92c3d031962e6bdc7
Author: George 
Date:   2016-08-21T14:08:02Z

[FLINK-4439] Validate 'bootstrap.servers' config in flink kafka consumer 0.8






[jira] [Commented] (FLINK-4439) Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid

2016-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429744#comment-15429744
 ] 

ASF GitHub Bot commented on FLINK-4439:
---

GitHub user gheo21 opened a pull request:

https://github.com/apache/flink/pull/2397

[FLINK-4439] Validate 'bootstrap.servers' config in flink kafka consu…

Hello everybody, 

I would like to contribute a small improvement to Flink. 
Lately, I was using the FlinkKafkaConsumer08 to write a streaming topology 
in Flink, and I mistakenly configured the 'bootstrap.servers' entry of the 
Kafka config with invalid hosts. 

The message that Flink provided did not clearly state what the problem 
was. Hence, my improvement adds validation of the servers provided in 
'bootstrap.servers'. 

If none of the configured servers is valid, we fail fast and throw a 
validation exception. If at least one server is valid, no exception is 
thrown.

See for more info: https://issues.apache.org/jira/browse/FLINK-4439

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gheo21/flink 
flink-4439-kafka-consumer-conf-validation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2397


commit 81bfe72f0de2e21941143cd92c3d031962e6bdc7
Author: George 
Date:   2016-08-21T14:08:02Z

[FLINK-4439] Validate 'bootstrap.servers' config in flink kafka consumer 0.8




> Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid
> --
>
> Key: FLINK-4439
> URL: https://issues.apache.org/jira/browse/FLINK-4439
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.0.3
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> The "flink-connector-kafka-0.8_2" connector logs the following error when 
> all 'bootstrap.servers' passed to the FlinkKafkaConsumer08 are invalid. 
> See stacktrace: 
> {code:title=stacktrace|borderStyle=solid}
> 2016-08-21 15:22:30 WARN  FlinkKafkaConsumerBase:290 - Error communicating 
> with broker inexistentKafkHost:9092 to find partitions for [testTopic].class 
> java.nio.channels.ClosedChannelException. Message: null
> 2016-08-21 15:22:30 DEBUG FlinkKafkaConsumerBase:292 - Detailed trace
> java.nio.channels.ClosedChannelException
>   at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
>   at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78)
>   at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
>   at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
>   at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.getPartitionsForTopic(FlinkKafkaConsumer08.java:264)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:193)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:164)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:131)
>   at MetricsFromKafka$.main(MetricsFromKafka.scala:38)
>   at MetricsFromKafka.main(MetricsFromKafka.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at sbt.Run.invokeMain(Run.scala:67)
>   at sbt.Run.run0(Run.scala:61)
>   at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
>   at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
>   at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
>   at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
>   at sbt.Logger$$anon$4.apply(Logger.scala:84)
>   at sbt.TrapExit$App.run(TrapExit.scala:248)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> From the above stack trace it is hard to figure out that the servers 
> provided in the config cannot be resolved to a valid IP address. Moreover, 
> the Flink Kafka consumer tries each of those servers one by one, failing to 
> get partition information.
> The suggested improvement is to fail fast and inform the user that the 
> servers provided in the 'bootstrap.servers' config are invalid. If at least 
> one server is valid, the exception should not be thrown.

[GitHub] flink pull request #2398: [gelly] Add Vertex.create and Edge.create helper m...

2016-08-21 Thread mushketyk
GitHub user mushketyk opened a pull request:

https://github.com/apache/flink/pull/2398

[gelly] Add Vertex.create and Edge.create helper methods

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mushketyk/flink better-vertexes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2398


commit e68dcbb11d4616707c7781f603498078b26bb460
Author: Ivan Mushketyk 
Date:   2016-08-21T13:17:24Z

[gelly] Add Vertex.create and Edge.create helper methods






[GitHub] flink issue #2398: [gelly] Add Vertex.create and Edge.create helper methods

2016-08-21 Thread mushketyk
Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2398
  
Added helper methods to make Gelly API more concise.
Now one can write this:
```
Vertex<Long, NullValue> v = Vertex.create(42L);
Edge<Long, NullValue> e = Edge.create(5L, 6L);
```

Instead of this:
```
Vertex<Long, NullValue> v = new Vertex<>(42L, NullValue.getInstance());
Edge<Long, NullValue> e = new Edge<>(5L, 6L, NullValue.getInstance());
```
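The helper boils down to a static factory method that fills in the `NullValue` default so call sites stay short; a self-contained sketch of the pattern (simplified stand-ins, not the actual Gelly classes in `org.apache.flink.graph`):

```java
// Simplified stand-in for Gelly's NullValue singleton.
final class NullValue {
    private static final NullValue INSTANCE = new NullValue();
    static NullValue getInstance() { return INSTANCE; }
    private NullValue() { }
}

class Vertex<K, V> {
    final K id;
    final V value;

    Vertex(K id, V value) { this.id = id; this.value = value; }

    // The proposed helper: default the value to NullValue so callers
    // need not spell it out. Edge.create would be analogous.
    static <K> Vertex<K, NullValue> create(K id) {
        return new Vertex<>(id, NullValue.getInstance());
    }
}
```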




[jira] [Commented] (FLINK-4439) Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid

2016-08-21 Thread Gheorghe Gheorghe (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429761#comment-15429761
 ] 

Gheorghe Gheorghe commented on FLINK-4439:
--

[~rmetzger] can you please take a look? thx!

> Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid
> --
>
> Key: FLINK-4439
> URL: https://issues.apache.org/jira/browse/FLINK-4439
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.0.3
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> The "flink-connector-kafka-0.8_2" connector logs the following error when 
> all 'bootstrap.servers' passed to the FlinkKafkaConsumer08 are invalid. 
> See stacktrace: 
> {code:title=stacktrace|borderStyle=solid}
> 2016-08-21 15:22:30 WARN  FlinkKafkaConsumerBase:290 - Error communicating 
> with broker inexistentKafkHost:9092 to find partitions for [testTopic].class 
> java.nio.channels.ClosedChannelException. Message: null
> 2016-08-21 15:22:30 DEBUG FlinkKafkaConsumerBase:292 - Detailed trace
> java.nio.channels.ClosedChannelException
>   at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
>   at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78)
>   at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
>   at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
>   at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.getPartitionsForTopic(FlinkKafkaConsumer08.java:264)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:193)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:164)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:131)
>   at MetricsFromKafka$.main(MetricsFromKafka.scala:38)
>   at MetricsFromKafka.main(MetricsFromKafka.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at sbt.Run.invokeMain(Run.scala:67)
>   at sbt.Run.run0(Run.scala:61)
>   at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
>   at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
>   at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
>   at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
>   at sbt.Logger$$anon$4.apply(Logger.scala:84)
>   at sbt.TrapExit$App.run(TrapExit.scala:248)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> In the above stack trace it is hard to figure out that the actual servers 
> provided in the config cannot be resolved to valid IP addresses. Moreover, the 
> Flink Kafka consumer will try all of those servers one by one, failing to 
> get partition information.
> The suggested improvement is to fail fast and inform the user that the 
> servers provided in the 'bootstrap.servers' config are invalid. If at least 
> one server is valid, the exception should not be thrown. 
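The suggested fail-fast behaviour could be sketched roughly as below. This is a minimal illustration with hypothetical names (`atLeastOneResolvable` is not part of the connector): resolve each configured host up front and complain only when none of them resolves.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

public class BootstrapValidation {

    // Hypothetical helper: returns true if at least one "host:port" entry
    // in the bootstrap server list resolves via DNS.
    static boolean atLeastOneResolvable(List<String> bootstrapServers) {
        for (String server : bootstrapServers) {
            String host = server.split(":")[0];
            try {
                InetAddress.getByName(host); // throws if the host cannot be resolved
                return true;                 // one resolvable server is enough
            } catch (UnknownHostException e) {
                // not resolvable, try the next configured server
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A fail-fast consumer constructor could run this before contacting brokers.
        List<String> servers = Arrays.asList("localhost:9092", "no-such-host.invalid:9092");
        if (!atLeastOneResolvable(servers)) {
            throw new IllegalArgumentException(
                    "None of the servers in 'bootstrap.servers' could be resolved: " + servers);
        }
        System.out.println("at least one bootstrap server is resolvable");
    }
}
```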





[jira] [Updated] (FLINK-3394) Clear up the contract of MutableObjectIterator.next(reuse)

2016-08-21 Thread Gabor Gevay (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay updated FLINK-3394:
---
Component/s: (was: Distributed Coordination)
 Local Runtime

> Clear up the contract of MutableObjectIterator.next(reuse)
> --
>
> Key: FLINK-3394
> URL: https://issues.apache.org/jira/browse/FLINK-3394
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Priority: Critical
> Fix For: 1.0.0
>
>
> {{MutableObjectIterator.next(reuse)}} has the following contract (according 
> to [~StephanEwen]'s comment \[1\]):
> 1. The caller may not hold onto {{reuse}} any more
> 2. The iterator implementor may not hold onto the returned object any more.
> This should be documented in its javadoc (with "WARNING" so that people don't 
> overlook it).
> Additionally, since this was a "secret contract" up to now, all the 270 
> usages of {{MutableObjectIterator.next(reuse)}} should be checked for 
> violations. A few that are suspicious at first glance, are in 
> {{CrossDriver}}, {{UnionWithTempOperator}}, 
> {{MutableHashTable.ProbeIterator.next}}, 
> {{ReusingBuildFirstHashJoinIterator.callWithNextKey}}. (The violating calls 
> in the reduce drivers are being fixed by 
> https://github.com/apache/flink/pull/1626 )
> \[1\] 
> https://issues.apache.org/jira/browse/FLINK-3291?focusedCommentId=15144654&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15144654
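The two-sided contract can be illustrated with a toy stand-in (not Flink's actual interface): the implementor fills the object it was handed and returns it without keeping a reference, and the caller always re-assigns from the return value instead of reading the object it passed in.

```java
// Toy stand-in for Flink's MutableObjectIterator; for illustration only.
interface MutableObjectIterator<E> {
    // Contract: the caller may not hold onto `reuse` after the call, and the
    // implementor may not hold onto the returned object.
    E next(E reuse);
}

public class ReuseContractDemo {

    // Builds an iterator over the given values that fills the reuse object.
    static MutableObjectIterator<int[]> over(final int... values) {
        return new MutableObjectIterator<int[]>() {
            private int i = 0;

            public int[] next(int[] reuse) {
                if (i >= values.length) {
                    return null;
                }
                reuse[0] = values[i++]; // fill the caller-provided object...
                return reuse;           // ...and return it without keeping a reference
            }
        };
    }

    // Contract-respecting caller: always re-assigns from the return value and
    // never reads the variable it handed over after the call.
    static int sumAll(MutableObjectIterator<int[]> it) {
        int[] record = new int[1];
        int sum = 0;
        while ((record = it.next(record)) != null) {
            sum += record[0];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAll(over(1, 2, 3))); // prints 6
    }
}
```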





[jira] [Updated] (FLINK-3322) MemoryManager creates too much GC pressure with iterative jobs

2016-08-21 Thread Gabor Gevay (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay updated FLINK-3322:
---
Component/s: (was: Distributed Coordination)
 Local Runtime

> MemoryManager creates too much GC pressure with iterative jobs
> --
>
> Key: FLINK-3322
> URL: https://issues.apache.org/jira/browse/FLINK-3322
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Assignee: Gabor Horvath
>Priority: Critical
> Fix For: 1.0.0
>
>
> When taskmanager.memory.preallocate is false (the default), released memory 
> segments are not added to a pool, but the GC is expected to take care of 
> them. This puts too much pressure on the GC with iterative jobs, where the 
> operators reallocate all memory at every superstep.
> See the following discussion on the mailing list:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html
> Reproducing the issue:
> https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc
> The class to start is malom.Solver. If you increase the memory given to the 
> JVM from 1 to 50 GB, performance gradually degrades by more than 10 times. 
> (It will generate some lookup tables in /tmp on the first run, which takes a few minutes.) 
> (I think the slowdown might also depend somewhat on 
> taskmanager.memory.fraction, because more unused non-managed memory results 
> in rarer GCs.)
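The pooling behaviour that preallocation enables can be illustrated with a toy pool (a sketch only, not the MemoryManager's actual API): released segments go back to a deque and the next superstep reuses them, instead of every superstep allocating fresh buffers for the GC to collect.

```java
import java.util.ArrayDeque;

// Sketch of segment pooling: release() returns segments to a pool instead of
// dropping them for the GC, so iterative jobs reuse the same buffers.
public class SegmentPool {
    private final ArrayDeque<byte[]> pool = new ArrayDeque<>();
    private final int segmentSize;

    public SegmentPool(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    public byte[] allocate() {
        byte[] seg = pool.poll();
        return seg != null ? seg : new byte[segmentSize]; // allocate only on a pool miss
    }

    public void release(byte[] seg) {
        pool.push(seg); // keep the segment for the next superstep
    }

    public int pooled() {
        return pool.size();
    }

    public static void main(String[] args) {
        SegmentPool pool = new SegmentPool(32 * 1024);
        byte[] seg = pool.allocate();  // fresh allocation on the first request
        pool.release(seg);             // goes back to the pool, not to the GC
        System.out.println(pool.allocate() == seg); // the same buffer comes back
    }
}
```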





[jira] [Updated] (FLINK-3322) MemoryManager creates too much GC pressure with iterative jobs

2016-08-21 Thread Gabor Gevay (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay updated FLINK-3322:
---
Assignee: (was: Gabor Horvath)

> MemoryManager creates too much GC pressure with iterative jobs
> --
>
> Key: FLINK-3322
> URL: https://issues.apache.org/jira/browse/FLINK-3322
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Priority: Critical
> Fix For: 1.0.0
>
>
> When taskmanager.memory.preallocate is false (the default), released memory 
> segments are not added to a pool, but the GC is expected to take care of 
> them. This puts too much pressure on the GC with iterative jobs, where the 
> operators reallocate all memory at every superstep.
> See the following discussion on the mailing list:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html
> Reproducing the issue:
> https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc
> The class to start is malom.Solver. If you increase the memory given to the 
> JVM from 1 to 50 GB, performance gradually degrades by more than 10 times. 
> (It will generate some lookup tables in /tmp on the first run, which takes a few minutes.) 
> (I think the slowdown might also depend somewhat on 
> taskmanager.memory.fraction, because more unused non-managed memory results 
> in rarer GCs.)





[jira] [Resolved] (FLINK-3291) Object reuse bug in MergeIterator.HeadStream.nextHead

2016-08-21 Thread Gabor Gevay (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay resolved FLINK-3291.

Resolution: Fixed

> Object reuse bug in MergeIterator.HeadStream.nextHead
> -
>
> Key: FLINK-3291
> URL: https://issues.apache.org/jira/browse/FLINK-3291
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Assignee: Gabor Gevay
>Priority: Critical
>
> MergeIterator.HeadStream.nextHead saves a reference into `this.head` of the 
> `reuse` object that it got as an argument. This object might be modified 
> later by the caller.
> This actually happens when ReduceDriver.run calls input.next (which will 
> actually be MergeIterator.next(E reuse)) in the inner while loop of the 
> objectReuseEnabled branch, and that calls top.nextHead with the reference 
> that it got from ReduceDriver, which erroneously saves the reference, and 
> then ReduceDriver later uses that same object for doing the reduce.
> Another way in which this fails is when MergeIterator.next(E reuse) gives 
> `reuse` to different `top`s in different calls, and then the heads end up 
> being the same object.
> You can observe the latter situation in action by running ReducePerformance 
> here:
> https://github.com/ggevay/flink/tree/merge-iterator-object-reuse-bug
> Set memory to -Xmx200m (so that the MergeIterator actually has merging to 
> do), put a breakpoint at the beginning of MergeIterator.next(reuse), and then 
> watch `reuse`, and the heads of the first two elements of `this.heap` in the 
> debugger. They will get to be the same object after hitting continue about 6 
> times.
> You can also look at the count that is printed at the end, which shouldn't be 
> larger than the key range. Also, if you look into the output file 
> /tmp/xxxobjectreusebug, for example the key 77 appears twice.
> The good news is that I think I can see an easy fix that doesn't affect 
> performance: MergeIterator.HeadStream could have a reuse object of its own as 
> a member, and give that to iterator.next in nextHead(E reuse). And then we 
> wouldn't need the overload of nextHead that has the reuse parameter, and 
> MergeIterator.next(E reuse) could just call its other overload.
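The easy fix sketched in the last paragraph can be illustrated with toy types (not Flink's actual MergeIterator): the head stream keeps a private reuse object of its own, so the object the caller passes to next(reuse) can never be captured in `head`.

```java
import java.util.Arrays;
import java.util.Iterator;

// Toy sketch of the proposed fix: the head stream owns a private reuse
// object, so an externally supplied `reuse` reference is never stored.
class HeadStream {
    private final Iterator<int[]> iterator;
    private final int[] privateReuse = new int[1]; // owned by this stream
    private int[] head;

    HeadStream(Iterator<int[]> iterator) {
        this.iterator = iterator;
        nextHead();
    }

    // Advances the head by copying into the stream-owned buffer instead of
    // remembering any object handed in by the caller.
    boolean nextHead() {
        if (!iterator.hasNext()) {
            head = null;
            return false;
        }
        privateReuse[0] = iterator.next()[0];
        head = privateReuse;
        return true;
    }

    int[] getHead() {
        return head;
    }
}

public class HeadStreamDemo {
    public static void main(String[] args) {
        HeadStream hs = new HeadStream(
                Arrays.asList(new int[]{7}, new int[]{9}).iterator());
        int first = hs.getHead()[0];
        hs.nextHead();
        System.out.println(first + " then " + hs.getHead()[0]); // prints 7 then 9
    }
}
```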





[GitHub] flink issue #2398: [gelly] Add Vertex.create and Edge.create helper methods

2016-08-21 Thread vasia
Github user vasia commented on the issue:

https://github.com/apache/flink/pull/2398
  
Hi @mushketyk,
thank you for this PR. Is there a corresponding JIRA for it? If not, can 
you please create one and tag the PR with it?
Also, make sure you read the [How to contribute 
guide](https://flink.apache.org/how-to-contribute.html) and make sure your PR 
checks all the items in the list above.
Let us know if you have questions!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3722) The divisions in the InMemorySorters' swap/compare methods hurt performance

2016-08-21 Thread Gabor Gevay (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay updated FLINK-3722:
---
Assignee: (was: Gabor Gevay)

> The divisions in the InMemorySorters' swap/compare methods hurt performance
> ---
>
> Key: FLINK-3722
> URL: https://issues.apache.org/jira/browse/FLINK-3722
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Reporter: Gabor Gevay
>Priority: Minor
>  Labels: performance
>
> NormalizedKeySorter's and FixedLengthRecordSorter's swap and compare methods 
> use divisions (which take a lot of time \[1\]) to calculate the index of the 
> MemorySegment and the offset inside the segment. [~greghogan] reported on the 
> mailing list \[2\] measuring a ~12-14% performance effect in one case.
> A possibility to improve the situation is the following:
> The way that QuickSort mostly uses these compare and swap methods is that it 
> maintains two indices, and uses them to call compare and swap. The key 
> observation is that these indices are mostly stepped by one, and 
> _incrementally_ calculating the quotient and modulo is actually easy when the 
> index changes only by one: increment/decrement the modulo, and check whether 
> the modulo has reached 0 or the divisor, and if it did, then wrap-around the 
> modulo and increment/decrement the quotient.
> To implement this, InMemorySorter would have to expose an iterator that would 
> have the divisor and the current modulo and quotient as state, and have 
> increment/decrement methods. Compare and swap could then have overloads that 
> take these iterators as arguments.
> \[1\] http://www.agner.org/optimize/instruction_tables.pdf
> \[2\] 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Macro-benchmarking-for-performance-tuning-and-regression-detection-td11078.html
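The incremental quotient/modulo stepping described above can be sketched like this (an illustrative class, not Flink's actual InMemorySorter code): stepping the index by one only touches the modulo, and a division is never executed on the hot path.

```java
// Incrementally tracks (index / segmentSize, index % segmentSize) while an
// index is stepped by one, avoiding a division per compare/swap.
public class SegmentIndex {
    private final int segmentSize; // the divisor
    private int quotient = 0;      // index / segmentSize
    private int modulo = 0;        // index % segmentSize

    public SegmentIndex(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    public void increment() {
        modulo++;
        if (modulo == segmentSize) { // wrap-around: step into the next segment
            modulo = 0;
            quotient++;
        }
    }

    public void decrement() {
        if (modulo == 0) { // wrap-around: step back into the previous segment
            modulo = segmentSize;
            quotient--;
        }
        modulo--;
    }

    public int segment() { return quotient; }
    public int offset()  { return modulo; }

    public static void main(String[] args) {
        SegmentIndex idx = new SegmentIndex(32 * 1024); // e.g. 32 KB segments
        for (int i = 0; i < 100_000; i++) {
            idx.increment();
        }
        System.out.println(idx.segment() + " " + idx.offset());
    }
}
```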





[jira] [Created] (FLINK-4440) Make API for edge/vertex creation less verbose

2016-08-21 Thread Ivan Mushketyk (JIRA)
Ivan Mushketyk created FLINK-4440:
-

 Summary: Make API for edge/vertex creation less verbose
 Key: FLINK-4440
 URL: https://issues.apache.org/jira/browse/FLINK-4440
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Reporter: Ivan Mushketyk
Assignee: Ivan Mushketyk
Priority: Trivial


It would be better if one could create vertex/edges like this:

{code:java}
Vertex<Integer, NullValue> v = Vertex.create(42);
Edge<Integer, NullValue> e = Edge.create(5, 6);
{code}

Instead of this:
{code:java}
Vertex<Integer, NullValue> v = new Vertex<Integer, NullValue>(42, 
NullValue.getInstance());
Edge<Integer, NullValue> e = new Edge<Integer, NullValue>(5, 
6, NullValue.getInstance());
{code}
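The proposed factories can be sketched with simplified stand-in classes (the real Gelly Vertex/Edge extend Tuple2/Tuple3 and NullValue is Flink's; everything below is illustrative only): a static method fixes the value type to NullValue so callers only supply the keys.

```java
// Simplified stand-ins for Gelly's Vertex/Edge to illustrate the proposed
// create(...) factory methods; not the actual Gelly classes.
class NullValue {
    private static final NullValue INSTANCE = new NullValue();
    public static NullValue getInstance() { return INSTANCE; }
}

class Vertex<K, V> {
    final K id;
    final V value;

    Vertex(K id, V value) { this.id = id; this.value = value; }

    // Proposed convenience factory for value-less vertices.
    static <K> Vertex<K, NullValue> create(K id) {
        return new Vertex<>(id, NullValue.getInstance());
    }
}

class Edge<K, V> {
    final K source;
    final K target;
    final V value;

    Edge(K source, K target, V value) {
        this.source = source; this.target = target; this.value = value;
    }

    // Proposed convenience factory for value-less edges.
    static <K> Edge<K, NullValue> create(K source, K target) {
        return new Edge<>(source, target, NullValue.getInstance());
    }
}

public class FactoryDemo {
    public static void main(String[] args) {
        Vertex<Integer, NullValue> v = Vertex.create(42);
        Edge<Integer, NullValue> e = Edge.create(5, 6);
        System.out.println(v.id + " " + e.source + "->" + e.target);
    }
}
```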





[jira] [Commented] (FLINK-4440) Make API for edge/vertex creation less verbose

2016-08-21 Thread Ivan Mushketyk (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429793#comment-15429793
 ] 

Ivan Mushketyk commented on FLINK-4440:
---

Pull request: https://github.com/apache/flink/pull/2398

> Make API for edge/vertex creation less verbose
> --
>
> Key: FLINK-4440
> URL: https://issues.apache.org/jira/browse/FLINK-4440
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>Priority: Trivial
>
> It would be better if one could create vertex/edges like this:
> {code:java}
> Vertex<Integer, NullValue> v = Vertex.create(42);
> Edge<Integer, NullValue> e = Edge.create(5, 6);
> {code}
> Instead of this:
> {code:java}
> Vertex<Integer, NullValue> v = new Vertex<Integer, NullValue>(42, 
> NullValue.getInstance());
> Edge<Integer, NullValue> e = new Edge<Integer, 
> NullValue>(5, 6, NullValue.getInstance());
> {code}





[GitHub] flink issue #2398: [FLINK-4440][gelly] Add Vertex.create and Edge.create hel...

2016-08-21 Thread mushketyk
Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2398
  
Hi @vasia 
I've created a JIRA issue for the pull request.
What do you think about this change?




[jira] [Commented] (FLINK-4440) Make API for edge/vertex creation less verbose

2016-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429794#comment-15429794
 ] 

ASF GitHub Bot commented on FLINK-4440:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2398
  
Hi @vasia 
I've created a JIRA issue for the pull request.
What do you think about this change?


> Make API for edge/vertex creation less verbose
> --
>
> Key: FLINK-4440
> URL: https://issues.apache.org/jira/browse/FLINK-4440
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>Priority: Trivial
>
> It would be better if one could create vertex/edges like this:
> {code:java}
> Vertex<Integer, NullValue> v = Vertex.create(42);
> Edge<Integer, NullValue> e = Edge.create(5, 6);
> {code}
> Instead of this:
> {code:java}
> Vertex<Integer, NullValue> v = new Vertex<Integer, NullValue>(42, 
> NullValue.getInstance());
> Edge<Integer, NullValue> e = new Edge<Integer, 
> NullValue>(5, 6, NullValue.getInstance());
> {code}





[jira] [Commented] (FLINK-4439) Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid

2016-08-21 Thread Gheorghe Gheorghe (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429852#comment-15429852
 ] 

Gheorghe Gheorghe commented on FLINK-4439:
--

Hi Robert, 

thanks for taking a look. 
I agree with you that there is logging and it looks good, but it doesn't point to 
the fact that my DNS cannot resolve the host name that I configured in 
'bootstrap.servers'.

My suggestion is to improve that message to state explicitly that the user 
misconfigured the bootstrap server list. As it is right now, you retry and 
retry but you still get 'ClosedChannelException. Message: null' without 
any clue why.

This exception will not fail fast if your broker is unreachable because, say, 
the Kafka broker service is stopped or temporarily unavailable. It would only 
fail if none of the configured hosts is resolvable by DNS, which I think is an 
important situation that should be raised to the user.

The user would get immediate feedback that none of the configured brokers has a 
valid server configured. (If at least one is available, nothing will be thrown.)

What do you think?

> Error message KafkaConsumer08 when all 'bootstrap.servers' are invalid
> --
>
> Key: FLINK-4439
> URL: https://issues.apache.org/jira/browse/FLINK-4439
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.0.3
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> The "flink-connector-kafka-0.8_2"  is logging the following error when all 
> 'bootstrap.servers' are invalid when passed to the FlinkKafkaConsumer08. 
> See stacktrace: 
> {code:title=stacktrace|borderStyle=solid}
> 2016-08-21 15:22:30 WARN  FlinkKafkaConsumerBase:290 - Error communicating 
> with broker inexistentKafkHost:9092 to find partitions for [testTopic].class 
> java.nio.channels.ClosedChannelException. Message: null
> 2016-08-21 15:22:30 DEBUG FlinkKafkaConsumerBase:292 - Detailed trace
> java.nio.channels.ClosedChannelException
>   at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
>   at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78)
>   at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
>   at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:91)
>   at kafka.javaapi.consumer.SimpleConsumer.send(SimpleConsumer.scala:68)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.getPartitionsForTopic(FlinkKafkaConsumer08.java:264)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:193)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:164)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.(FlinkKafkaConsumer08.java:131)
>   at MetricsFromKafka$.main(MetricsFromKafka.scala:38)
>   at MetricsFromKafka.main(MetricsFromKafka.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at sbt.Run.invokeMain(Run.scala:67)
>   at sbt.Run.run0(Run.scala:61)
>   at sbt.Run.sbt$Run$$execute$1(Run.scala:51)
>   at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:55)
>   at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
>   at sbt.Run$$anonfun$run$1.apply(Run.scala:55)
>   at sbt.Logger$$anon$4.apply(Logger.scala:84)
>   at sbt.TrapExit$App.run(TrapExit.scala:248)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> In the above stack trace it is hard to figure out that the actual servers 
> provided in the config cannot be resolved to valid IP addresses. Moreover, the 
> Flink Kafka consumer will try all of those servers one by one, failing to 
> get partition information.
> The suggested improvement is to fail fast and inform the user that the 
> servers provided in the 'bootstrap.servers' config are invalid. If at least 
> one server is valid, the exception should not be thrown. 





[jira] [Created] (FLINK-4441) RocksDB statebackend makes all operators appear stateful

2016-08-21 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-4441:
-

 Summary: RocksDB statebackend makes all operators appear stateful
 Key: FLINK-4441
 URL: https://issues.apache.org/jira/browse/FLINK-4441
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing, Streaming
Affects Versions: 1.1.1
Reporter: Gyula Fora
Assignee: Gyula Fora
Priority: Blocker


When the state is empty, the RocksDB state backend returns an empty hashmap 
instead of null in the snapshotPartitionedState method.

This means that these operators always appear stateful to the Flink runtime, 
which makes it impossible, for instance, to remove a stateless operator using 
savepoints.
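The fix described can be sketched as follows (a hypothetical method over a plain map, not the actual RocksDB backend code): report no state at all when the map is empty, so a stateless operator produces no snapshot entry.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotSketch {

    // Sketch of the fix: return null when the state map is empty, so a
    // stateless operator is not treated as stateful at snapshot time.
    public static Map<String, byte[]> snapshotPartitionedState(Map<String, byte[]> state) {
        if (state == null || state.isEmpty()) {
            return null; // empty state -> no snapshot entry for this operator
        }
        return new HashMap<>(state); // defensive copy of the non-empty state
    }

    public static void main(String[] args) {
        System.out.println(snapshotPartitionedState(new HashMap<>())); // prints null
    }
}
```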





[GitHub] flink pull request #2399: [FLINK-4441] Make RocksDB backend return null on e...

2016-08-21 Thread gyfora
GitHub user gyfora opened a pull request:

https://github.com/apache/flink/pull/2399

[FLINK-4441] Make RocksDB backend return null on empty state

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gyfora/flink FLINK-4441

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2399.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2399


commit d984fcf0df5a2ee5ca020d5beaec5697c437e09e
Author: Gyula Fora 
Date:   2016-08-21T20:01:16Z

[FLINK-4441] Make RocksDB backend return null on empty state






[jira] [Commented] (FLINK-4441) RocksDB statebackend makes all operators appear stateful

2016-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429881#comment-15429881
 ] 

ASF GitHub Bot commented on FLINK-4441:
---

GitHub user gyfora opened a pull request:

https://github.com/apache/flink/pull/2399

[FLINK-4441] Make RocksDB backend return null on empty state

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gyfora/flink FLINK-4441

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2399.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2399


commit d984fcf0df5a2ee5ca020d5beaec5697c437e09e
Author: Gyula Fora 
Date:   2016-08-21T20:01:16Z

[FLINK-4441] Make RocksDB backend return null on empty state




> RocksDB statebackend makes all operators appear stateful
> 
>
> Key: FLINK-4441
> URL: https://issues.apache.org/jira/browse/FLINK-4441
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing, Streaming
>Affects Versions: 1.1.1
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Blocker
>
> When the state is empty, the RocksDB state backend returns an empty hashmap 
> instead of null in the snapshotPartitionedState method.
> This means that these operators always appear stateful to the Flink runtime, 
> which makes it impossible, for instance, to remove a stateless operator using 
> savepoints.





[jira] [Assigned] (FLINK-4427) Add slot / Implement container releasing logic (Standalone / Yarn / Mesos)

2016-08-21 Thread zhuhaifeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuhaifeng reassigned FLINK-4427:
-

Assignee: zhuhaifeng

> Add slot / Implement container releasing logic (Standalone / Yarn / Mesos)
> --
>
> Key: FLINK-4427
> URL: https://issues.apache.org/jira/browse/FLINK-4427
> Project: Flink
>  Issue Type: Sub-task
>  Components: Cluster Management
>Reporter: Kurt Young
>Assignee: zhuhaifeng
>
> Currently we only have allocation logic for the SlotManager / ResourceManager. 
> For some batch jobs, slots whose tasks have already finished can be released, 
> which should trigger a container release in the different cluster modes.





[jira] [Assigned] (FLINK-4430) prevent BLOCKING partition data lose when task is finished

2016-08-21 Thread zhuhaifeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuhaifeng reassigned FLINK-4430:
-

Assignee: zhuhaifeng

> prevent BLOCKING partition data lose when task is finished
> --
>
> Key: FLINK-4430
> URL: https://issues.apache.org/jira/browse/FLINK-4430
> Project: Flink
>  Issue Type: Sub-task
>  Components: Cluster Management
>Reporter: Kurt Young
>Assignee: zhuhaifeng
>
> When we have a BLOCKING result partition type, the data is actually held by the 
> TaskManager, not the slot. When the producing task finishes, we will mark the 
> task finished and try to release the slot. Under the new architecture, in 
> Yarn or Mesos mode, releasing the slot may trigger releasing the container, so 
> the TaskManager will be terminated and the result data is lost. We should 
> introduce some mechanism to prevent this from happening.





[jira] [Commented] (FLINK-4440) Make API for edge/vertex creation less verbose

2016-08-21 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429953#comment-15429953
 ] 

Greg Hogan commented on FLINK-4440:
---

Flink supports a minimum of Java 7 (which is end-of-life), so the current usage 
with the diamond operator is {{Vertex<Integer, NullValue> v = new Vertex<>(42, 
NullValue.getInstance());}}. If we were to add these, I think having a second 
constructor without the value parameter would be clearer than creating two new 
static methods.

I think the better long-term solution would be to have Vertex and Edge be 
interfaces or abstract classes to use POJOs instead of tuples.

> Make API for edge/vertex creation less verbose
> --
>
> Key: FLINK-4440
> URL: https://issues.apache.org/jira/browse/FLINK-4440
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Ivan Mushketyk
>Assignee: Ivan Mushketyk
>Priority: Trivial
>
> It would be better if one could create vertex/edges like this:
> {code:java}
> Vertex<Integer, NullValue> v = Vertex.create(42);
> Edge<Integer, NullValue> e = Edge.create(5, 6);
> {code}
> Instead of this:
> {code:java}
> Vertex<Integer, NullValue> v = new Vertex<Integer, NullValue>(42, 
> NullValue.getInstance());
> Edge<Integer, NullValue> e = new Edge<Integer, 
> NullValue>(5, 6, NullValue.getInstance());
> {code}





[GitHub] flink pull request #2400: [FLINK-4363] Implement TaskManager basic startup o...

2016-08-21 Thread wangzhijiang999
GitHub user wangzhijiang999 opened a pull request:

https://github.com/apache/flink/pull/2400

[FLINK-4363] Implement TaskManager basic startup of all components in java

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alibaba/flink jira-4363

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2400.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2400


commit afc504f37ec873b3be30320c3bcbc93fcbe1aec3
Author: 淘江 
Date:   2016-08-19T09:32:10Z

implement taskmanager startup basic components in java [#FLINK-4363]

commit bba0f679d3b2f7b779c18b13fefd4b50edbdfb69
Author: 淘江 
Date:   2016-08-22T03:13:02Z

Merge branch 'flip-6' of https://github.com/apache/flink into 
flink-tm_startup

commit 22da4e24e155f309d2705bbb5a6d1a64744dc4e3
Author: 淘江 
Date:   2016-08-22T03:52:05Z

[FLINK-4363] Implement TaskManager basic startup of all components in java






[jira] [Commented] (FLINK-4363) Implement TaskManager basic startup of all components in java

2016-08-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430011#comment-15430011
 ] 

ASF GitHub Bot commented on FLINK-4363:
---

GitHub user wangzhijiang999 opened a pull request:

https://github.com/apache/flink/pull/2400

[FLINK-4363] Implement TaskManager basic startup of all components in java

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alibaba/flink jira-4363

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2400.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2400


commit afc504f37ec873b3be30320c3bcbc93fcbe1aec3
Author: 淘江 
Date:   2016-08-19T09:32:10Z

implement taskmanager startup basic components in java [#FLINK-4363]

commit bba0f679d3b2f7b779c18b13fefd4b50edbdfb69
Author: 淘江 
Date:   2016-08-22T03:13:02Z

Merge branch 'flip-6' of https://github.com/apache/flink into 
flink-tm_startup

commit 22da4e24e155f309d2705bbb5a6d1a64744dc4e3
Author: 淘江 
Date:   2016-08-22T03:52:05Z

[FLINK-4363] Implement TaskManager basic startup of all components in java




> Implement TaskManager basic startup of all components in java
> -
>
> Key: FLINK-4363
> URL: https://issues.apache.org/jira/browse/FLINK-4363
> Project: Flink
>  Issue Type: Sub-task
>  Components: Cluster Management
>Reporter: Zhijiang Wang
>Assignee: Zhijiang Wang
>
> Similar to the current {{TaskManager}}, but implements initialization and 
> startup of all components in Java instead of Scala.
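The core of such a startup routine (start each component in dependency order, tear down the already-started ones in reverse order if any of them fails) can be sketched roughly as below. The `Service` interface and `startAll` helper are hypothetical illustrations for this pattern, not Flink's actual TaskManager API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StartupSketch {

    // Hypothetical stand-in for a TaskManager component
    // (memory manager, IO manager, network environment, ...).
    public interface Service {
        void start();
        void stop();
    }

    // Starts services in dependency order. If one fails, stops the
    // already-started services in reverse order so nothing leaks,
    // then rethrows the failure. Returns the number started.
    public static int startAll(Service... services) {
        Deque<Service> started = new ArrayDeque<>();
        try {
            for (Service s : services) {
                s.start();
                started.push(s); // most recently started is stopped first
            }
            return started.size();
        } catch (RuntimeException e) {
            while (!started.isEmpty()) {
                started.pop().stop();
            }
            throw e;
        }
    }
}
```

The reverse-order teardown mirrors the usual rule that a component may depend on everything started before it, so it must be stopped before its dependencies.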



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4442) Implement Standalone ResourceManager

2016-08-21 Thread Kurt Young (JIRA)
Kurt Young created FLINK-4442:
-

 Summary: Implement Standalone ResourceManager 
 Key: FLINK-4442
 URL: https://issues.apache.org/jira/browse/FLINK-4442
 Project: Flink
  Issue Type: Sub-task
Reporter: Kurt Young


Implement a ResourceManager that runs in standalone mode.





[jira] [Created] (FLINK-4443) Add support in RpcCompletenessTest for inheritance of RpcGateway and RpcEndpoint

2016-08-21 Thread Wenlong Lyu (JIRA)
Wenlong Lyu created FLINK-4443:
--

 Summary: Add support in RpcCompletenessTest for inheritance of 
RpcGateway and RpcEndpoint
 Key: FLINK-4443
 URL: https://issues.apache.org/jira/browse/FLINK-4443
 Project: Flink
  Issue Type: Sub-task
Reporter: Wenlong Lyu
Assignee: Wenlong Lyu


RpcCompletenessTest needs to support an RpcGateway that is composed of several 
basic interfaces, as in the following example:
{code:java}
public interface ExecutionStateListener extends RpcGateway {
    public void notifyExecutionStateChanges();
}

public interface JobStateListener extends RpcGateway {
    public void notifyJobStateChanges();
}

public interface JobWatcher extends ExecutionStateListener, JobStateListener, RpcGateway {
}

public class JobWatcherEndpoint extends RpcEndpoint {

    protected JobWatcherEndpoint(RpcService rpcService) {
        super(rpcService);
    }

    @RpcMethod
    public void notifyExecutionStateChanges() {
    }

    @RpcMethod
    public void notifyJobStateChanges() {
    }
}

public class AttachedJobClient extends JobWatcherEndpoint {

    protected AttachedJobClient(RpcService rpcService) {
        super(rpcService);
    }
}
{code}
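One way such inheritance can be handled (a sketch only; the real RpcCompletenessTest internals may differ) is to walk a gateway interface's superinterfaces transitively when collecting the methods that must be matched against the endpoint's {{@RpcMethod}} implementations. The nested interfaces below mirror the example above but are simplified stand-ins, not Flink classes:

```java
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class GatewayMethods {

    // Collects the names of all methods declared by an interface and,
    // transitively, by every interface it extends. This is the set a
    // completeness check needs once gateways may extend other gateways.
    public static Set<String> collect(Class<?> gateway) {
        Set<String> names = new HashSet<>();
        Deque<Class<?>> toVisit = new ArrayDeque<>();
        toVisit.push(gateway);
        while (!toVisit.isEmpty()) {
            Class<?> current = toVisit.pop();
            for (Method m : current.getDeclaredMethods()) {
                names.add(m.getName());
            }
            // Visit superinterfaces so inherited methods are included.
            for (Class<?> parent : current.getInterfaces()) {
                toVisit.push(parent);
            }
        }
        return names;
    }

    // Simplified stand-ins mirroring the gateways from the example.
    public interface ExecutionStateListener { void notifyExecutionStateChanges(); }
    public interface JobStateListener { void notifyJobStateChanges(); }
    public interface JobWatcher extends ExecutionStateListener, JobStateListener {}
}
```

For `JobWatcher`, which declares no methods of its own, the traversal still yields both listener methods, which is exactly what the endpoint must implement.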





[GitHub] flink pull request #2401: [FLINK-4443][rpc] Add support in RpcCompletenessTe...

2016-08-21 Thread wenlong88
GitHub user wenlong88 opened a pull request:

https://github.com/apache/flink/pull/2401

[FLINK-4443][rpc] Add support in RpcCompletenessTest for inheritance of 
RpcGateway and RpcEndpoint

Adds support in RpcCompletenessTest for an RpcGateway that is composed of 
several basic interfaces, as in the following example:
```
public interface ExecutionStateListener extends RpcGateway {
    public void notifyExecutionStateChanges();
}

public interface JobStateListener extends RpcGateway {
    public void notifyJobStateChanges();
}

public interface JobWatcher extends ExecutionStateListener, JobStateListener, RpcGateway {
}

public class JobWatcherEndpoint extends RpcEndpoint {

    protected JobWatcherEndpoint(RpcService rpcService) {
        super(rpcService);
    }

    @RpcMethod
    public void notifyExecutionStateChanges() {
    }

    @RpcMethod
    public void notifyJobStateChanges() {
    }
}

public class AttachedJobClient extends JobWatcherEndpoint {

    protected AttachedJobClient(RpcService rpcService) {
        super(rpcService);
    }
}
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alibaba/flink flink-4443

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2401


commit 83b46df346d3626aa15f690c449d17fe3218014e
Author: wenlong.lwl 
Date:   2016-08-20T16:46:51Z

add support for rpc gateway and rpc endpoint inheritance

commit b471bd249fbc40a23c6d32e5840c5ab95d693f75
Author: wenlong.lwl 
Date:   2016-08-22T05:33:33Z

update comments






[jira] [Created] (FLINK-4444) Add a DFSInputChannel and DFSSubPartition

2016-08-21 Thread shuai.xu (JIRA)
shuai.xu created FLINK-4444:
---

 Summary: Add a DFSInputChannel and DFSSubPartition
 Key: FLINK-4444
 URL: https://issues.apache.org/jira/browse/FLINK-4444
 Project: Flink
  Issue Type: Sub-task
  Components: Batch Connectors and Input/Output Formats
Reporter: shuai.xu


Add a new ResultPartitionType and ResultPartitionLocation type for DFS.
Add a DFSSubpartition and DFSInputChannel for writing to and reading from DFS.
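The intended read/write path could look roughly like the sketch below. These are hypothetical stand-ins using plain file I/O, not Flink's actual ResultSubpartition/InputChannel APIs; the point is only that producer and consumer communicate through a path on a shared filesystem, so the consumer's lifetime is decoupled from the producer's:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class DfsPartitionSketch {

    // Hypothetical stand-in for a DFSSubpartition: the producer side
    // appends records to a file on a shared (distributed) filesystem.
    public static void write(Path partitionFile, List<String> records) {
        try {
            Files.write(partitionFile, records,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for a DFSInputChannel: the consumer reads
    // the partition from the same path, independent of the producer.
    public static List<String> read(Path partitionFile) {
        try {
            return Files.readAllLines(partitionFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience helper for trying the sketch without handling IOException.
    public static Path tempPartitionFile() {
        try {
            return Files.createTempFile("partition-", ".dat");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a real DFS the path would live on the distributed filesystem rather than local disk, which is what makes the result re-readable after the producing task has finished.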


