[jira] [Created] (FLINK-7598) ineffective shaded artifacts checks in travis_mvn_watchdog.sh

2017-09-07 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-7598:
--

 Summary: ineffective shaded artifacts checks in 
travis_mvn_watchdog.sh
 Key: FLINK-7598
 URL: https://issues.apache.org/jira/browse/FLINK-7598
 Project: Flink
  Issue Type: Bug
  Components: Travis
Affects Versions: 1.4.0
Reporter: Nico Kruber
Assignee: Nico Kruber


The {{check_shaded_artifacts()}} checks have some shortcomings which render 
them (partially) ineffective:

* netty checks use {{wc -1}} but should use {{wc -l}}
* (all) of these checks do not fail if the executed command fails (as can be 
seen from the netty checks that pass without the line counting being correct)
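
For illustration, a minimal sketch of the affected pattern (the function below is hypothetical and only loosely modelled on {{check_shaded_artifacts()}}; the jar path and grep pattern are placeholders):

{code}
# hypothetical check, not the actual travis_mvn_watchdog.sh contents
check_shaded_netty() {
    # 'wc -l' (lowercase L) counts matching lines; the original 'wc -1' typo
    # makes wc fail, leaving the variable empty
    NETTY=$(jar tf build-target/lib/flink-dist*.jar | grep '^io/netty' | wc -l)

    # quoting "$NETTY" keeps the test well-formed even when the variable is
    # empty; unquoted, an empty value yields "[: !=: unary operator expected"
    # and the check silently passes
    if [ "$NETTY" != "0" ]; then
        echo "Found unshaded netty classes in flink-dist"
        return 1
    fi
    return 0
}
{code}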



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7598) ineffective shaded artifacts checks in travis_mvn_watchdog.sh

2017-09-07 Thread Nico Kruber (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber updated FLINK-7598:
---
Description: 
The {{check_shaded_artifacts()}} checks have some shortcomings which render 
them (partially) ineffective:

* netty checks use {{wc -1}} but should use {{wc -l}}
* (all) of these checks do not fail if the executed command fails (as can be 
seen from the netty checks that pass without the line counting being correct)

In the travis logs, this shows up as
{code}
./tools/travis_mvn_watchdog.sh: line 382: 10052 Terminated  watchdog
wc: invalid option -- '1'
Try 'wc --help' for more information.
./tools/travis_mvn_watchdog.sh: line 297: [: !=: unary operator expected
{code}

  was:
The {{check_shaded_artifacts()}} checks have some shortcomings which render 
them (partially) ineffective:

* netty checks use {{wc -1}} but should use {{wc -l}}
* (all) of these checks do not fail if the executed command fails (as can be 
seen from the netty checks that pass without the line counting being correct)


> ineffective shaded artifacts checks in travis_mvn_watchdog.sh
> -
>
> Key: FLINK-7598
> URL: https://issues.apache.org/jira/browse/FLINK-7598
> Project: Flink
>  Issue Type: Bug
>  Components: Travis
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>
> The {{check_shaded_artifacts()}} checks have some shortcomings which render 
> them (partially) ineffective:
> * netty checks use {{wc -1}} but should use {{wc -l}}
> * (all) of these checks do not fail if the executed command fails (as can be 
> seen from the netty checks that pass without the line counting being correct)
> In the travis logs, this shows up as
> {code}
> ./tools/travis_mvn_watchdog.sh: line 382: 10052 Terminated  
> watchdog
> wc: invalid option -- '1'
> Try 'wc --help' for more information.
> ./tools/travis_mvn_watchdog.sh: line 297: [: !=: unary operator expected
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7598) ineffective shaded artifacts checks in travis_mvn_watchdog.sh

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156611#comment-16156611
 ] 

ASF GitHub Bot commented on FLINK-7598:
---

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/4653

[FLINK-7598][travis] fix ineffective shaded artifacts checks

## What is the purpose of the change

This fixes the Netty shaded-dependencies check and makes all of the checks 
more robust against failures of the commands that count the number of 
dependencies.

## Brief change log

* fix `wc -1` call in the netty shaded dependencies check
* make all checks more robust against such failures by including unsafe 
variables in quotes

## Verifying this change

This change is covered by each travis run; if the following no longer shows 
up in the logs, the problem is fixed:

```
./tools/travis_mvn_watchdog.sh: line 382: 10052 Terminated  
watchdog
wc: invalid option -- '1'
Try 'wc --help' for more information.
./tools/travis_mvn_watchdog.sh: line 297: [: !=: unary operator expected
```

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink flink-7598

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4653


commit 3f88f5dba4d4ec3303671d421f03d78fc7f54440
Author: Nico Kruber 
Date:   2017-09-07T07:19:51Z

[FLINK-7598][travis] fix ineffective shaded artifacts checks

This fixes the netty check and makes all of them more robust against 
failures of
the executed commands counting the number of dependencies.




> ineffective shaded artifacts checks in travis_mvn_watchdog.sh
> -
>
> Key: FLINK-7598
> URL: https://issues.apache.org/jira/browse/FLINK-7598
> Project: Flink
>  Issue Type: Bug
>  Components: Travis
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>
> The {{check_shaded_artifacts()}} checks have some shortcomings which render 
> them (partially) ineffective:
> * netty checks use {{wc -1}} but should use {{wc -l}}
> * (all) of these checks do not fail if the executed command fails (as can be 
> seen from the netty checks that pass without the line counting being correct)
> In the travis logs, this shows up as
> {code}
> ./tools/travis_mvn_watchdog.sh: line 382: 10052 Terminated  
> watchdog
> wc: invalid option -- '1'
> Try 'wc --help' for more information.
> ./tools/travis_mvn_watchdog.sh: line 297: [: !=: unary operator expected
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4653: [FLINK-7598][travis] fix ineffective shaded artifa...

2017-09-07 Thread NicoK
GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/4653

[FLINK-7598][travis] fix ineffective shaded artifacts checks

## What is the purpose of the change

This fixes the Netty shaded-dependencies check and makes all of the checks 
more robust against failures of the commands that count the number of 
dependencies.

## Brief change log

* fix `wc -1` call in the netty shaded dependencies check
* make all checks more robust against such failures by including unsafe 
variables in quotes

## Verifying this change

This change is covered by each travis run; if the following no longer shows 
up in the logs, the problem is fixed:

```
./tools/travis_mvn_watchdog.sh: line 382: 10052 Terminated  
watchdog
wc: invalid option -- '1'
Try 'wc --help' for more information.
./tools/travis_mvn_watchdog.sh: line 297: [: !=: unary operator expected
```

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink flink-7598

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4653


commit 3f88f5dba4d4ec3303671d421f03d78fc7f54440
Author: Nico Kruber 
Date:   2017-09-07T07:19:51Z

[FLINK-7598][travis] fix ineffective shaded artifacts checks

This fixes the netty check and makes all of them more robust against 
failures of
the executed commands counting the number of dependencies.




---


[GitHub] flink issue #4653: [FLINK-7598][travis] fix ineffective shaded artifacts che...

2017-09-07 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4653
  
+1


---


[jira] [Commented] (FLINK-7598) ineffective shaded artifacts checks in travis_mvn_watchdog.sh

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156625#comment-16156625
 ] 

ASF GitHub Bot commented on FLINK-7598:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4653
  
+1


> ineffective shaded artifacts checks in travis_mvn_watchdog.sh
> -
>
> Key: FLINK-7598
> URL: https://issues.apache.org/jira/browse/FLINK-7598
> Project: Flink
>  Issue Type: Bug
>  Components: Travis
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>
> The {{check_shaded_artifacts()}} checks have some shortcomings which render 
> them (partially) ineffective:
> * netty checks use {{wc -1}} but should use {{wc -l}}
> * (all) of these checks do not fail if the executed command fails (as can be 
> seen from the netty checks that pass without the line counting being correct)
> In the travis logs, this shows up as
> {code}
> ./tools/travis_mvn_watchdog.sh: line 382: 10052 Terminated  
> watchdog
> wc: invalid option -- '1'
> Try 'wc --help' for more information.
> ./tools/travis_mvn_watchdog.sh: line 297: [: !=: unary operator expected
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7596) Fix bug during Set Operation (Union, Minus ... ) with Any(GenericRelDataType)

2017-09-07 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske updated FLINK-7596:
-
Description: 
If two inputs have the {{ANY(GenericRelDataType)}} type, applying a set 
operation ({{UNION}}, {{MINUS}}, ...) to them causes a {{TableException}} with 
the message "Type is not supported: ANY".
Here is the test case:

{code}
@Test
  def testUnion(): Unit = {
val list = List((1, new NODE), (2, new NODE))
val list2 = List((3, new NODE), (4, new NODE))
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)
val s1 = tEnv.fromDataStream(env.fromCollection(list))
val s2 = tEnv.fromDataStream(env.fromCollection(list2))
val result = s1.unionAll(s2).toAppendStream[Row]
result.addSink(new StreamITCase.StringSink[Row])
env.execute()
  }

  class NODE {
  val x = new util.HashMap[String, String]()
}
{code}

This bug happens because Flink doesn't handle {{createSqlType(ANY)}} itself and 
Calcite doesn't know the difference between {{ANY}} and 
{{ANY(GenericRelDataType)}}, so Calcite's {{createSqlType(ANY)}} returns a 
{{BasicSqlType}} instead.
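
A small, self-contained illustration of that last point using only Calcite's public type-factory API (the class below is for demonstration and is not Flink code):

{code}
// Calcite only sees the SQL type name here, so it hands back a plain
// BasicSqlType; a richer type such as Flink's GenericRelDataType (which wraps
// the original TypeInformation) cannot be reconstructed from it.
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.type.SqlTypeName;

public class AnyTypeDemo {
    public static void main(String[] args) {
        RelDataTypeFactory typeFactory = new JavaTypeFactoryImpl();
        RelDataType any = typeFactory.createSqlType(SqlTypeName.ANY);
        // expected to print "BasicSqlType"
        System.out.println(any.getClass().getSimpleName());
    }
}
{code}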

  was:
If two inputs with Any(GenericRelDataType), when they comes to Set 
Operation(Union, minus...), it will cause a {{TableException}} with info is 
"Type is not supported: ANY"
Here is the test case:

`
@Test
  def testUnion(): Unit = {
val list = List((1, new NODE), (2, new NODE))
val list2 = List((3, new NODE), (4, new NODE))
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)
val s1 = tEnv.fromDataStream(env.fromCollection(list))
val s2 = tEnv.fromDataStream(env.fromCollection(list2))
val result = s1.unionAll(s2).toAppendStream[Row]
result.addSink(new StreamITCase.StringSink[Row])
env.execute()
  }

  class NODE {
  val x = new util.HashMap[String, String]()
}
`

this bug happens because flink did't handle createSqlType(ANY) and Calcite 
does't know the differences between {{ANY}} and {{ANY(GenericRelDataType)}}, so 
the {{createSqlType(ANY)}} of Calcite will return a BasicSqlType instead


> Fix bug during Set Operation (Union, Minus ... ) with Any(GenericRelDataType) 
> --
>
> Key: FLINK-7596
> URL: https://issues.apache.org/jira/browse/FLINK-7596
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Ruidong Li
>Assignee: Ruidong Li
>
> If two inputs have the {{ANY(GenericRelDataType)}} type, applying a set 
> operation ({{UNION}}, {{MINUS}}, ...) to them causes a {{TableException}} with 
> the message "Type is not supported: ANY".
> Here is the test case:
> {code}
> @Test
>   def testUnion(): Unit = {
> val list = List((1, new NODE), (2, new NODE))
> val list2 = List((3, new NODE), (4, new NODE))
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> val tEnv = TableEnvironment.getTableEnvironment(env)
> val s1 = tEnv.fromDataStream(env.fromCollection(list))
> val s2 = tEnv.fromDataStream(env.fromCollection(list2))
> val result = s1.unionAll(s2).toAppendStream[Row]
> result.addSink(new StreamITCase.StringSink[Row])
> env.execute()
>   }
>   class NODE {
>   val x = new util.HashMap[String, String]()
> }
> {code}
> This bug happens because Flink doesn't handle {{createSqlType(ANY)}} itself and 
> Calcite doesn't know the difference between {{ANY}} and 
> {{ANY(GenericRelDataType)}}, so Calcite's {{createSqlType(ANY)}} returns a 
> {{BasicSqlType}} instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7594) Add a SQL CLI client

2017-09-07 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156647#comment-16156647
 ] 

Fabian Hueske commented on FLINK-7594:
--

[~twalthr] and I have been working on a prototype for a few days. It's quite 
basic and reads the catalog from a JSON file. We stream results back to the 
client by letting the query write its output to a Kafka topic and consuming the 
topic in the client.

It would be great to learn more about the AthenaX client. Maybe we can exchange 
some ideas on features and scope.

> Add a SQL CLI client
> 
>
> Key: FLINK-7594
> URL: https://issues.apache.org/jira/browse/FLINK-7594
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> At the moment a user can only specify queries within a Java/Scala program 
> which is nice for integrating table programs or parts of it with DataSet or 
> DataStream API. With more connectors coming up, it is time to also provide a 
> programming-free SQL client. The SQL client should consist of a CLI interface 
> and maybe also a REST API. The concrete design is still up for discussion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4631: [hotfix][kafka][docs] Add warning regarding data losses w...

2017-09-07 Thread pnowojski
Github user pnowojski commented on the issue:

https://github.com/apache/flink/pull/4631
  
Thanks :)


---


[GitHub] flink pull request #4631: [hotfix][kafka][docs] Add warning regarding data l...

2017-09-07 Thread pnowojski
Github user pnowojski closed the pull request at:

https://github.com/apache/flink/pull/4631


---


[GitHub] flink issue #4239: [FLINK-6988] flink-connector-kafka-0.11 with exactly-once...

2017-09-07 Thread pnowojski
Github user pnowojski commented on the issue:

https://github.com/apache/flink/pull/4239
  
Bugs in tests (those that you can see in fixup commits)


---


[jira] [Commented] (FLINK-6988) Add Apache Kafka 0.11 connector

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1615#comment-1615
 ] 

ASF GitHub Bot commented on FLINK-6988:
---

Github user pnowojski commented on the issue:

https://github.com/apache/flink/pull/4239
  
Bugs in tests (those that you can see in fixup commits)


> Add Apache Kafka 0.11 connector
> ---
>
> Key: FLINK-6988
> URL: https://issues.apache.org/jira/browse/FLINK-6988
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.3.1
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
> Fix For: 1.4.0
>
>
> Kafka 0.11 (to be released very soon) adds support for transactions. 
> Thanks to that, Flink might be able to implement a Kafka sink with 
> "exactly-once" semantics. The API changes and the overall transaction support are 
> described in 
> [KIP-98|https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging].
> The goal is to mimic the implementation of the existing BucketingSink. The new 
> FlinkKafkaProducer011 would:
> * upon creation, begin a transaction, store the transaction identifiers in 
> state, and write all incoming data to an output Kafka topic using that 
> transaction
> * on the `snapshotState` call, flush the data and record in state 
> that the current transaction is pending to be committed
> * on `notifyCheckpointComplete`, commit this pending transaction
> * in case of a crash between `snapshotState` and `notifyCheckpointComplete`, 
> either abort this pending transaction (if not every participant successfully 
> saved the snapshot) or restore and commit it.
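
A compact sketch of that transaction lifecycle against the plain Kafka 0.11 producer API (the hooks named in the comments mirror the description above; this is not the connector's actual implementation, and the topic and config values are placeholders):

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // the description stores identifiers like this one in operator state
        props.put("transactional.id", "flink-kafka-sink-0");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();              // "upon creation, begin a transaction"
            producer.send(new ProducerRecord<>("output-topic", "some-record"));
            producer.flush();                         // "on snapshotState, flush the data"
            producer.commitTransaction();             // "on notifyCheckpointComplete, commit"
            // after a crash in between, the restored job would either abort the
            // pending transaction or resume and commit it, as described above
        }
    }
}
{code}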



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4654: [FLINK-7521] Add config option to set the content ...

2017-09-07 Thread zjureel
GitHub user zjureel opened a pull request:

https://github.com/apache/flink/pull/4654

[FLINK-7521] Add config option to set the content length limit of REST 
server and client

## Brief change log

 Add config option to set the content length limit of REST server and client
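
A rough sketch of the kind of wiring this describes (the option handling and class are illustrative, not the PR's actual code):

```
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpServerCodec;

public class RestChannelInitializer extends ChannelInitializer<SocketChannel> {

    // today's hard-coded value is 1024 * 1024 * 10; this makes it configurable
    private final int maxContentLength;

    public RestChannelInitializer(int maxContentLength) {
        this.maxContentLength = maxContentLength;
    }

    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline()
            .addLast(new HttpServerCodec())
            .addLast(new HttpObjectAggregator(maxContentLength));
    }
}
```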

## Verifying this change

*(Please pick either of the following options)*

Add config option to set the content length, no test case

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / not documented)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjureel/flink FLINK-7521

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4654


commit 8a8010d05ee7bdfc8c46681cdd453715c5aeac17
Author: zjureel 
Date:   2017-09-07T02:39:39Z

[FLINK-7521] Add config option to set the content length limit of REST 
server and client




---


[jira] [Commented] (FLINK-7521) Remove the 10MB limit from the current REST implementation.

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156683#comment-16156683
 ] 

ASF GitHub Bot commented on FLINK-7521:
---

GitHub user zjureel opened a pull request:

https://github.com/apache/flink/pull/4654

[FLINK-7521] Add config option to set the content length limit of REST 
server and client

## Brief change log

 Add config option to set the content length limit of REST server and client

## Verifying this change

*(Please pick either of the following options)*

Add config option to set the content length, no test case

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / not documented)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjureel/flink FLINK-7521

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4654


commit 8a8010d05ee7bdfc8c46681cdd453715c5aeac17
Author: zjureel 
Date:   2017-09-07T02:39:39Z

[FLINK-7521] Add config option to set the content length limit of REST 
server and client




> Remove the 10MB limit from the current REST implementation.
> ---
>
> Key: FLINK-7521
> URL: https://issues.apache.org/jira/browse/FLINK-7521
> Project: Flink
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 1.4.0
>Reporter: Kostas Kloudas
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.4.0
>
>
> In the current {{AbstractRestServer}} we impose an upper bound of 10MB in the 
> states we can transfer. This is in the line {{.addLast(new 
> HttpObjectAggregator(1024 * 1024 * 10))}} of the server implementation. 
> This limit is restrictive for some of the usecases planned to use this 
> implementation (e.g. the job submission client which has to send full jars, 
> or the queryable state client which may have to receive states bigger than 
> that).
> This issue proposes the elimination of this limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4654: [FLINK-7521] Add config option to set the content length ...

2017-09-07 Thread zjureel
Github user zjureel commented on the issue:

https://github.com/apache/flink/pull/4654
  
@kl0u I have tried to fix 
[https://issues.apache.org/jira/browse/FLINK-7521](https://issues.apache.org/jira/browse/FLINK-7521)
 in this PR, could you please have a look when you're free, thanks


---


[jira] [Commented] (FLINK-7521) Remove the 10MB limit from the current REST implementation.

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156686#comment-16156686
 ] 

ASF GitHub Bot commented on FLINK-7521:
---

Github user zjureel commented on the issue:

https://github.com/apache/flink/pull/4654
  
@kl0u I have tried to fix 
[https://issues.apache.org/jira/browse/FLINK-7521](https://issues.apache.org/jira/browse/FLINK-7521)
 in this PR, could you please have a look when you're free, thanks


> Remove the 10MB limit from the current REST implementation.
> ---
>
> Key: FLINK-7521
> URL: https://issues.apache.org/jira/browse/FLINK-7521
> Project: Flink
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 1.4.0
>Reporter: Kostas Kloudas
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.4.0
>
>
> In the current {{AbstractRestServer}} we impose an upper bound of 10MB in the 
> states we can transfer. This is in the line {{.addLast(new 
> HttpObjectAggregator(1024 * 1024 * 10))}} of the server implementation. 
> This limit is restrictive for some of the usecases planned to use this 
> implementation (e.g. the job submission client which has to send full jars, 
> or the queryable state client which may have to receive states bigger than 
> that).
> This issue proposes the elimination of this limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4654: [FLINK-7521] Add config option to set the content length ...

2017-09-07 Thread kl0u
Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/4654
  
Hi @zjureel ! Thanks for the work!

This can be a temporary fix, but I was thinking more of a long-term one 
where there is no limit. 
The problems that I see with such a temporary solution are: 

1) if we add this as a configuration parameter, and then remove it when we 
have a proper fix, this may result in confusion for the users.

2) with this fix, the aggregator is going to allocate the specified limit 
every time there is an element to parse, even though the element may be small. 
This can be a waste of resources.

What do you think @zentol ?



---


[jira] [Commented] (FLINK-7521) Remove the 10MB limit from the current REST implementation.

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156701#comment-16156701
 ] 

ASF GitHub Bot commented on FLINK-7521:
---

Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/4654
  
Hi @zjureel ! Thanks for the work!

This can be a temporary fix, but I was thinking more of a long-term one 
where there is no limit. 
The problems that I see with such a temporary solution are: 

1) if we add this as a configuration parameter, and then remove it when we 
have a proper fix, this may result in confusion for the users.

2) with this fix, the aggregator is going to allocate the specified limit 
every time there is an element to parse, even though the element may be small. 
This can be a waste of resources.

What do you think @zentol ?



> Remove the 10MB limit from the current REST implementation.
> ---
>
> Key: FLINK-7521
> URL: https://issues.apache.org/jira/browse/FLINK-7521
> Project: Flink
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 1.4.0
>Reporter: Kostas Kloudas
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.4.0
>
>
> In the current {{AbstractRestServer}} we impose an upper bound of 10MB in the 
> states we can transfer. This is in the line {{.addLast(new 
> HttpObjectAggregator(1024 * 1024 * 10))}} of the server implementation. 
> This limit is restrictive for some of the usecases planned to use this 
> implementation (e.g. the job submission client which has to send full jars, 
> or the queryable state client which may have to receive states bigger than 
> that).
> This issue proposes the elimination of this limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4648: [FLINK-7584] [doc] Fix checkstyle version in Setup...

2017-09-07 Thread jameslafa
Github user jameslafa closed the pull request at:

https://github.com/apache/flink/pull/4648


---


[GitHub] flink issue #4648: [FLINK-7584] [doc] Fix checkstyle version in Setup IDE

2017-09-07 Thread jameslafa
Github user jameslafa commented on the issue:

https://github.com/apache/flink/pull/4648
  
As decided, we stay on 6.19. I drop the pull request.


---


[jira] [Commented] (FLINK-7584) [Documentation] Add checkstyle version due to breaking change in the plugin

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156711#comment-16156711
 ] 

ASF GitHub Bot commented on FLINK-7584:
---

Github user jameslafa closed the pull request at:

https://github.com/apache/flink/pull/4648


> [Documentation] Add checkstyle version due to breaking change in the plugin
> ---
>
> Key: FLINK-7584
> URL: https://issues.apache.org/jira/browse/FLINK-7584
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.3.2
>Reporter: James Lafa
>Assignee: James Lafa
>Priority: Trivial
>
> There is a breaking change in the Checkstyle-IDEA plugin since version 8.1. 
> The Flink `checkstyle.xml` is working up to version 8.0. 
> The documentation needs to be updated accordingly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7584) [Documentation] Add checkstyle version due to breaking change in the plugin

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156710#comment-16156710
 ] 

ASF GitHub Bot commented on FLINK-7584:
---

Github user jameslafa commented on the issue:

https://github.com/apache/flink/pull/4648
  
As decided, we stay on 6.19. I drop the pull request.


> [Documentation] Add checkstyle version due to breaking change in the plugin
> ---
>
> Key: FLINK-7584
> URL: https://issues.apache.org/jira/browse/FLINK-7584
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.3.2
>Reporter: James Lafa
>Assignee: James Lafa
>Priority: Trivial
>
> There is a breaking change in the Checkstyle-IDEA plugin since version 8.1. 
> The Flink `checkstyle.xml` is working up to version 8.0. 
> The documentation needs to be updated accordingly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7599) Support aggregation functions in the define and measures clause of MatchRecognize

2017-09-07 Thread Dian Fu (JIRA)
Dian Fu created FLINK-7599:
--

 Summary: Support aggregation functions in the define and measures 
clause of MatchRecognize
 Key: FLINK-7599
 URL: https://issues.apache.org/jira/browse/FLINK-7599
 Project: Flink
  Issue Type: Sub-task
Reporter: Dian Fu
Assignee: Dian Fu






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4654: [FLINK-7521] Add config option to set the content length ...

2017-09-07 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4654
  
This doesn't solve the problem of the JIRA. For job submission we have 
to send jars to the cluster, which may be several hundred MB in size; we obviously 
can't allocate that much memory in general.

As @kl0u said, setting this value any higher (it is already _very_ high) 
will cause insane memory consumption in the object aggregator in any case where 
concurrent requests are in progress.

As such, this isn't even a temporary fix: it is not viable for 
the use cases that are affected by the limit, so I'm afraid we'll have to 
reject this PR.


---


[jira] [Commented] (FLINK-7521) Remove the 10MB limit from the current REST implementation.

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156740#comment-16156740
 ] 

ASF GitHub Bot commented on FLINK-7521:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4654
  
This doesn't solve the problem of the JIRA. For job submission we have 
to send jars to the cluster, which may be several hundred MB in size; we obviously 
can't allocate that much memory in general.

As @kl0u said, setting this value any higher (it is already _very_ high) 
will cause insane memory consumption in the object aggregator in any case where 
concurrent requests are in progress.

As such, this isn't even a temporary fix: it is not viable for 
the use cases that are affected by the limit, so I'm afraid we'll have to 
reject this PR.


> Remove the 10MB limit from the current REST implementation.
> ---
>
> Key: FLINK-7521
> URL: https://issues.apache.org/jira/browse/FLINK-7521
> Project: Flink
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 1.4.0
>Reporter: Kostas Kloudas
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.4.0
>
>
> In the current {{AbstractRestServer}} we impose an upper bound of 10MB in the 
> states we can transfer. This is in the line {{.addLast(new 
> HttpObjectAggregator(1024 * 1024 * 10))}} of the server implementation. 
> This limit is restrictive for some of the usecases planned to use this 
> implementation (e.g. the job submission client which has to send full jars, 
> or the queryable state client which may have to receive states bigger than 
> that).
> This issue proposes the elimination of this limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #3703: [FLINK-5005] WIP: publish scala 2.12 artifacts

2017-09-07 Thread joan38
Github user joan38 commented on the issue:

https://github.com/apache/flink/pull/3703
  
Is there any news on this?


---


[jira] [Commented] (FLINK-5005) Publish Scala 2.12 artifacts

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156747#comment-16156747
 ] 

ASF GitHub Bot commented on FLINK-5005:
---

Github user joan38 commented on the issue:

https://github.com/apache/flink/pull/3703
  
Is there any news on this?


> Publish Scala 2.12 artifacts
> 
>
> Key: FLINK-5005
> URL: https://issues.apache.org/jira/browse/FLINK-5005
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala API
>Reporter: Andrew Roberts
>
> Scala 2.12 was [released|http://www.scala-lang.org/news/2.12.0] today, and 
> offers many compile-time and runtime speed improvements. It would be great to 
> get artifacts up on maven central to allow Flink users to migrate to Scala 
> 2.12.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7589) org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 159764230; received: 64638536)

2017-09-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156763#comment-16156763
 ] 

Steve Loughran commented on FLINK-7589:
---

well, you are allowed to file bug reports. 

However, it's not the s3 client getting GC'd, because the s3 client is retained 
for the lifespan of the FileSystem instance, so unless you are disposing of 
that, it's retained.

I'd blame network connectivity: the connection was closed, you got back less 
data than you asked for. 

The s3a client does a single retry here, but it could be more sophisticated 
(HADOOP-14531). 

> org.apache.http.ConnectionClosedException: Premature end of Content-Length 
> delimited message body (expected: 159764230; received: 64638536)
> ---
>
> Key: FLINK-7589
> URL: https://issues.apache.org/jira/browse/FLINK-7589
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Bowen Li
>Priority: Blocker
> Fix For: 1.4.0, 1.3.3
>
>
> When I tried to resume a Flink job from a savepoint with different 
> parallelism, I ran into this error. And the resume failed.
> {code:java}
> 2017-09-05 21:53:57,317 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph- returnsivs -> 
> Sink: xxx (7/12) (0da16ec908fc7b9b16a5c2cf1aa92947) switched from RUNNING to 
> FAILED.
> org.apache.http.ConnectionClosedException: Premature end of Content-Length 
> delimited message body (expected: 159764230; received: 64638536
>   at 
> org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:180)
>   at 
> org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at 
> com.amazonaws.util.ContentLengthValidationInputStream.read(ContentLengthValidationInputStream.java:77)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:160)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at 
> org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:72)
>   at 
> org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:61)
>   at 
> org.apache.flink.runtime.util.NonClosingStreamDecorator.read(NonClosingStreamDecorator.java:47)
>   at java.io.DataInputStream.readFully(DataInputStream.java:195)
>   at java.io.DataInputStream.readLong(DataInputStream.java:416)
>   at 
> org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:68)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimer$TimerSerializer.deserialize(InternalTimer.java:156)
>   at 
> org.apache.flink.streaming.api.operators.HeapInternalTimerService.restoreTimersForKeyGroup(HeapInternalTimerService.java:345)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimeServiceManager.restoreStateForKeyGroup(InternalTimeServiceManager.java:141)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:496)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:104)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:251)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7594) Add a SQL CLI client

2017-09-07 Thread Timo Walther (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156775#comment-16156775
 ] 

Timo Walther commented on FLINK-7594:
-

[~wheat9] That sounds great. Maybe you can share some design documents with us?

> Add a SQL CLI client
> 
>
> Key: FLINK-7594
> URL: https://issues.apache.org/jira/browse/FLINK-7594
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> At the moment a user can only specify queries within a Java/Scala program 
> which is nice for integrating table programs or parts of it with DataSet or 
> DataStream API. With more connectors coming up, it is time to also provide a 
> programming-free SQL client. The SQL client should consist of a CLI interface 
> and maybe also a REST API. The concrete design is still up for discussion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7594) Add a SQL CLI client

2017-09-07 Thread Hai Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156954#comment-16156954
 ] 

Hai Zhou commented on FLINK-7594:
-

[~wheat9] That sounds great. +1
BTW, it would be even better if Alibaba and Huawei shared their designs as well ;)

> Add a SQL CLI client
> 
>
> Key: FLINK-7594
> URL: https://issues.apache.org/jira/browse/FLINK-7594
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> At the moment a user can only specify queries within a Java/Scala program 
> which is nice for integrating table programs or parts of it with DataSet or 
> DataStream API. With more connectors coming up, it is time to also provide a 
> programming-free SQL client. The SQL client should consist of a CLI interface 
> and maybe also a REST API. The concrete design is still up for discussion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4655: [FLINK-7567]: Removed keepPartitioning parameter f...

2017-09-07 Thread mlipkovich
GitHub user mlipkovich opened a pull request:

https://github.com/apache/flink/pull/4655

[FLINK-7567]: Removed keepPartitioning parameter from iterate method

## What is the purpose of the change

Removed parameter keepPartitioning from DataStream#iterate method since 
it's ignored. Also slightly modified error message related to different 
parallelism levels of input and feedback streams

## Brief change log

  - Removed parameter keepPartitioning from DataStream#iterate 

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / no)  no
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no) yes
  - The serializers: (yes / no / don't know) no
  - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know) no
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know) no

## Documentation

  - Does this pull request introduce a new feature? (yes / no) no
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / not documented) not applicable



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mlipkovich/flink FLINK-7567

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4655


commit 2525aef6f65d297142472ae6532e3bddb08df0fd
Author: Mikhail Lipkovich 
Date:   2017-09-07T14:05:22Z

[FLINK-7567]: Removed keepPartitioning parameter from iterate method




---


[jira] [Commented] (FLINK-7567) DataStream#iterate() on env.fromElements() / env.fromCollection() does not work

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156986#comment-16156986
 ] 

ASF GitHub Bot commented on FLINK-7567:
---

GitHub user mlipkovich opened a pull request:

https://github.com/apache/flink/pull/4655

[FLINK-7567]: Removed keepPartitioning parameter from iterate method

## What is the purpose of the change

Removed parameter keepPartitioning from DataStream#iterate method since 
it's ignored. Also slightly modified error message related to different 
parallelism levels of input and feedback streams

## Brief change log

  - Removed parameter keepPartitioning from DataStream#iterate 

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / no)  no
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no) yes
  - The serializers: (yes / no / don't know) no
  - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know) no
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know) no

## Documentation

  - Does this pull request introduce a new feature? (yes / no) no
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / not documented) not applicable



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mlipkovich/flink FLINK-7567

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4655


commit 2525aef6f65d297142472ae6532e3bddb08df0fd
Author: Mikhail Lipkovich 
Date:   2017-09-07T14:05:22Z

[FLINK-7567]: Removed keepPartitioning parameter from iterate method




> DataStream#iterate() on env.fromElements() / env.fromCollection() does not 
> work
> ---
>
> Key: FLINK-7567
> URL: https://issues.apache.org/jira/browse/FLINK-7567
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Affects Versions: 1.3.2
> Environment: OS X 10.12.6, Oracle JDK 1.8.0_144, Flink 1.3.2
>Reporter: Peter Ertl
>Assignee: Mikhail Lipkovich
>
> When I try to execute this simple snippet of code
> {code}
>   @Test
>   def iterateOnElements(): Unit = {
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> // do something silly just do get iteration going ...
> val result = env.fromElements(1, 2, 3).iterate(it => {
>   (it.filter(_ > 0).map(_ - 1), it.filter(_ > 0).map(_ => 'x'))
> })
> result.print()
> env.execute()
>   }
> {code}
> I get the following exception:
> {code}
> java.lang.UnsupportedOperationException: Parallelism of the feedback stream 
> must match the parallelism of the original stream. Parallelism of original 
> stream: 1; parallelism of feedback stream: 8
>   at 
> org.apache.flink.streaming.api.transformations.FeedbackTransformation.addFeedbackEdge(FeedbackTransformation.java:87)
>   at 
> org.apache.flink.streaming.api.datastream.IterativeStream.closeWith(IterativeStream.java:77)
>   at 
> org.apache.flink.streaming.api.scala.DataStream.iterate(DataStream.scala:519)
>   at atp.analytics.CassandraReadTest.iterate2(CassandraReadTest.scala:134)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$

[jira] [Commented] (FLINK-7567) DataStream#iterate() on env.fromElements() / env.fromCollection() does not work

2017-09-07 Thread Mikhail Lipkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156997#comment-16156997
 ] 

Mikhail Lipkovich commented on FLINK-7567:
--

Hi Peter,
As a user, I agree with Aljoscha that we should not do too many things silently. 
Modifying the parallelism level can significantly change performance. 
There can be situations where a user mistakenly set a wrong parallelism level 
on the input stream. It's better for the user to get an error message and make 
the decision themselves: it could be a modification of either the input stream or 
the feedback stream, and we don't actually know which of the two should be modified.

> DataStream#iterate() on env.fromElements() / env.fromCollection() does not 
> work
> ---
>
> Key: FLINK-7567
> URL: https://issues.apache.org/jira/browse/FLINK-7567
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Affects Versions: 1.3.2
> Environment: OS X 10.12.6, Oracle JDK 1.8.0_144, Flink 1.3.2
>Reporter: Peter Ertl
>Assignee: Mikhail Lipkovich
>
> When I try to execute this simple snippet of code
> {code}
>   @Test
>   def iterateOnElements(): Unit = {
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> // do something silly just do get iteration going ...
> val result = env.fromElements(1, 2, 3).iterate(it => {
>   (it.filter(_ > 0).map(_ - 1), it.filter(_ > 0).map(_ => 'x'))
> })
> result.print()
> env.execute()
>   }
> {code}
> I get the following exception:
> {code}
> java.lang.UnsupportedOperationException: Parallelism of the feedback stream 
> must match the parallelism of the original stream. Parallelism of original 
> stream: 1; parallelism of feedback stream: 8
>   at 
> org.apache.flink.streaming.api.transformations.FeedbackTransformation.addFeedbackEdge(FeedbackTransformation.java:87)
>   at 
> org.apache.flink.streaming.api.datastream.IterativeStream.closeWith(IterativeStream.java:77)
>   at 
> org.apache.flink.streaming.api.scala.DataStream.iterate(DataStream.scala:519)
>   at atp.analytics.CassandraReadTest.iterate2(CassandraReadTest.scala:134)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
> Since is just the simplest iterating stream setup I could imagine this error 
> makes no sense to me :-P



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7508) switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool mode rather than Per_Request mode

2017-09-07 Thread Bowen Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li updated FLINK-7508:

Description: 
KinesisProducerLibrary (KPL) 0.10.x had been using a One-New-Thread-Per-Request 
model for all requests sent to AWS Kinesis, which is very expensive.

0.12.4 introduced a new [ThreadingMode - 
Pooled|https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java#L225],
 which will use a thread pool. This hugely improves KPL's performance and 
reduces consumed resources. By default, KPL still uses per-request mode. We 
should explicitly switch FlinkKinesisProducer's KPL threading mode to 'Pooled'.

This work depends on FLINK-7366 and FLINK-7508

Benchmarking I did:

* Environment: Running a Flink hourly-sliding windowing job on 18-node EMR 
cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink job 
generates about 21million UserRecords, which means that we generated a test 
load of 21million UserRecords at the first minute of each hour.
* Criteria: Test KPL throughput per minute. Since the default RecordTTL for KPL 
is 30 sec, we can be sure that either all UserRecords are sent by KPL within a 
minute, or we will see UserRecord expiration errors.
* One-New-Thread-Per-Request model: max throughput is about 2million 
UserRecords per min; it doesn't go beyond that because CPU utilization goes to 
100%, everything stopped working and that Flink job crashed.
* Thread-Pool model with pool size of 10: it sends out 21million UserRecords 
within 30 sec without any UserRecord expiration errors. The average peak CPU 
utilization is about 20% - 30%. So 21million UserRecords/min is not the max 
throughput of thread-pool model. We didn't go any further because 1) this 
throughput is already a couple times more than what we really need, and 2) we 
don't have a quick way of increasing the test load

Thus, I propose switching FlinkKinesisProducer to Thread-Pool mode. [~tzulitai] 
What do you think?





  was:
KinesisProducerLibrary (KPL) 0.10.x had been using a One-New-Thread-Per-Request 
model for all requests sent to AWS Kinesis, which is very expensive.

0.12.4 introduced a new [ThreadingMode - 
Pooled|https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java#L225],
 which will use a thread pool. This hugely improves KPL's performance and 
reduces consumed resources. By default, KPL still uses per-request mode. We 
should explicitly switch FlinkKinesisProducer's KPL threading mode to 'Pooled'.

This work depends on FLINK-7366 and FLINK-7508

Benchmarking I did:

* Environment: Running a Flink hourly-sliding windowing job on 18-node EMR 
cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink job 
generates about 21million UserRecords, which means that we generated a test 
load of 21million UserRecords at the first minute of each hour.
* Criteria: Test KPL throughput per minute. Since the default RecordTTL for KPL 
is 30 sec, we can be sure that either all UserRecords are sent by KPL within a 
minute, or we will see UserRecord expiration errors.
* One-New-Thread-Per-Request model: max throughput is about 2million 
UserRecords per min; it doesn't go beyond that because CPU utilization goes to 
100%, everything stopped working and that Flink job crashed.
* Thread-Pool model: it sends out 21million UserRecords within 30 sec without 
any UserRecord expiration errors. The average peak CPU utilization is about 20% 
- 30%. So 21million UserRecords/min is not the max throughput of thread-pool 
model. We didn't go any further because 1) this throughput is already a couple 
times more than what we really need, and 2) we don't have a quick way of 
increasing the test load

Thus, I propose switching FlinkKinesisProducer to Thread-Pool mode. [~tzulitai] 
What do you think






> switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool mode 
> rather than Per_Request mode
> 
>
> Key: FLINK-7508
> URL: https://issues.apache.org/jira/browse/FLINK-7508
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Critical
> Fix For: 1.4.0
>
>
> KinesisProducerLibrary (KPL) 0.10.x had been using a 
> One-New-Thread-Per-Request model for all requests sent to AWS Kinesis, which 
> is very expensive.
> 0.12.4 introduced a new [ThreadingMode - 
> Pooled|https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/j

[jira] [Updated] (FLINK-7495) AbstractUdfStreamOperator#initializeState() should be called in AsyncWaitOperator#initializeState()

2017-09-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-7495:
--
Description: 
{code}
recoveredStreamElements = context
  .getOperatorStateStore()
  .getListState(new ListStateDescriptor<>(STATE_NAME, 
inStreamElementSerializer));
{code}

Call to AbstractUdfStreamOperator#initializeState() should be added in the 
beginning
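
Roughly, the suggested ordering would look like this (a sketch only, not the operator's actual code; the field names are taken from the snippet above):

{code}
@Override
public void initializeState(StateInitializationContext context) throws Exception {
    // call the AbstractUdfStreamOperator hook first so the wrapped user
    // function's state is initialized before the operator restores its own state
    super.initializeState(context);

    recoveredStreamElements = context
        .getOperatorStateStore()
        .getListState(new ListStateDescriptor<>(STATE_NAME, inStreamElementSerializer));
}
{code}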

  was:
{code}
recoveredStreamElements = context
  .getOperatorStateStore()
  .getListState(new ListStateDescriptor<>(STATE_NAME, 
inStreamElementSerializer));
{code}
Call to AbstractUdfStreamOperator#initializeState() should be added in the 
beginning


> AbstractUdfStreamOperator#initializeState() should be called in 
> AsyncWaitOperator#initializeState()
> ---
>
> Key: FLINK-7495
> URL: https://issues.apache.org/jira/browse/FLINK-7495
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Fang Yong
>Priority: Minor
>
> {code}
> recoveredStreamElements = context
>   .getOperatorStateStore()
>   .getListState(new ListStateDescriptor<>(STATE_NAME, 
> inStreamElementSerializer));
> {code}
> A call to AbstractUdfStreamOperator#initializeState() should be added at the 
> beginning.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4656: [FLINK-7508][kinesis] switch FlinkKinesisProducer ...

2017-09-07 Thread bowenli86
GitHub user bowenli86 opened a pull request:

https://github.com/apache/flink/pull/4656

[FLINK-7508][kinesis] switch FlinkKinesisProducer to use KPL's 
ThreadingMode to ThreadedPool mode rather than Per_Request mode

## What is the purpose of the change

KinesisProducerLibrary (KPL) 0.10.x had been using a 
One-New-Thread-Per-Request model for all requests sent to AWS Kinesis, which is 
very expensive.

0.12.4 introduced a new ThreadingMode - Pooled, which will use a thread 
pool. This hugely improves KPL's performance and reduces consumed resources. By 
default, KPL still uses per-request mode. We should explicitly switch 
FlinkKinesisProducer's KPL threading mode to 'Pooled'.
This work depends on FLINK-7366 and FLINK-7508

Benchmarking I did:

- Environment: running a Flink hourly-sliding windowing job on an 18-node EMR 
cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink job 
generates about 21 million UserRecords, which means we generated a test load of 
21 million UserRecords in the first minute of each hour.
- Criteria: test KPL throughput per minute. Since the default RecordTTL for 
KPL is 30 sec, we can be sure that either all UserRecords are sent by KPL 
within a minute, or we will see UserRecord expiration errors.
- One-New-Thread-Per-Request model: max throughput is about 2 million 
UserRecords per minute; it doesn't go beyond that because CPU utilization goes 
to 100%, everything stops working, and the Flink job crashes.
- Thread-Pool model with pool size of 10: it sends out 21 million UserRecords 
within 30 sec without any UserRecord expiration errors. The average peak CPU 
utilization is about 20%-30%, so 21 million UserRecords/min is not the max 
throughput of the thread-pool model. We didn't go any further because 1) this 
throughput is already a few times more than what we actually need, and 2) we 
don't have a quick way of increasing the test load.

Thus, I propose switching FlinkKinesisProducer to Thread-Pool mode. 

## Brief change log

  - *switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool 
mode rather than Per_Request mode*
  - *update docs*

## Verifying this change

This change added tests and can be verified as follows:

- *added unit tests in 
flink-connector-kinesis/src/test/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtilTest.java*

## Does this pull request potentially affect one of the following parts:

## Documentation

  - Does this pull request introduce a new feature? (yes)
  - If yes, how is the feature documented? (docs)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bowenli86/flink FLINK-7508

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4656


commit 6386983239bd3024b395c865ec4fd33e232ca5a3
Author: Bowen Li 
Date:   2017-08-30T16:35:03Z

FLINK-7422 Upgrade Kinesis Client Library (KCL) and AWS SDK in 
flink-connector-kinesis

commit 381cd4156b84673a1d32d2db3f7b2d748d90d980
Author: Bowen Li 
Date:   2017-09-07T06:33:37Z

Merge remote-tracking branch 'upstream/master'

commit 893ec61bebfa20a038819bf1929791e57b98f33b
Author: Bowen Li 
Date:   2017-09-07T20:34:09Z

FLINK-7508 switch FlinkKinesisProducer to use KPL's ThreadingMode to 
threaded-pool mode rather than one_thread_per_request mode




---


[jira] [Commented] (FLINK-7508) switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool mode rather than Per_Request mode

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157608#comment-16157608
 ] 

ASF GitHub Bot commented on FLINK-7508:
---

GitHub user bowenli86 opened a pull request:

https://github.com/apache/flink/pull/4656

[FLINK-7508][kinesis] switch FlinkKinesisProducer to use KPL's 
ThreadingMode to ThreadedPool mode rather than Per_Request mode

## What is the purpose of the change

KinesisProducerLibrary (KPL) 0.10.x had been using a 
One-New-Thread-Per-Request model for all requests sent to AWS Kinesis, which is 
very expensive.

0.12.4 introduced a new ThreadingMode - Pooled, which will use a thread 
pool. This hugely improves KPL's performance and reduces consumed resources. By 
default, KPL still uses per-request mode. We should explicitly switch 
FlinkKinesisProducer's KPL threading mode to 'Pooled'.
This work depends on FLINK-7366 and FLINK-7508

Benchmarking I did:

- Environment: running a Flink hourly-sliding windowing job on an 18-node EMR 
cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink job 
generates about 21 million UserRecords, which means we generated a test load of 
21 million UserRecords in the first minute of each hour.
- Criteria: test KPL throughput per minute. Since the default RecordTTL for 
KPL is 30 sec, we can be sure that either all UserRecords are sent by KPL 
within a minute, or we will see UserRecord expiration errors.
- One-New-Thread-Per-Request model: max throughput is about 2 million 
UserRecords per minute; it doesn't go beyond that because CPU utilization goes 
to 100%, everything stops working, and the Flink job crashes.
- Thread-Pool model with pool size of 10: it sends out 21 million UserRecords 
within 30 sec without any UserRecord expiration errors. The average peak CPU 
utilization is about 20%-30%, so 21 million UserRecords/min is not the max 
throughput of the thread-pool model. We didn't go any further because 1) this 
throughput is already a few times more than what we actually need, and 2) we 
don't have a quick way of increasing the test load.

Thus, I propose switching FlinkKinesisProducer to Thread-Pool mode. 

## Brief change log

  - *switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool 
mode rather than Per_Request mode*
  - *update docs*

## Verifying this change

This change added tests and can be verified as follows:

- *added unit tests in 
flink-connector-kinesis/src/test/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtilTest.java*

## Does this pull request potentially affect one of the following parts:

## Documentation

  - Does this pull request introduce a new feature? (yes)
  - If yes, how is the feature documented? (docs)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bowenli86/flink FLINK-7508

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4656


commit 6386983239bd3024b395c865ec4fd33e232ca5a3
Author: Bowen Li 
Date:   2017-08-30T16:35:03Z

FLINK-7422 Upgrade Kinesis Client Library (KCL) and AWS SDK in 
flink-connector-kinesis

commit 381cd4156b84673a1d32d2db3f7b2d748d90d980
Author: Bowen Li 
Date:   2017-09-07T06:33:37Z

Merge remote-tracking branch 'upstream/master'

commit 893ec61bebfa20a038819bf1929791e57b98f33b
Author: Bowen Li 
Date:   2017-09-07T20:34:09Z

FLINK-7508 switch FlinkKinesisProducer to use KPL's ThreadingMode to 
threaded-pool mode rather than one_thread_per_request mode




> switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool mode 
> rather than Per_Request mode
> 
>
> Key: FLINK-7508
> URL: https://issues.apache.org/jira/browse/FLINK-7508
> Project: Flink
>  Issue Type: Improvement
>  Components: Kinesis Connector
>Affects Versions: 1.3.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Critical
> Fix For: 1.4.0
>
>
> KinesisProducerLibrary (KPL) 0.10.x had been using a 
> One-New-Thread-Per-Request model for all requests sent to AWS Kinesis, which 
> is very expensive.
> 0.12.4 introduced a new [ThreadingMode - 
> Pooled|https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java#L225],
>  which will use a thread pool. This hugely improves KPL's per

[jira] [Commented] (FLINK-7597) broken flink-connectors-kinesis setup in Intellij that potentially results from improper pom.xml

2017-09-07 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157662#comment-16157662
 ] 

Bowen Li commented on FLINK-7597:
-

Importing that module separately after importing the whole flink project works 
for me

> broken flink-connectors-kinesis setup in Intellij that potentially results 
> from improper pom.xml
> 
>
> Key: FLINK-7597
> URL: https://issues.apache.org/jira/browse/FLINK-7597
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.3.2
>Reporter: Bowen Li
>Assignee: Bowen Li
>
> I use IntelliJ to develop flink and flink-connectors-kinesis. I imported the 
> whole flink source code into IntelliJ, and IntelliJ treats 
> flink-connectors-kinesis as a module. The project structure in IntelliJ looks 
> like this: https://imgur.com/a/uK3Fd
> Here's the problem: the {{flink-connectors-kinesis}} module always complains 
> about not being able to find dependencies like amazon-kinesis-producer, 
> amazon-kinesis-client, flink-streaming-java_2.11, etc. It seems that IntelliJ 
> cannot properly parse {{/flink-connectors-kinesis/pom.xml}}, and IntelliJ 
> always suggests I add those dependencies to {{flink-connectors/pom.xml}}. In 
> short, {{flink-connectors-kinesis}} won't compile in my IntelliJ until I add 
> those dependencies to {{flink-connectors/pom.xml}}.
> My {{flink-connectors/pom.xml}} file ends up like this all the time:
> {code:java}
> C02SD32LG8WP:flink-connectors Bowen$ git diff
> diff --git a/flink-connectors/pom.xml b/flink-connectors/pom.xml
> index bc3f82f..2b001f5 100644
> --- a/flink-connectors/pom.xml
> +++ b/flink-connectors/pom.xml
> @@ -71,6 +71,16 @@ under the License.
> 			<artifactId>jsr305</artifactId>
> 			<scope>provided</scope>
> 		</dependency>
> +		<dependency>
> +			<groupId>com.amazonaws</groupId>
> +			<artifactId>amazon-kinesis-producer</artifactId>
> +			<version>0.12.5</version>
> +		</dependency>
> +		<dependency>
> +			<groupId>com.amazonaws</groupId>
> +			<artifactId>amazon-kinesis-client</artifactId>
> +			<version>1.8.1</version>
> +		</dependency>
> +		<dependency>
> +			<groupId>org.apache.flink</groupId>
> +			<artifactId>flink-streaming-java_2.11</artifactId>
> +			<version>1.4-SNAPSHOT</version>
> +		</dependency>
> 
> 
> {code}
> FYI, building flink-connectors-kinesis from command line always works. 
> [~tzulitai] Do you use Intellij? If so, how do you properly set up the 
> flink-connectors-kinesis project in Intellij to be able to retrieve 
> dependencies?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-7597) broken flink-connectors-kinesis setup in Intellij that potentially results from improper pom.xml

2017-09-07 Thread Bowen Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li closed FLINK-7597.
---
Resolution: Not A Problem

> broken flink-connectors-kinesis setup in Intellij that potentially results 
> from improper pom.xml
> 
>
> Key: FLINK-7597
> URL: https://issues.apache.org/jira/browse/FLINK-7597
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.3.2
>Reporter: Bowen Li
>Assignee: Bowen Li
>
> I use IntelliJ to develop flink and flink-connectors-kinesis. I imported the 
> whole flink source code into IntelliJ, and IntelliJ treats 
> flink-connectors-kinesis as a module. The project structure in IntelliJ looks 
> like this: https://imgur.com/a/uK3Fd
> Here's the problem: the {{flink-connectors-kinesis}} module always complains 
> about not being able to find dependencies like amazon-kinesis-producer, 
> amazon-kinesis-client, flink-streaming-java_2.11, etc. It seems that IntelliJ 
> cannot properly parse {{/flink-connectors-kinesis/pom.xml}}, and IntelliJ 
> always suggests I add those dependencies to {{flink-connectors/pom.xml}}. In 
> short, {{flink-connectors-kinesis}} won't compile in my IntelliJ until I add 
> those dependencies to {{flink-connectors/pom.xml}}.
> My {{flink-connectors/pom.xml}} file ends up like this all the time:
> {code:java}
> C02SD32LG8WP:flink-connectors Bowen$ git diff
> diff --git a/flink-connectors/pom.xml b/flink-connectors/pom.xml
> index bc3f82f..2b001f5 100644
> --- a/flink-connectors/pom.xml
> +++ b/flink-connectors/pom.xml
> @@ -71,6 +71,16 @@ under the License.
> 			<artifactId>jsr305</artifactId>
> 			<scope>provided</scope>
> 		</dependency>
> +		<dependency>
> +			<groupId>com.amazonaws</groupId>
> +			<artifactId>amazon-kinesis-producer</artifactId>
> +			<version>0.12.5</version>
> +		</dependency>
> +		<dependency>
> +			<groupId>com.amazonaws</groupId>
> +			<artifactId>amazon-kinesis-client</artifactId>
> +			<version>1.8.1</version>
> +		</dependency>
> +		<dependency>
> +			<groupId>org.apache.flink</groupId>
> +			<artifactId>flink-streaming-java_2.11</artifactId>
> +			<version>1.4-SNAPSHOT</version>
> +		</dependency>
> 
> 
> {code}
> FYI, building flink-connectors-kinesis from command line always works. 
> [~tzulitai] Do you use Intellij? If so, how do you properly set up the 
> flink-connectors-kinesis project in Intellij to be able to retrieve 
> dependencies?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7600) shorten delay of KinesisProducerConfiguration.setCredentialsRefreshDelay() to avoid updateCredentials Exception

2017-09-07 Thread Bowen Li (JIRA)
Bowen Li created FLINK-7600:
---

 Summary: shorten delay of 
KinesisProducerConfiguration.setCredentialsRefreshDelay() to avoid 
updateCredentials Exception
 Key: FLINK-7600
 URL: https://issues.apache.org/jira/browse/FLINK-7600
 Project: Flink
  Issue Type: Bug
  Components: Kinesis Connector
Affects Versions: 1.3.2
Reporter: Bowen Li
Assignee: Bowen Li
Priority: Minor
 Fix For: 1.4.0, 1.3.3


We saw the following warning in the Flink log:


{code:java}
2017-08-11 02:33:24,473 WARN  
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon  
- Exception during updateCredentials
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon$5.run(Daemon.java:316)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

According to the discussion in 
https://github.com/awslabs/amazon-kinesis-producer/issues/10, setting the delay 
to 100 ms will fix this issue.
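
A sketch of the adjustment against the KPL configuration API (100 ms is the value 
suggested in the linked discussion; everything else stays at its default):

{code:java}
// shorten the credentials refresh delay so the refresh thread spends less time
// sleeping and is less likely to be interrupted mid-sleep on shutdown
KinesisProducerConfiguration config = new KinesisProducerConfiguration()
	.setCredentialsRefreshDelay(100);
{code}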



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4657: [FLINK-7600][kinesis] shorten delay of KinesisProd...

2017-09-07 Thread bowenli86
GitHub user bowenli86 opened a pull request:

https://github.com/apache/flink/pull/4657

[FLINK-7600][kinesis] shorten delay of 
KinesisProducerConfiguration.setCredentialsRefreshDelay() to avoid 
updateCredentials Exception

## What is the purpose of the change

We saw the following warning in the Flink log:

```
2017-08-11 02:33:24,473 WARN  
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon  
- Exception during updateCredentials
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon$5.run(Daemon.java:316)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```

According to the discussion in 
https://github.com/awslabs/amazon-kinesis-producer/issues/10, setting the delay 
to 100 ms will fix this issue.

## Brief change log

  - *shorten aws credentials refresh delay to 100 millisec*

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

I've been running the fixed flink-connector-kinesis for a few days on AWS, 
and that warning has not appeared again.

## Does this pull request potentially affect one of the following parts:

## Documentation

  - Does this pull request introduce a new feature? (no)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bowenli86/flink FLINK-7600

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4657.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4657


commit 6386983239bd3024b395c865ec4fd33e232ca5a3
Author: Bowen Li 
Date:   2017-08-30T16:35:03Z

FLINK-7422 Upgrade Kinesis Client Library (KCL) and AWS SDK in 
flink-connector-kinesis

commit 381cd4156b84673a1d32d2db3f7b2d748d90d980
Author: Bowen Li 
Date:   2017-09-07T06:33:37Z

Merge remote-tracking branch 'upstream/master'

commit 38e8654142939061aa9595d67311619c9eb759ec
Author: Bowen Li 
Date:   2017-09-07T21:36:28Z

FLINK-7600 shorten delay of 
KinesisProducerConfiguration.setCredentialsRefreshDelay() to avoid 
updateCredentials Exception




---


[jira] [Commented] (FLINK-7600) shorten delay of KinesisProducerConfiguration.setCredentialsRefreshDelay() to avoid updateCredentials Exception

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157785#comment-16157785
 ] 

ASF GitHub Bot commented on FLINK-7600:
---

GitHub user bowenli86 opened a pull request:

https://github.com/apache/flink/pull/4657

[FLINK-7600][kinesis] shorten delay of 
KinesisProducerConfiguration.setCredentialsRefreshDelay() to avoid 
updateCredentials Exception

## What is the purpose of the change

We saw the following warning in the Flink log:

```
2017-08-11 02:33:24,473 WARN  
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon  
- Exception during updateCredentials
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon$5.run(Daemon.java:316)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```

According to the discussion in 
https://github.com/awslabs/amazon-kinesis-producer/issues/10, setting the delay 
to 100 ms will fix this issue.

## Brief change log

  - *shorten aws credentials refresh delay to 100 millisec*

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

I've been running the fixed flink-connector-kinesis for a few days on AWS, 
and that warning has not appeared again.

## Does this pull request potentially affect one of the following parts:

## Documentation

  - Does this pull request introduce a new feature? (no)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bowenli86/flink FLINK-7600

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4657.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4657


commit 6386983239bd3024b395c865ec4fd33e232ca5a3
Author: Bowen Li 
Date:   2017-08-30T16:35:03Z

FLINK-7422 Upgrade Kinesis Client Library (KCL) and AWS SDK in 
flink-connector-kinesis

commit 381cd4156b84673a1d32d2db3f7b2d748d90d980
Author: Bowen Li 
Date:   2017-09-07T06:33:37Z

Merge remote-tracking branch 'upstream/master'

commit 38e8654142939061aa9595d67311619c9eb759ec
Author: Bowen Li 
Date:   2017-09-07T21:36:28Z

FLINK-7600 shorten delay of 
KinesisProducerConfiguration.setCredentialsRefreshDelay() to avoid 
updateCredentials Exception




> shorten delay of KinesisProducerConfiguration.setCredentialsRefreshDelay() to 
> avoid updateCredentials Exception
> ---
>
> Key: FLINK-7600
> URL: https://issues.apache.org/jira/browse/FLINK-7600
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.3.2
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.4.0, 1.3.3
>
>
> We saw the following warning in the Flink log:
> {code:java}
> 2017-08-11 02:33:24,473 WARN  
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon
>   - Exception during updateCredentials
> java.lang.InterruptedException: sleep interrupted
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.producer.Daemon$5.run(Daemon.java:316)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> According to the discussion in 
> https://github.com/awslabs/amazon-kinesis-producer/issues/10, setting the 
> delay to 100 ms will fix this issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4638: [FLINK-6563] [table] Add time indicator support to...

2017-09-07 Thread haohui
Github user haohui commented on a diff in the pull request:

https://github.com/apache/flink/pull/4638#discussion_r137671151
  
--- Diff: 
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaTableSource.java
 ---
@@ -106,8 +240,191 @@
return deserializationSchema;
}
 
-   @Override
-   public String explainSource() {
-   return "";
+   /**
+* Assigns ingestion time timestamps and watermarks.
+*/
+   public static class IngestionTimeWatermarkAssigner implements 
AssignerWithPeriodicWatermarks<Row> {
+
+   private long curTime = Long.MIN_VALUE;
+
+   @Override
+   public long extractTimestamp(Row element, long 
previousElementTimestamp) {
+   long t = System.currentTimeMillis();
+   if (t > curTime) {
+   curTime = t;
+   }
+   return curTime;
+   }
+
+   @Nullable
+   @Override
+   public Watermark getCurrentWatermark() {
+   return new Watermark(curTime - 1);
+   }
+   }
+
+   protected AssignerWithPeriodicWatermarks getAssigner() {
+   return this.timestampAssigner;
+   }
+
+   /**
+* Checks that the provided row time attribute is valid, determines its 
position in the schema,
+* and adjusts the return type.
+*
+* @param rowtime The attribute to check.
+*/
+   private void configureRowTimeAttribute(String rowtime) {
+   Preconditions.checkNotNull(rowtime, "Row time attribute must 
not be null.");
+
+   if (this.ingestionTimeAttribute != null) {
+   throw new ValidationException(
+   "You can only specify a row time attribute OR 
an ingestion time attribute.");
+   }
+
+   if (this.rowTimeAttribute != null) {
+   throw new ValidationException(
+   "Row time attribute can only be specified 
once.");
+   }
+
+   // get current fields
+   String[] fieldNames = ((RowTypeInfo) 
this.getReturnType()).getFieldNames();
+   TypeInformation[] fieldTypes = ((RowTypeInfo) 
this.getReturnType()).getFieldTypes();
+
+   // check if the rowtime field exists and remember position
+   this.rowtimeFieldPos = -1;
--- End diff --

I'm wondering why we need to remove the field here and add it back later 
on. Changing the order of the fields seems problematic and can potentially 
break serialization (in very hacky cases).

Another question is to what extent a customized timestamp assigner can 
reuse the code here. Is it possible to implement it as a decorator of the 
table source? That would open up the possibility of reusing the code for 
other table sources.


---


[jira] [Commented] (FLINK-6563) Expose time indicator attributes in the KafkaTableSource

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157798#comment-16157798
 ] 

ASF GitHub Bot commented on FLINK-6563:
---

Github user haohui commented on a diff in the pull request:

https://github.com/apache/flink/pull/4638#discussion_r137671151
  
--- Diff: 
flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/KafkaTableSource.java
 ---
@@ -106,8 +240,191 @@
return deserializationSchema;
}
 
-   @Override
-   public String explainSource() {
-   return "";
+   /**
+* Assigns ingestion time timestamps and watermarks.
+*/
+   public static class IngestionTimeWatermarkAssigner implements 
AssignerWithPeriodicWatermarks<Row> {
+
+   private long curTime = Long.MIN_VALUE;
+
+   @Override
+   public long extractTimestamp(Row element, long 
previousElementTimestamp) {
+   long t = System.currentTimeMillis();
+   if (t > curTime) {
+   curTime = t;
+   }
+   return curTime;
+   }
+
+   @Nullable
+   @Override
+   public Watermark getCurrentWatermark() {
+   return new Watermark(curTime - 1);
+   }
+   }
+
+   protected AssignerWithPeriodicWatermarks getAssigner() {
+   return this.timestampAssigner;
+   }
+
+   /**
+* Checks that the provided row time attribute is valid, determines its 
position in the schema,
+* and adjusts the return type.
+*
+* @param rowtime The attribute to check.
+*/
+   private void configureRowTimeAttribute(String rowtime) {
+   Preconditions.checkNotNull(rowtime, "Row time attribute must 
not be null.");
+
+   if (this.ingestionTimeAttribute != null) {
+   throw new ValidationException(
+   "You can only specify a row time attribute OR 
an ingestion time attribute.");
+   }
+
+   if (this.rowTimeAttribute != null) {
+   throw new ValidationException(
+   "Row time attribute can only be specified 
once.");
+   }
+
+   // get current fields
+   String[] fieldNames = ((RowTypeInfo) 
this.getReturnType()).getFieldNames();
+   TypeInformation[] fieldTypes = ((RowTypeInfo) 
this.getReturnType()).getFieldTypes();
+
+   // check if the rowtime field exists and remember position
+   this.rowtimeFieldPos = -1;
--- End diff --

I'm wondering why we need to remove the field here and add it back later 
on. Changing the order of the fields seems problematic and can potentially 
break serialization (in very hacky cases).

Another question is to what extent a customized timestamp assigner can 
reuse the code here. Is it possible to implement it as a decorator of the 
table source? That would open up the possibility of reusing the code for 
other table sources.


> Expose time indicator attributes in the KafkaTableSource
> 
>
> Key: FLINK-6563
> URL: https://issues.apache.org/jira/browse/FLINK-6563
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Critical
> Fix For: 1.4.0
>
>
> This is a follow-up for FLINK-5884.
> FLINK-5884 requires the {{TableSource}} interfaces to expose the 
> processing time and the event time of the data stream. This JIRA proposes to 
> expose these two attributes in the Kafka table source.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6125) Commons httpclient is not shaded anymore in Flink 1.2

2017-09-07 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157858#comment-16157858
 ] 

Bowen Li commented on FLINK-6125:
-

[~tzulitai] I agree, I think this is different from FLINK-6951.

> Commons httpclient is not shaded anymore in Flink 1.2
> -
>
> Key: FLINK-6125
> URL: https://issues.apache.org/jira/browse/FLINK-6125
> Project: Flink
>  Issue Type: Bug
>  Components: Build System, Kinesis Connector
>Reporter: Robert Metzger
>Priority: Critical
>
> This has been reported by a user: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Return-of-Flink-shading-problems-in-1-2-0-td12257.html
> The Kinesis connector requires Flink to not expose any httpclient 
> dependencies. Since Flink 1.2 it seems that we are exposing that dependency 
> again



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (FLINK-6703) Document how to take a savepoint on YARN

2017-09-07 Thread Bowen Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li reassigned FLINK-6703:
---

Assignee: Bowen Li

> Document how to take a savepoint on YARN
> 
>
> Key: FLINK-6703
> URL: https://issues.apache.org/jira/browse/FLINK-6703
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Bowen Li
>
> The documentation should have a separate entry for savepoint related CLI 
> commands in combination with YARN. It is currently not documented that you 
> have to supply the application id, nor how you can pass it.
> {code}
> ./bin/flink savepoint <jobId> -m yarn-cluster (-yid|-yarnapplicationId) 
> <applicationId>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6703) Document how to take a savepoint on YARN

2017-09-07 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157873#comment-16157873
 ] 

Bowen Li commented on FLINK-6703:
-

[~Zentol] Is this still valid? Is the above savepoint command for a Flink job 
running in yarn-session mode?

One thing I found missing in the documentation is the following command example:

{code:java}
$ bin/flink savepoint :jobId [-m jobManagerAddress:jobManagerPort] 
[:targetDirectory]
{code}


> Document how to take a savepoint on YARN
> 
>
> Key: FLINK-6703
> URL: https://issues.apache.org/jira/browse/FLINK-6703
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Bowen Li
>
> The documentation should have a separate entry for savepoint related CLI 
> commands in combination with YARN. It is currently not documented that you 
> have to supply the application id, nor how you can pass it.
> {code}
> ./bin/flink savepoint <jobId> -m yarn-cluster (-yid|-yarnapplicationId) 
> <applicationId>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (FLINK-6703) Document how to take a savepoint on YARN

2017-09-07 Thread Bowen Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157873#comment-16157873
 ] 

Bowen Li edited comment on FLINK-6703 at 9/7/17 11:49 PM:
--

[~Zentol] Is this still valid? Is the above savepoint command for a Flink job 
running in yarn-session mode?

One thing I found missing in the documentation is the following command example:

{code:java}
$ bin/flink savepoint :jobId [-m jobManagerAddress:jobManagerPort] 
[:targetDirectory]
{code}

I've been using the above command to take savepoints on YARN all the time


was (Author: phoenixjiangnan):
[~Zentol] is this still valid? is the above savepoint command for Flink job 
running in yarn-session mode?

One thing I found missing in the documentation is the following commond example:

{code:java}
$ bin/flink savepoint :jobId [-m jobManagerAddress:jobManagerPort] 
[:targetDirectory]
{code}


> Document how to take a savepoint on YARN
> 
>
> Key: FLINK-6703
> URL: https://issues.apache.org/jira/browse/FLINK-6703
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Bowen Li
>
> The documentation should have a separate entry for savepoint related CLI 
> commands in combination with YARN. It is currently not documented that you 
> have to supply the application id, nor how you can pass it.
> {code}
> ./bin/flink savepoint <jobId> -m yarn-cluster (-yid|-yarnapplicationId) 
> <applicationId>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (FLINK-5095) Add explicit notifyOfAddedX methods to MetricReporter interface

2017-09-07 Thread Bowen Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li reassigned FLINK-5095:
---

Assignee: Bowen Li

> Add explicit notifyOfAddedX methods to MetricReporter interface
> ---
>
> Key: FLINK-5095
> URL: https://issues.apache.org/jira/browse/FLINK-5095
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Affects Versions: 1.1.3
>Reporter: Chesnay Schepler
>Assignee: Bowen Li
>Priority: Minor
>
> I would like to start a discussion on the MetricReporter interface, 
> specifically the methods that notify a reporter of added or removed metrics.
> Currently, the methods are defined as follows:
> {code}
> void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group);
> void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup 
> group);
> {code}
> All metrics, regardless of their actual type, are passed to the reporter with 
> these methods.
> Since the different metric types have to be handled differently, we force 
> every reporter to do something like this:
> {code}
> if (metric instanceof Counter) {
> Counter c = (Counter) metric;
>   // deal with counter
> } else if (metric instanceof Gauge) {
>   // deal with gauge
> } else if (metric instanceof Histogram) {
>   // deal with histogram
> } else if (metric instanceof Meter) {
>   // deal with meter
> } else {
>   // log something or throw an exception
> }
> {code}
> This has a few issues
> * the instanceof checks and castings are unnecessary overhead
> * it requires the implementer to be aware of every metric type
> * it encourages throwing an exception in the final else block
> We could remedy all of these by reworking the interface to contain explicit 
> add/remove methods for every metric type. This would however be a breaking 
> change and blow up the interface to 12 methods from the current 4. We could 
> also add a RichMetricReporter interface with these methods, which would 
> require relatively little changes but add additional complexity.
> I was wondering what other people think about this.
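
For illustration, a rough sketch of what explicit per-type methods could look like 
(the interface name and method names are hypothetical, not an agreed-upon design):

{code:java}
public interface RichMetricReporter extends MetricReporter {

	void notifyOfAddedCounter(Counter counter, String name, MetricGroup group);
	void notifyOfRemovedCounter(Counter counter, String name, MetricGroup group);

	void notifyOfAddedGauge(Gauge<?> gauge, String name, MetricGroup group);
	void notifyOfRemovedGauge(Gauge<?> gauge, String name, MetricGroup group);

	void notifyOfAddedHistogram(Histogram histogram, String name, MetricGroup group);
	void notifyOfRemovedHistogram(Histogram histogram, String name, MetricGroup group);

	void notifyOfAddedMeter(Meter meter, String name, MetricGroup group);
	void notifyOfRemovedMeter(Meter meter, String name, MetricGroup group);
}
{code}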



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (FLINK-6549) Improve error message for type mismatches with side outputs

2017-09-07 Thread Bowen Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li reassigned FLINK-6549:
---

Assignee: Bowen Li

> Improve error message for type mismatches with side outputs
> ---
>
> Key: FLINK-6549
> URL: https://issues.apache.org/jira/browse/FLINK-6549
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Bowen Li
>Priority: Minor
>
> A type mismatch when using side outputs causes a ClassCastException to be 
> thrown. It would be neat to include the name of the OutputTags in the 
> exception message.
> This can occur when multiple {{OutputTag}}s with different types but 
> identical names are being used.
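
For illustration, a minimal job that runs into the situation described above (the 
class and tag names are made up; the two tags share the name "late" but carry 
different types):

{code:java}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputTypeMismatch {

	// two tags with identical names but different element types
	private static final OutputTag<String> STRING_TAG = new OutputTag<String>("late") {};
	private static final OutputTag<Long> LONG_TAG = new OutputTag<Long>("late") {};

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		SingleOutputStreamOperator<Integer> mainStream = env
			.fromElements(1, 2, 3)
			.process(new ProcessFunction<Integer, Integer>() {
				@Override
				public void processElement(Integer value, Context ctx, Collector<Integer> out) {
					out.collect(value);
					// records are emitted under the String-typed tag ...
					ctx.output(STRING_TAG, "late-" + value);
				}
			});

		// ... but consumed under the Long-typed tag with the same name, so the job fails
		// at runtime with a ClassCastException that does not name the offending OutputTag
		DataStream<Long> lates = mainStream.getSideOutput(LONG_TAG);
		lates.print();

		env.execute("side output type mismatch");
	}
}
{code}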



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (FLINK-6957) WordCountTable example cannot be run

2017-09-07 Thread Bowen Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bowen Li reassigned FLINK-6957:
---

Assignee: Bowen Li

> WordCountTable example cannot be run
> 
>
> Key: FLINK-6957
> URL: https://issues.apache.org/jira/browse/FLINK-6957
> Project: Flink
>  Issue Type: Bug
>  Components: Examples, Table API & SQL
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Bowen Li
> Fix For: 1.4.0
>
>
> Running the example (with the fix for FLINK-6956 applied) gives the following 
> exception:
> {code}
> Table program cannot be compiled. This is a bug. Please file an issue.
> 
> org.apache.flink.table.codegen.Compiler$class.compile(Compiler.scala:36)
> org.apache.flink.table.runtime.MapRunner.compile(MapRunner.scala:28)
> org.apache.flink.table.runtime.MapRunner.open(MapRunner.scala:42)
> 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 
> org.apache.flink.api.common.operators.base.MapOperatorBase.executeOnCollections(MapOperatorBase.java:64)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:250)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:148)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:228)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:148)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:228)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:148)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:228)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:148)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:228)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:148)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:130)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:181)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:157)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:130)
> 
> org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:114)
> 
> org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:35)
> 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
> org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
> org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
> 
> org.apache.flink.table.examples.java.WordCountTable.main(WordCountTable.java:58)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4655: [FLINK-7567]: Removed keepPartitioning parameter from ite...

2017-09-07 Thread mlipkovich
Github user mlipkovich commented on the issue:

https://github.com/apache/flink/pull/4655
  
As I understand it, the build failed because of a changed API method. The 
method that was changed is annotated with PublicEvolving, so there should be a 
way to change it. As @aljoscha mentioned in 
https://issues.apache.org/jira/browse/FLINK-7567, there is no way to create 
a method with the updated API and deprecate the current one because of the 
default parameter.


---


[jira] [Commented] (FLINK-7567) DataStream#iterate() on env.fromElements() / env.fromCollection() does not work

2017-09-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157885#comment-16157885
 ] 

ASF GitHub Bot commented on FLINK-7567:
---

Github user mlipkovich commented on the issue:

https://github.com/apache/flink/pull/4655
  
As I understand it, the build failed because of a changed API method. The 
method that was changed is annotated with PublicEvolving, so there should be a 
way to change it. As @aljoscha mentioned in 
https://issues.apache.org/jira/browse/FLINK-7567, there is no way to create 
a method with the updated API and deprecate the current one because of the 
default parameter.


> DataStream#iterate() on env.fromElements() / env.fromCollection() does not 
> work
> ---
>
> Key: FLINK-7567
> URL: https://issues.apache.org/jira/browse/FLINK-7567
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Affects Versions: 1.3.2
> Environment: OS X 10.12.6, Oracle JDK 1.8.0_144, Flink 1.3.2
>Reporter: Peter Ertl
>Assignee: Mikhail Lipkovich
>
> When I try to execute this simple snippet of code
> {code}
>   @Test
>   def iterateOnElements(): Unit = {
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> // do something silly just to get the iteration going ...
> val result = env.fromElements(1, 2, 3).iterate(it => {
>   (it.filter(_ > 0).map(_ - 1), it.filter(_ > 0).map(_ => 'x'))
> })
> result.print()
> env.execute()
>   }
> {code}
> I get the following exception:
> {code}
> java.lang.UnsupportedOperationException: Parallelism of the feedback stream 
> must match the parallelism of the original stream. Parallelism of original 
> stream: 1; parallelism of feedback stream: 8
>   at 
> org.apache.flink.streaming.api.transformations.FeedbackTransformation.addFeedbackEdge(FeedbackTransformation.java:87)
>   at 
> org.apache.flink.streaming.api.datastream.IterativeStream.closeWith(IterativeStream.java:77)
>   at 
> org.apache.flink.streaming.api.scala.DataStream.iterate(DataStream.scala:519)
>   at atp.analytics.CassandraReadTest.iterate2(CassandraReadTest.scala:134)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
> Since this is just the simplest iterating stream setup I could imagine, this 
> error makes no sense to me :-P



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)