[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372558#comment-15372558
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2071
  
No problem, thank you!


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372515#comment-15372515
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2071


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372513#comment-15372513
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2071
  
Sry, I forgot to merge it. Will do now.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372134#comment-15372134
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2071
  
@rmetzger Will you have time to help merge this PR too? I think the changes 
are good to go.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367938#comment-15367938
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2071
  
No problem :)
+1 to merge once travis is green


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367604#comment-15367604
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2071
  
@rmetzger Thanks for the quick review Robert.
Comments are addressed, sorry for the sloppy execution on not validating 
input, should have remembered that :)


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367602#comment-15367602
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2071#discussion_r70065967
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/ShardConsumer.java
 ---
@@ -155,6 +163,15 @@ public void run() {
// we can close this consumer thread 
once we've reached the end of the subscribed shard
break;
} else {
+   if (fetchIntervalMillis != 0) {
--- End diff --

`KinesisConfigUtil.validateConfiguration()` will check for a negative 
value, at the client before the tasks are submitted.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367588#comment-15367588
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2071#discussion_r70064126
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/ShardConsumer.java
 ---
@@ -155,6 +163,15 @@ public void run() {
// we can close this consumer thread 
once we've reached the end of the subscribed shard
break;
} else {
+   if (fetchIntervalMillis != 0) {
+   if (LOG.isDebugEnabled()) {
+   LOG.debug(
+   "Consumer {} of 
subtask {} is sleeping for {} milliseconds before fetching the next batch of 
records ...",
--- End diff --

Makes sense, I'll remove it!


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367554#comment-15367554
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2071#discussion_r70059792
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/ShardConsumer.java
 ---
@@ -155,6 +163,15 @@ public void run() {
// we can close this consumer thread 
once we've reached the end of the subscribed shard
break;
} else {
+   if (fetchIntervalMillis != 0) {
--- End diff --

Yup. Adding the validation to `KinesisConfigUtil.validateConfiguration()`, 
like the other values.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367550#comment-15367550
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2071#discussion_r70059642
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/ShardConsumer.java
 ---
@@ -155,6 +163,15 @@ public void run() {
// we can close this consumer thread 
once we've reached the end of the subscribed shard
break;
} else {
+   if (fetchIntervalMillis != 0) {
+   if (LOG.isDebugEnabled()) {
+   LOG.debug(
+   "Consumer {} of 
subtask {} is sleeping for {} milliseconds before fetching the next batch of 
records ...",
--- End diff --

I wonder if this log statement is really necessary. It can lead to quite a 
lot of log entries just for sleeping. (+ there is this other DEBUG log entry on 
each fetch)


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367547#comment-15367547
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2071#discussion_r70059586
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/ShardConsumer.java
 ---
@@ -155,6 +163,15 @@ public void run() {
// we can close this consumer thread 
once we've reached the end of the subscribed shard
break;
} else {
+   if (fetchIntervalMillis != 0) {
--- End diff --

There is no input type validation happening, right?
So if a user sets a negative sleeping interval, it will probably fail.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367067#comment-15367067
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2071
  
@rmetzger Rebased + addressed your comment. Please review, thanks :)


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366118#comment-15366118
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2071
  
Great, thank you.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-07-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366114#comment-15366114
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2071
  
I'll try to update this PR tomorrow.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-06-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348502#comment-15348502
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2071
  
Yes, I would suggest to do this PR after the big one.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-06-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348482#comment-15348482
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2071
  
Thank you for reviewing this @rmetzger. Should I address your comment after 
the rework in https://github.com/apache/flink/pull/2131 is merged, and rebase 
this PR on that?


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-06-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348477#comment-15348477
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2071#discussion_r68420387
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/ShardConsumerThread.java
 ---
@@ -105,6 +117,8 @@ public void run() {
 
break;
} else {
+   Thread.sleep(idleMillisBetweenFetches);
--- End diff --

Agree. I'll change the default value of this to 0 in 
`KinesisConfigConstants`, and check value here before calling sleep.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-06-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348168#comment-15348168
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2071#discussion_r68386857
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/ShardConsumerThread.java
 ---
@@ -105,6 +117,8 @@ public void run() {
 
break;
} else {
+   Thread.sleep(idleMillisBetweenFetches);
--- End diff --

I'm not sure if its a good idea to introduce this waiting time by default.
Imagine we are consuming slowly because downstream operations are 
expensive. Then, we would introduce an artificial slowdown (in latency and 
throughput). I would suggest to:
- Set the value by default to 0
- Only call sleep if the idle time is > 0.


> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-06-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315316#comment-15315316
 ] 

ASF GitHub Bot commented on FLINK-4018:
---

GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/2071

[FLINK-4018](streaming-connectors) Configurable idle time between 
getRecords requests to Kinesis shards

Along with this new configuration and the already existing 
`KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
control on the desired throughput behaviour for the Kinesis consumer.

The default value for this new configuration is 500 milliseconds idle time.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-4018

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2071


commit 74fcb62f62007c0396d2ae870298f35ae5ce
Author: Gordon Tai 
Date:   2016-06-04T04:43:30Z

[FLINK-4018] Add configuration for idle time between get requests to 
Kinesis shards




> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4018) Configurable idle time between getRecords requests to Kinesis shards

2016-06-03 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315290#comment-15315290
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-4018:


We have a user who has tried out the connector and preparing for production 
next week with Flink & Kinesis. The only thing worrying is this issue. I'll PR 
within the next 24 hours for review.

> Configurable idle time between getRecords requests to Kinesis shards
> 
>
> Key: FLINK-4018
> URL: https://issues.apache.org/jira/browse/FLINK-4018
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>
> Currently, the Kinesis consumer is calling getRecords() right after finishing 
> previous calls. This results in easily reaching Amazon's limitation of 5 GET 
> requests per shard per second. Although the code already has backoff & retry 
> mechanism, this will affect other applications consuming from the same 
> Kinesis stream.
> Along with this new configuration and the already existing 
> `KinesisConfigConstants.CONFIG_SHARD_RECORDS_PER_GET`, users will have more 
> control on the desired throughput behaviour for the Kinesis consumer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)