[
https://issues.apache.org/jira/browse/FLINK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348327#comment-15348327
]
ASF GitHub Bot commented on FLINK-3231:
---------------------------------------
Github user rmetzger commented on a diff in the pull request:
https://github.com/apache/flink/pull/2131#discussion_r68404148
--- Diff:
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
---
@@ -159,42 +217,65 @@ public String getShardIterator(KinesisStreamShard
shard, String shardIteratorTyp
return kinesisClient.getShardIterator(shard.getStreamName(),
shard.getShardId(), shardIteratorType, startingSeqNum).getShardIterator();
}
+ private List<KinesisStreamShard> getShardsOfStream(String streamName,
String lastSeenShardId) throws InterruptedException {
+ List<KinesisStreamShard> shardsOfStream = new ArrayList<>();
+
+ DescribeStreamResult describeStreamResult;
+ do {
+ describeStreamResult = describeStream(streamName,
lastSeenShardId);
+
+ List<Shard> shards =
describeStreamResult.getStreamDescription().getShards();
+ for (Shard shard : shards) {
+ shardsOfStream.add(new
KinesisStreamShard(streamName, shard));
+ }
+
+ if (shards.size() != 0) {
+ lastSeenShardId = shards.get(shards.size() -
1).getShardId();
+ }
+ } while
(describeStreamResult.getStreamDescription().isHasMoreShards());
+
+ return shardsOfStream;
+ }
+
/**
* Get metainfo for a Kinesis stream, which contains information about
which shards this Kinesis stream possess.
*
+ * This method is using a "full jitter" approach described in
+ * <a
href="http://google.com">https://www.awsarchitectureblog.com/2015/03/backoff.html</a>.
This is necessary
+ * because concurrent calls will be made by all parallel subtask's
{@link ShardDiscoverer}s. This jitter backoff
+ * approach will help distribute calls across the discoverers over time.
+ *
* @param streamName the stream to describe
* @param startShardId which shard to start with for this describe
operation (earlier shard's infos will not appear in result)
* @return the result of the describe stream operation
*/
- private DescribeStreamResult describeStream(String streamName, String
startShardId) {
+ private DescribeStreamResult describeStream(String streamName, String
startShardId) throws InterruptedException {
final DescribeStreamRequest describeStreamRequest = new
DescribeStreamRequest();
describeStreamRequest.setStreamName(streamName);
describeStreamRequest.setExclusiveStartShardId(startShardId);
DescribeStreamResult describeStreamResult = null;
String streamStatus = null;
- int remainingRetryTimes = Integer.valueOf(
-
configProps.getProperty(KinesisConfigConstants.CONFIG_STREAM_DESCRIBE_RETRIES,
Integer.toString(KinesisConfigConstants.DEFAULT_STREAM_DESCRIBE_RETRY_TIMES)));
- long describeStreamBackoffTimeInMillis = Long.valueOf(
-
configProps.getProperty(KinesisConfigConstants.CONFIG_STREAM_DESCRIBE_BACKOFF,
Long.toString(KinesisConfigConstants.DEFAULT_STREAM_DESCRIBE_BACKOFF)));
- // Call DescribeStream, with backoff and retries (if we get
LimitExceededException).
- while ((remainingRetryTimes >= 0) && (describeStreamResult ==
null)) {
+ // Call DescribeStream, with full-jitter backoff (if we get
LimitExceededException).
+ Random seed = null;
+ int attemptCount = 0;
+ while (describeStreamResult == null) { // retry until we get a
result
try {
describeStreamResult =
kinesisClient.describeStream(describeStreamRequest);
streamStatus =
describeStreamResult.getStreamDescription().getStreamStatus();
} catch (LimitExceededException le) {
- LOG.warn("Got LimitExceededException when
describing stream " + streamName + ". Backing off for "
- + describeStreamBackoffTimeInMillis + "
millis.");
- try {
-
Thread.sleep(describeStreamBackoffTimeInMillis);
- } catch (InterruptedException ie) {
- LOG.debug("Stream " + streamName + " :
Sleep was interrupted ", ie);
+ if (seed == null) {
+ seed = new Random();
}
+ long backoffMillis = fullJitterBackoff(
+ describeStreamBaseBackoffMillis,
describeStreamMaxBackoffMillis, describeStreamExpConstant, attemptCount++,
seed);
+ LOG.warn("Got LimitExceededException when
describing stream " + streamName + ". Backing off for "
+ + backoffMillis + " millis.");
+ Thread.sleep(backoffMillis);
} catch (ResourceNotFoundException re) {
throw new RuntimeException("Error while getting
stream details", re);
}
- remainingRetryTimes--;
}
if (streamStatus == null) {
--- End diff --
The `RuntimeException` below has a typo.
`Can't get stream info from ____ after`.
Also, I wonder where the number 3 is coming from.
> Handle Kinesis-side resharding in Kinesis streaming consumer
> ------------------------------------------------------------
>
> Key: FLINK-3231
> URL: https://issues.apache.org/jira/browse/FLINK-3231
> Project: Flink
> Issue Type: Sub-task
> Components: Kinesis Connector, Streaming Connectors
> Affects Versions: 1.1.0
> Reporter: Tzu-Li (Gordon) Tai
> Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> A big difference between Kinesis shards and Kafka partitions is that Kinesis
> users can choose to "merge" and "split" shards at any time for adjustable
> stream throughput capacity. This article explains this quite clearly:
> https://brandur.org/kinesis-by-example.
> This will break the static shard-to-task mapping implemented in the basic
> version of the Kinesis consumer
> (https://issues.apache.org/jira/browse/FLINK-3229). The static shard-to-task
> mapping is done in a simple round-robin-like distribution which can be
> locally determined at each Flink consumer task (Flink Kafka consumer does
> this too).
> To handle Kinesis resharding, we will need some way to let the Flink consumer
> tasks coordinate which shards they are currently handling, and allow the
> tasks to ask the coordinator for a shards reassignment when the task finds
> out it has found a closed shard at runtime (shards will be closed by Kinesis
> when it is merged and split).
> We need a centralized coordinator state store which is visible to all Flink
> consumer tasks. Tasks can use this state store to locally determine what
> shards it can be reassigned. Amazon KCL uses a DynamoDB table for the
> coordination, but as described in
> https://issues.apache.org/jira/browse/FLINK-3211, we unfortunately can't use
> KCL for the implementation of the consumer if we want to leverage Flink's
> checkpointing mechanics. For our own implementation, Zookeeper can be used
> for this state store, but that means it would require the user to set up ZK
> to work.
> Since this feature introduces extensive work, it is opened as a separate
> sub-task from the basic implementation
> https://issues.apache.org/jira/browse/FLINK-3229.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)