[jira] [Updated] (KAFKA-3478) Finer Stream Flow Control

2017-01-31 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3478:
-
Fix Version/s: (was: 0.10.2.0)

> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
>
> Today we have an event-time based flow control mechanism to synchronize
> multiple input streams in a best-effort manner:
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like finer control over
> the input streams. For example, with two input streams, one of them should
> always be read from offset 0 upon (re)starting, while the other should be
> read from the log end offset.
> Today we only have one consumer config, "auto.offset.reset", to control that
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve this setting to give users finer control
> over individual input streams.
> =
> A finer flow control could also be used to allow populating a {{KTable}}
> (with an "initial" state) before starting the actual processing (this feature
> was asked for on the mailing list multiple times already). Even if it is quite
> hard to define *when* the initial populating phase should end, this might
> still be useful. There would be the following possibilities:
>  1) an initial fixed time period for populating
>     (it might be hard for a user to estimate the correct value)
>  2) an "idle" period, i.e., if no update to a KTable happens for a certain
> time, we consider it populated
>  3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
>  4) a throughput threshold, i.e., if the populating frequency falls below
> the threshold, the KTable is considered "finished"
>  5) maybe something else?
> The API might look something like this:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without
> reading any other topics until one record with timestamp 1000 is seen.
> {noformat}
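
The per-stream reset behavior requested above can be sketched against the later StreamsBuilder/Consumed API (which did not exist when this ticket was filed); the topic names, String serdes, and the join are illustrative assumptions, not part of this ticket:
{noformat}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class PerStreamResetSketch {

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Input that should be re-read from offset 0 on (re)start when no
        // committed offsets exist: per-source reset policy EARLIEST.
        KTable<String, String> reference = builder.table(
                "reference-topic",
                Consumed.with(Serdes.String(), Serdes.String())
                        .withOffsetResetPolicy(Topology.AutoOffsetReset.EARLIEST));

        // Input that should start from the log end offset: reset policy LATEST.
        KStream<String, String> events = builder.stream(
                "events-topic",
                Consumed.with(Serdes.String(), Serdes.String())
                        .withOffsetResetPolicy(Topology.AutoOffsetReset.LATEST));

        // Assumes both topics are co-partitioned; the join is only here to
        // show the two sources being used together.
        events.join(reference, (event, ref) -> event + "/" + ref)
              .to("output-topic");

        return builder.build();
    }
}
{noformat}
Note that a reset policy only applies when no committed offsets are found, so it does not by itself force a re-read from offset 0 on every restart.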
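
The timestamp cut-off idea (option 3 above) can also be approximated outside the DSL by pre-reading the table's topic with a plain KafkaConsumer before starting the streams application. A minimal sketch under that assumption; the topic name "table-topic", the String deserializers, and the class name are hypothetical:
{noformat}
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TablePreloadSketch {

    /** Reads "table-topic" from the beginning and keeps the latest value per key
     *  for all records with a timestamp below cutoffMs (the cut-off point). */
    public static Map<String, String> preload(String bootstrapServers, long cutoffMs) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, String> table = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor("table-topic").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            boolean done = false;
            while (!done) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    if (record.timestamp() >= cutoffMs) {
                        done = true; // reached the cut-off (coarse: per batch, not per partition)
                        break;
                    }
                    table.put(record.key(), record.value());
                }
                // Also stop once every partition has been read up to its end offset.
                done = done || partitions.stream()
                        .allMatch(tp -> consumer.position(tp) >= endOffsets.get(tp));
            }
        }
        return table;
    }
}
{noformat}
A DSL-level solution would still need to hold back the other input streams until the cut-off is reached, which is what the proposed builder.table("topic", 1000) overload would express directly.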



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-3478) Finer Stream Flow Control

2016-09-12 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3478:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.2.0

> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.2.0
>
>
> Today we have an event-time based flow control mechanism to synchronize
> multiple input streams in a best-effort manner:
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like finer control over
> the input streams. For example, with two input streams, one of them should
> always be read from offset 0 upon (re)starting, while the other should be
> read from the log end offset.
> Today we only have one consumer config, "auto.offset.reset", to control that
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve this setting to give users finer control
> over individual input streams.
> =
> A finer flow control could also be used to allow populating a {{KTable}}
> (with an "initial" state) before starting the actual processing (this feature
> was asked for on the mailing list multiple times already). Even if it is quite
> hard to define *when* the initial populating phase should end, this might
> still be useful. There would be the following possibilities:
>  1) an initial fixed time period for populating
>     (it might be hard for a user to estimate the correct value)
>  2) an "idle" period, i.e., if no update to a KTable happens for a certain
> time, we consider it populated
>  3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
>  4) a throughput threshold, i.e., if the populating frequency falls below
> the threshold, the KTable is considered "finished"
>  5) maybe something else?
> The API might look something like this:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without
> reading any other topics until one record with timestamp 1000 is seen.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3478) Finer Stream Flow Control

2016-07-15 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-3478:
---
Description: 
Today we have an event-time based flow control mechanism to synchronize
multiple input streams in a best-effort manner:

http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps

However, there are some use cases where users would like finer control over
the input streams. For example, with two input streams, one of them should
always be read from offset 0 upon (re)starting, while the other should be
read from the log end offset.

Today we only have one consumer config, "auto.offset.reset", to control that
behavior, which means all streams are read either from "earliest" or "latest".

We should consider how to improve this setting to give users finer control
over individual input streams.

=
A finer flow control could also be used to allow populating a {{KTable}}
(with an "initial" state) before starting the actual processing (this feature
was asked for on the mailing list multiple times already). Even if it is quite
hard to define *when* the initial populating phase should end, this might
still be useful. There would be the following possibilities:

 1) an initial fixed time period for populating
    (it might be hard for a user to estimate the correct value)
 2) an "idle" period, i.e., if no update to a KTable happens for a certain
    time, we consider it populated
 3) a timestamp cut-off point, i.e., all records with an older timestamp
    belong to the initial populating phase
 4) a throughput threshold, i.e., if the populating frequency falls below
    the threshold, the KTable is considered "finished"
 5) maybe something else?

The API might look something like this:
{noformat}
KTable table = builder.table("topic", 1000); // populate the table without
reading any other topics until one record with timestamp 1000 is seen.
{noformat}

  was:
Today we have an event-time based flow control mechanism to synchronize
multiple input streams in a best-effort manner:

http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps

However, there are some use cases where users would like finer control over
the input streams. For example, with two input streams, one of them should
always be read from offset 0 upon (re)starting, while the other should be
read from the log end offset.

Today we only have one consumer config, "auto.offset.reset", to control that
behavior, which means all streams are read either from "earliest" or "latest".

We should consider how to improve this setting to give users finer control
over individual input streams.


> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Today we have an event-time based flow control mechanism to synchronize
> multiple input streams in a best-effort manner:
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like finer control over
> the input streams. For example, with two input streams, one of them should
> always be read from offset 0 upon (re)starting, while the other should be
> read from the log end offset.
> Today we only have one consumer config, "auto.offset.reset", to control that
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve this setting to give users finer control
> over individual input streams.
> =
> A finer flow control could also be used to allow populating a {{KTable}}
> (with an "initial" state) before starting the actual processing (this feature
> was asked for on the mailing list multiple times already). Even if it is quite
> hard to define *when* the initial populating phase should end, this might
> still be useful. There would be the following possibilities:
>  1) an initial fixed time period for populating
>     (it might be hard for a user to estimate the correct value)
>  2) an "idle" period, i.e., if no update to a KTable happens for a certain
> time, we consider it populated
>  3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
>  4) a throughput threshold, i.e., if the populating frequency falls below
> the threshold, the KTable is considered "finished"
>  5) maybe something else?
> The API might look something like this:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without
> reading any other topics until one record with timestamp 1000 is seen.
> {noformat}



--
This message was sent by Atlassian JIRA
(v

[jira] [Updated] (KAFKA-3478) Finer Stream Flow Control

2016-07-05 Thread Michael Noll (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Noll updated KAFKA-3478:

Description: 
Today we have an event-time based flow control mechanism to synchronize
multiple input streams in a best-effort manner:

http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps

However, there are some use cases where users would like finer control over
the input streams. For example, with two input streams, one of them should
always be read from offset 0 upon (re)starting, while the other should be
read from the log end offset.

Today we only have one consumer config, "auto.offset.reset", to control that
behavior, which means all streams are read either from "earliest" or "latest".

We should consider how to improve this setting to give users finer control
over individual input streams.

  was:
Today we have an event-time based flow control mechanism to synchronize
multiple input streams in a best-effort manner:

http://docs.confluent.io/2.1.0-alpha1/streams/architecture.html#flow-control-with-timestamps

However, there are some use cases where users would like finer control over
the input streams. For example, with two input streams, one of them should
always be read from offset 0 upon (re)starting, while the other should be
read from the log end offset.

Today we only have one consumer config, "auto.offset.reset", to control that
behavior, which means all streams are read either from "earliest" or "latest".

We should consider how to improve this setting to give users finer control
over individual input streams.


> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Today we have an event-time based flow control mechanism to synchronize
> multiple input streams in a best-effort manner:
> http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like finer control over
> the input streams. For example, with two input streams, one of them should
> always be read from offset 0 upon (re)starting, while the other should be
> read from the log end offset.
> Today we only have one consumer config, "auto.offset.reset", to control that
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve this setting to give users finer control
> over individual input streams.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3478) Finer Stream Flow Control

2016-04-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3478:
-
Labels: user-experience  (was: )

> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kafka streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Today we have an event-time based flow control mechanism to synchronize
> multiple input streams in a best-effort manner:
> http://docs.confluent.io/2.1.0-alpha1/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like finer control over
> the input streams. For example, with two input streams, one of them should
> always be read from offset 0 upon (re)starting, while the other should be
> read from the log end offset.
> Today we only have one consumer config, "auto.offset.reset", to control that
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve this setting to give users finer control
> over individual input streams.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3478) Finer Stream Flow Control

2016-04-04 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3478:
-
Component/s: kafka streams

> Finer Stream Flow Control
> -
>
> Key: KAFKA-3478
> URL: https://issues.apache.org/jira/browse/KAFKA-3478
> Project: Kafka
>  Issue Type: Sub-task
>  Components: kafka streams
>Reporter: Guozhang Wang
> Fix For: 0.10.1.0
>
>
> Today we have an event-time based flow control mechanism to synchronize
> multiple input streams in a best-effort manner:
> http://docs.confluent.io/2.1.0-alpha1/streams/architecture.html#flow-control-with-timestamps
> However, there are some use cases where users would like finer control over
> the input streams. For example, with two input streams, one of them should
> always be read from offset 0 upon (re)starting, while the other should be
> read from the log end offset.
> Today we only have one consumer config, "auto.offset.reset", to control that
> behavior, which means all streams are read either from "earliest" or "latest".
> We should consider how to improve this setting to give users finer control
> over individual input streams.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)