[jira] [Commented] (BEAM-3093) add an option 'FirstPollOffsetStrategy' to KafkaIO

2017-11-27 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268168#comment-16268168
 ] 

Raghu Angadi commented on BEAM-3093:


[~mingmxu], assigning this to you. Let me know `withStartReadTime()` does not 
do what you are looking for.

> add an option 'FirstPollOffsetStrategy' to KafkaIO
> --
>
> Key: BEAM-3093
> URL: https://issues.apache.org/jira/browse/BEAM-3093
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Xu Mingmin
>
> This is a feature borrowed from Storm KafkaSpout.
> *What's the issue?*
> In KafkaIO, when offset is stored either in checkpoint or auto_committed, it 
> cannot be changed in application, to force to read from earliest/latest. 
> --This feature is important to reset the start offset when relaunching a job.
> *Proposed solution:*
> By borrowing the FirstPollOffsetStrategy concept, users can have more options:
> 1). *{{EARLIEST}}*: always start_from_beginning no matter of what's in 
> checkpoint/auto_commit;
> 2). *{{LATEST}}*: always start_from_latest no matter of what's in 
> checkpoint/auto_commit;
> 3). *{{UNCOMMITTED_EARLIEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_beginning if, otherwise start_from_previous_offset;
> 4). *{{UNCOMMITTED_LATEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_latest, otherwise start_from_previous_offset;
> [~rangadi], any comments?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3093) add an option 'FirstPollOffsetStrategy' to KafkaIO

2017-10-24 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217318#comment-16217318
 ] 

Raghu Angadi commented on BEAM-3093:


Yes. withStartReadTime().


> add an option 'FirstPollOffsetStrategy' to KafkaIO
> --
>
> Key: BEAM-3093
> URL: https://issues.apache.org/jira/browse/BEAM-3093
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Kenneth Knowles
>
> This is a feature borrowed from Storm KafkaSpout.
> *What's the issue?*
> In KafkaIO, when offset is stored either in checkpoint or auto_committed, it 
> cannot be changed in application, to force to read from earliest/latest. 
> --This feature is important to reset the start offset when relaunching a job.
> *Proposed solution:*
> By borrowing the FirstPollOffsetStrategy concept, users can have more options:
> 1). *{{EARLIEST}}*: always start_from_beginning no matter of what's in 
> checkpoint/auto_commit;
> 2). *{{LATEST}}*: always start_from_latest no matter of what's in 
> checkpoint/auto_commit;
> 3). *{{UNCOMMITTED_EARLIEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_beginning if, otherwise start_from_previous_offset;
> 4). *{{UNCOMMITTED_LATEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_latest, otherwise start_from_previous_offset;
> [~rangadi], any comments?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3093) add an option 'FirstPollOffsetStrategy' to KafkaIO

2017-10-24 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217240#comment-16217240
 ] 

Raghu Angadi commented on BEAM-3093:


Did you check out `withStartReadOffset()' in KafkaIO? This might be what you 
are looking for for (1) & (2) in when auto_commit is enabled.

I think it is better not to override checkpointed offsets.

> add an option 'FirstPollOffsetStrategy' to KafkaIO
> --
>
> Key: BEAM-3093
> URL: https://issues.apache.org/jira/browse/BEAM-3093
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Kenneth Knowles
>
> This is a feature borrowed from Storm KafkaSpout.
> *What's the issue?*
> In KafkaIO, when offset is stored either in checkpoint or auto_committed, it 
> cannot be changed in application, to force to read from earliest/latest. 
> --This feature is important to reset the start offset when relaunching a job.
> *Proposed solution:*
> By borrowing the FirstPollOffsetStrategy concept, users can have more options:
> 1). *{{EARLIEST}}*: always start_from_beginning no matter of what's in 
> checkpoint/auto_commit;
> 2). *{{LATEST}}*: always start_from_latest no matter of what's in 
> checkpoint/auto_commit;
> 3). *{{UNCOMMITTED_EARLIEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_beginning if, otherwise start_from_previous_offset;
> 4). *{{UNCOMMITTED_LATEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_latest, otherwise start_from_previous_offset;
> [~rangadi], any comments?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3093) add an option 'FirstPollOffsetStrategy' to KafkaIO

2017-10-23 Thread Xu Mingmin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216282#comment-16216282
 ] 

Xu Mingmin commented on BEAM-3093:
--

Mostly it's about case (1) and (2), but I think it's more clear if we list the 
4 scenarios together in one method.

Reset offset with admin commands may not be easy for developers, especially in 
production environment. (I don't have the permission.)

> add an option 'FirstPollOffsetStrategy' to KafkaIO
> --
>
> Key: BEAM-3093
> URL: https://issues.apache.org/jira/browse/BEAM-3093
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Kenneth Knowles
>
> This is a feature borrowed from Storm KafkaSpout.
> *What's the issue?*
> In KafkaIO, when offset is stored either in checkpoint or auto_committed, it 
> cannot be changed in application, to force to read from earliest/latest. 
> --This feature is important to reset the start offset when relaunching a job.
> *Proposed solution:*
> By borrowing the FirstPollOffsetStrategy concept, users can have more options:
> 1). *{{EARLIEST}}*: always start_from_beginning no matter of what's in 
> checkpoint/auto_commit;
> 2). *{{LATEST}}*: always start_from_latest no matter of what's in 
> checkpoint/auto_commit;
> 3). *{{UNCOMMITTED_EARLIEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_beginning if, otherwise start_from_previous_offset;
> 4). *{{UNCOMMITTED_LATEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_latest, otherwise start_from_previous_offset;
> [~rangadi], any comments?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3093) add an option 'FirstPollOffsetStrategy' to KafkaIO

2017-10-23 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216107#comment-16216107
 ] 

Raghu Angadi commented on BEAM-3093:


(3) is same as setting "auto.offset.reset" Kafka consumer config to "earliest".
(4) is the default behavior. Same as setting ""auto.offset.reset" config to 
"latest".

It is better not to mix 'checkpointed' offsets and 'auto_committed' offsets. If 
you restart a job scratch, 'checkpointed' offsets are thrown out as well.

(1) & (2) might be useful in the case of 'auto_committed' offsets. User can 
always remove auto_committed offsets in Kafka through admin commands. In that 
sense, (1) is same as '(3) with auto committed offsets reset.'. In fact, 
resetting these offsets using Kafka admin commands gives you much better 
control on where you want to start processing. E.g. you could resume from 24 
hours ago rather then from max retention period for Kafka.




> add an option 'FirstPollOffsetStrategy' to KafkaIO
> --
>
> Key: BEAM-3093
> URL: https://issues.apache.org/jira/browse/BEAM-3093
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Xu Mingmin
>Assignee: Kenneth Knowles
>
> This is a feature borrowed from Storm KafkaSpout.
> *What's the issue?*
> In KafkaIO, when offset is stored either in checkpoint or auto_committed, it 
> cannot be changed in application, to force to read from earliest/latest. 
> --This feature is important to reset the start offset when relaunching a job.
> *Proposed solution:*
> By borrowing the FirstPollOffsetStrategy concept, users can have more options:
> 1). *{{EARLIEST}}*: always start_from_beginning no matter of what's in 
> checkpoint/auto_commit;
> 2). *{{LATEST}}*: always start_from_latest no matter of what's in 
> checkpoint/auto_commit;
> 3). *{{UNCOMMITTED_EARLIEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_beginning if, otherwise start_from_previous_offset;
> 4). *{{UNCOMMITTED_LATEST}}*: if no offset in checkpoint/auto_commit then 
> start_from_latest, otherwise start_from_previous_offset;
> [~rangadi], any comments?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)