[jira] [Created] (KAFKA-6701) synchronize Log modification between delete cleanup and async delete

2018-03-21 Thread Sumant Tambe (JIRA)
Sumant Tambe created KAFKA-6701:
---

 Summary: synchronize Log modification between delete cleanup and 
async delete
 Key: KAFKA-6701
 URL: https://issues.apache.org/jira/browse/KAFKA-6701
 Project: Kafka
  Issue Type: Bug
Reporter: Sumant Tambe
Assignee: Sumant Tambe


Kafka broker crashes without any evident disk failures 

From [~becket_qin]: This looks like a bug in Kafka. When topic deletion and log 
retention cleanup happen at the same time, the log retention cleanup may see a 
ClosedChannelException after the log has been renamed for async deletion.

The root cause is that topic deletion should set the isClosed flag of the 
partition log to true, and retention should skip deleting old log segments 
when the log is closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-5886) Introduce delivery.timeout.ms producer config (KIP-91)

2017-10-02 Thread Sumant Tambe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188707#comment-16188707
 ] 

Sumant Tambe commented on KAFKA-5886:
-

The excellent KIP writeup is courtesy of [~jkoshy] 

> Introduce delivery.timeout.ms producer config (KIP-91)
> --
>
> Key: KAFKA-5886
> URL: https://issues.apache.org/jira/browse/KAFKA-5886
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Sumant Tambe
>Assignee: Sumant Tambe
> Fix For: 1.0.0
>
>
> We propose adding a new timeout, delivery.timeout.ms. The window of 
> enforcement includes batching in the accumulator, retries, and the in-flight 
> segments of the batch. With this config, the user has a guaranteed upper 
> bound on when a record will either get sent, fail, or expire, measured from 
> the point when send returns. In other words, we no longer overload 
> request.timeout.ms to act as a weak proxy for the accumulator timeout, and 
> instead introduce an explicit timeout that users can rely on without 
> exposing any internals of the producer such as the accumulator. 
> See 
> [KIP-91|https://cwiki.apache.org/confluence/display/KAFKA/KIP-91+Provide+Intuitive+User+Timeouts+in+The+Producer]
>  for more details.
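To make the enforcement window concrete, here is a small sketch in Python. The 
config keys follow the KIP; the numeric values are made up for illustration and 
are not Kafka defaults:

```python
# Illustrative producer timeout configs (values are hypothetical, not defaults).
producer_config = {
    "linger.ms": 100,               # max wait for batching in the accumulator
    "request.timeout.ms": 30000,    # per in-flight request timeout
    "delivery.timeout.ms": 120000,  # overall bound from send() return to success/failure
}

def covers_one_attempt(cfg):
    """Sanity check in the spirit of KIP-91: the overall delivery bound
    must at least cover one batching wait plus one request attempt."""
    return cfg["delivery.timeout.ms"] >= cfg["linger.ms"] + cfg["request.timeout.ms"]
```

The check is a simplified version of the kind of validation the KIP implies: 
accumulator time, retries, and in-flight requests all count against the single 
delivery.timeout.ms budget.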





[jira] [Commented] (KAFKA-5886) Introduce delivery.timeout.ms producer config (KIP-91)

2017-10-02 Thread Sumant Tambe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188300#comment-16188300
 ] 

Sumant Tambe commented on KAFKA-5886:
-

[~guozhang] I'm resuming this today. I'll keep you posted. 

[~twbecker] With this KIP you can configure `delivery.timeout.ms` to a very 
large value and reduce `request.timeout.ms` to a much smaller value. Since the 
default `retries` is MAX_LONG, it should help your situation. What say?
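The tuning suggested above can be sketched as follows (the specific values are 
hypothetical):

```python
# Large overall delivery bound, small per-attempt timeout: each attempt
# fails fast, but the producer keeps retrying within the delivery window.
tuned = {
    "delivery.timeout.ms": 600_000,  # 10 minutes overall
    "request.timeout.ms": 5_000,     # 5 seconds per attempt
    "retries": 2**31 - 1,            # effectively unbounded; the delivery bound governs
}

# Rough number of attempts that fit in the window (ignoring retry backoff):
max_attempts = tuned["delivery.timeout.ms"] // tuned["request.timeout.ms"]
```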






[jira] [Commented] (KAFKA-5886) Introduce delivery.timeout.ms producer config (KIP-91)

2017-09-29 Thread Sumant Tambe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186378#comment-16186378
 ] 

Sumant Tambe commented on KAFKA-5886:
-

I think it's unlikely that this will merge in 1.0.0. It's a complex patch, and 
there are a few issues regarding tests that I haven't had a chance to resolve. 
I plan to resume work next week.

I'm familiar with the heavy producer load case. That's how we ran into the 
expiry issue. We have an internally patched version to work around that issue.






[jira] [Created] (KAFKA-5886) Implement KIP-91

2017-09-13 Thread Sumant Tambe (JIRA)
Sumant Tambe created KAFKA-5886:
---

 Summary: Implement KIP-91
 Key: KAFKA-5886
 URL: https://issues.apache.org/jira/browse/KAFKA-5886
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Reporter: Sumant Tambe
Assignee: Sumant Tambe
 Fix For: 1.0.0


Implement 
[KIP-91|https://cwiki.apache.org/confluence/display/KAFKA/KIP-91+Provide+Intuitive+User+Timeouts+in+The+Producer]





[jira] [Commented] (KAFKA-5621) The producer should retry expired batches when retries are enabled

2017-08-03 Thread Sumant Tambe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113590#comment-16113590
 ] 

Sumant Tambe commented on KAFKA-5621:
-

Thread closed. Please share your thoughts on the KIP-91 [DISCUSS] thread instead.

> The producer should retry expired batches when retries are enabled
> --
>
> Key: KAFKA-5621
> URL: https://issues.apache.org/jira/browse/KAFKA-5621
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
> Fix For: 1.0.0
>
>
> Today, when a batch is expired in the accumulator, a {{TimeoutException}} is 
> raised to the user.
> It might be better for the producer to retry the expired batch up to the 
> configured number of retries. This is more intuitive from the user's point of 
> view. 
> Further, the proposed behavior makes it easier for applications like mirror 
> maker to provide ordering guarantees even when batches expire. Today, they 
> would resend the expired batch and it would get added to the back of the 
> queue, causing the output ordering to differ from the input ordering.





[jira] [Comment Edited] (KAFKA-5621) The producer should retry expired batches when retries are enabled

2017-08-02 Thread Sumant Tambe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111237#comment-16111237
 ] 

Sumant Tambe edited comment on KAFKA-5621 at 8/2/17 4:38 PM:
-

Wait, what? Blocking indefinitely? Where? No, no.

On one hand, I agree with [~apurva]'s point about not exposing the batching 
aspects in even more configs. {{message.max.delivery.wait.ms}} could be 
confusing because it would be interpreted as an end-to-end timeout: from 
batch-ready to the broker receiving it, or worse, from calling send to the 
broker receiving it. An abstract name like that is open to interpretation. 
Therefore, I personally prefer {{batch.expiry.ms}} because it does what it 
says: it expires batches (in the accumulator). IMO, batching is a well-known 
concept in big-data infrastructure: Kafka and Spark, to name a few. 

So let's assume we choose a name X for the accumulator timeout. Now the 
question is what its default should be. We can either 1. support 
backwards-compatible behavior by setting X=request.timeout.ms, 2. use some 
other agreed-upon value (e.g., retries * request.timeout.ms), or 3. use 
MAX_LONG in the interest of letting Kafka do the work to the maximum possible 
extent.

I'm ok with 1 and 2. The concern I have about #3 is that "maximum extent" is 
not forever. It's literally an extreme. 

Following is an attempt to answer [~ijuma]'s question.

There are at least three classes of application I consider when I think about 
producer timeouts: 1. real-time apps (periodic healthcheck producer, 
temperature-sensor producer), 2. fail-fast (e.g., KMM), 3. best effort (i.e., 
everything else; further sub-categorization is possible here). 

A real-time app has a soft upper bound on *both* message delivery and failure 
of message delivery. In both cases, it wants to know. Such an app does *not* 
close the producer on the first error because there's more data lined up right 
behind. It's ok to lose a few samples of temperature, so it simply drops them 
and moves on. Maybe when the drop rate reaches something like 70% it would 
close the producer. It may use acks=0. In this case, X=MAX_LONG is not 
suitable.

A fail-fast app does not have an upper bound on when the data gets to the 
broker. It's ready to wait for a long time to give every message a chance. 
But if it's not making progress on a partition, it needs to know asap. That 
is, it has a bound on the failure notification. Such an app (IMO) would 
either a. close the producer if ordering is important (that's how we run KMM 
at LinkedIn) or b. queue the batch at the back of the queue for a later 
attempt, with obvious reordering. In both cases, X=MAX_LONG is not suitable.

Now comes best effort: it has no bounds on success or failure notification. 
Frankly, I don't know which app really fits this bill. I've heard about apps 
configuring retries=MAX_LONG. I'll just say that I agree with [~becket_qin]'s 
opinion that this is considered bad. 
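The three classes above might translate into producer settings roughly like 
this (a sketch; key names and values are illustrative, and "X" stands for the 
accumulator timeout under discussion, not a real config key):

```python
MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE

# Real-time: soft bounds on both delivery and failure; drop data and move on.
real_time = {"acks": "0", "X": 2_000, "retries": 0}

# Fail-fast: no bound on delivery, but a tight bound on failure notification;
# single in-flight request preserves ordering for apps like KMM.
fail_fast = {"acks": "all", "X": 300_000,
             "max.in.flight.requests.per.connection": 1}

# Best effort: no bounds at all -- the only class where X=MAX_LONG even fits.
best_effort = {"acks": "1", "X": MAX_LONG}
```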

+1 for announcing KIP-91. I'll try to capture the discussion in this thread in 
the KIP before announcing. Do you guys want me to do that? Or should I just 
link to this thread?



[jira] [Commented] (KAFKA-5621) The producer should retry expired batches when retries are enabled

2017-07-27 Thread Sumant Tambe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103460#comment-16103460
 ] 

Sumant Tambe commented on KAFKA-5621:
-

{{queue.time.ms}} sounds a lot like {{linger.ms}} to me. It may be confusing 
for those who are new to Kafka. I'm naturally biased towards 
{{batch.expiry.ms}}, which I proposed in KIP-91. For backwards compatibility, 
the default should be equal to {{request.timeout.ms}}, I think. 
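The backwards-compatible default suggested here can be sketched as a simple 
fallback (key names as proposed in the KIP; values hypothetical):

```python
# If batch.expiry.ms is unset, fall back to request.timeout.ms so existing
# deployments see unchanged expiry behavior.
def effective_batch_expiry(cfg):
    return cfg.get("batch.expiry.ms", cfg["request.timeout.ms"])
```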






[jira] [Commented] (KAFKA-5621) The producer should retry expired batches when retries are enabled

2017-07-25 Thread Sumant Tambe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100349#comment-16100349
 ] 

Sumant Tambe commented on KAFKA-5621:
-

[~ijuma], [~apurva], it looks like this proposal will change the notional 
accumulator timeout from request.timeout.ms (r.t.m) to (r.t.m * retries). The 
bound on the send path of a record, from the point when send returns, is up to 
linger.ms + r.t.m * retries + retries * (r.t.m + retry.backoff.ms). So the 
time of the send-failure notification could potentially double, as two full 
retry cycles will be applied. 
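Plugging hypothetical numbers into the bound quoted above:

```python
# bound = linger.ms + r.t.m * retries + retries * (r.t.m + retry.backoff.ms)
# All values are hypothetical, chosen only to make the arithmetic concrete.
linger_ms = 100
request_timeout_ms = 30_000  # r.t.m
retries = 3
retry_backoff_ms = 100

bound_ms = (linger_ms
            + request_timeout_ms * retries                       # accumulator side
            + retries * (request_timeout_ms + retry_backoff_ms))  # in-flight retry cycles

# With these numbers: 100 + 90_000 + 90_300 = 180_400 ms.
```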

I prefer an explicit name for the accumulator timeout (we proposed 
batch.expiry.ms), but I'm well aware that adding new configs makes things 
harder for the end user. I prefer to avoid overloading configs to do multiple 
jobs; that's confusing too. Applying retries to the accumulator is not 
intuitive to me. It seems to jack up the accumulator time by a multiple. 

Having said that, if there is strong agreement on not adding a new config, 
I'm fine with the trick proposed here.



