[jira] [Comment Edited] (KAFKA-1911) Log deletion on stopping replicas should be async

Joel Koshy (JIRA) Tue, 22 Sep 2015 10:35:57 -0700

    [ https://issues.apache.org/jira/browse/KAFKA-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903046#comment-14903046 ]


Joel Koshy edited comment on KAFKA-1911 at 9/22/15 5:34 PM:
------------------------------------------------------------

The original motivation for this ticket was to keep a high-latency request 
from tying up request handlers. However, while thinking through some nuances 
of delete topic, I think delete topic would also benefit from this. Since 
stop-replica requests can take a while to finish, delete topic can also take a 
while (apart from failure cases such as a replica being down).

I think the easiest way to fix this would be to just rename the partition 
directory from <topic>-<partId> to something like 
<topic>-<partId>-deleted-<seqNo> and delete that asynchronously. The <seqNo> is 
probably needed in case a user deletes and recreates the topic multiple times 
in rapid succession for whatever reason.
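
To make the rename-then-delete idea concrete, here is a minimal sketch in Java. It is not the Kafka implementation: the class name AsyncLogDeleter, the single-threaded executor, and the in-memory sequence counter are all assumptions made for illustration. In particular, an in-memory counter restarts at zero after a broker restart, so a real version would need a sequence number that cannot collide with leftover directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kafka: renames a partition directory
// so the caller returns quickly, then deletes it on a background thread.
final class AsyncLogDeleter {
    // Assumption: in-memory sequence number; resets on restart.
    private final AtomicLong seqNo = new AtomicLong();
    private final ExecutorService deleter = Executors.newSingleThreadExecutor();

    /** Renames <topic>-<partId> to <topic>-<partId>-deleted-<seqNo>, then deletes it asynchronously. */
    Future<?> deleteAsync(Path partitionDir) throws IOException {
        Path renamed = partitionDir.resolveSibling(
                partitionDir.getFileName() + "-deleted-" + seqNo.getAndIncrement());
        // The rename is the only work done on the calling (request handler) thread.
        Files.move(partitionDir, renamed, StandardCopyOption.ATOMIC_MOVE);
        return deleter.submit(() -> deleteRecursively(renamed));
    }

    /** Deletes files before their parent directories by visiting paths in reverse order. */
    static void deleteRecursively(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void shutdown() {
        deleter.shutdown();
    }
}
```

A caller would invoke deleteAsync(dir) from the request path and respond as soon as it returns. The returned Future is only needed by callers that want to wait for, or observe failures of, the background delete.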


was (Author: jjkoshy):
The original motivation in this ticket was to avoid a high latency request from 
tying up request handlers. However, while thinking through some nuances of 
delete topic, I think delete topic would also benefit from this. Since 
stop-replica-requests can take a while to finish delete topic can also take a 
while (apart from failure cases such as a replica being down).

I think the easiest way to fix this would be to just rename the partition 
directory from <topic>-<partId> to something like 
<topic>-<partId>-deleted-<seqNo> and asynchronously delete that. The <seqNo> is 
probably needed if a user were to delete and recreate multiple times in rapid 
fire for whatever reason.

> Log deletion on stopping replicas should be async
> -------------------------------------------------
>
>                 Key: KAFKA-1911
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1911
>             Project: Kafka
>          Issue Type: Bug
>          Components: log, replication
>            Reporter: Joel Koshy
>            Assignee: Geoff Anderson
>              Labels: newbie++
>
> If a StopReplicaRequest sets delete=true then we do a file.delete on the file 
> message sets. I was under the impression that this is fast but it does not 
> seem to be the case.
> On a partition reassignment in our cluster the local time for stop replica 
> took nearly 30 seconds.
> {noformat}
> Completed request:Name: StopReplicaRequest; Version: 0; CorrelationId: 467; ClientId: ;    DeletePartitions: true; ControllerId: 1212; ControllerEpoch: 53 from client/...:45964;totalTime:29191,requestQueueTime:1,localTime:29190,remoteTime:0,responseQueueTime:0,sendTime:0
> {noformat}
> This ties up one API thread for the duration of the request.
> Specifically in our case, the queue times for other requests also went up and 
> producers to the partition that was just deleted on the old leader took a 
> while to refresh their metadata (see KAFKA-1303) and eventually ran out of 
> retries on some messages leading to data loss.
> I think the log deletion in this case should be fully asynchronous although 
> we need to handle the case when a broker may respond immediately to the 
> stop-replica-request but then go down after deleting only some of the log 
> segments.
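
One way to handle the case of a broker going down partway through deletion, assuming the <topic>-<partId>-deleted-<seqNo> naming suggested in the comment above, is to scan the log directory on startup and finish deleting anything that carries the marker. The sketch below is illustrative only: the class DeletedLogDirCleaner and its methods are hypothetical, not Kafka APIs. Note that the name pattern is ambiguous for a topic whose own name ends in something like "-1-deleted", so a real scheme would need a marker that cannot appear in a topic name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical startup cleaner, not part of Kafka.
final class DeletedLogDirCleaner {
    // Assumption: directories renamed for deletion look like <topic>-<partId>-deleted-<seqNo>.
    private static final Pattern DELETED_DIR = Pattern.compile(".+-\\d+-deleted-\\d+");

    /** Returns directories directly under logRoot whose names carry the deleted marker. */
    static List<Path> findPendingDeletes(Path logRoot) throws IOException {
        List<Path> pending = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(logRoot)) {
            for (Path p : entries) {
                if (Files.isDirectory(p)
                        && DELETED_DIR.matcher(p.getFileName().toString()).matches()) {
                    pending.add(p);
                }
            }
        }
        return pending;
    }

    /** Finishes any deletion that was interrupted, removing files before their parent directories. */
    static void finishPendingDeletes(Path logRoot) throws IOException {
        for (Path dir : findPendingDeletes(logRoot)) {
            List<Path> ordered;
            try (Stream<Path> paths = Files.walk(dir)) {
                ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : ordered) {
                Files.delete(p);
            }
        }
    }
}
```

Because the rename happens before the response is sent, a partially deleted directory can be recognized by its name after a crash, and live partition directories are left untouched.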



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
