[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2014-09-02 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118925#comment-14118925
 ] 

Guozhang Wang commented on KAFKA-998:
-

This ticket can be closed as "won't fix" since we are moving to the new producer now.

> Producer should not retry on non-recoverable error codes
> 
>
> Key: KAFKA-998
> URL: https://issues.apache.org/jira/browse/KAFKA-998
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Joel Koshy
>Assignee: Guozhang Wang
> Attachments: KAFKA-998.v1.patch
>
>
> Based on a discussion with Guozhang. The producer currently retries on all 
> error codes (including messagesizetoolarge which is pointless to retry on). 
> This can slow down the producer unnecessarily.
> If at all we want to retry on that error code we would need to retry with a 
> smaller batch size, but that's a separate discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-26 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779636#comment-13779636
 ] 

Jun Rao commented on KAFKA-998:
---

My feeling is that it may not be very easy to do a quick fix. Currently, the 
causing exceptions are swallowed in several places just so that we can pass 
back the unsuccessfully sent messages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-26 Thread Jason Rosenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778974#comment-13778974
 ] 

Jason Rosenberg commented on KAFKA-998:
---

[~junrao]
Would an easier short-term fix be to at least set the root cause on the 
FailedToSendMessageException, so that we could see the 
MessageSizeTooLargeException as the cause of the FTSME?  Or is that not easy to 
do?
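
To make the question concrete, here is a minimal sketch of what a caller could do if the root cause were set. The two exception classes are local stand-ins named after the ones discussed in this thread, not the Kafka classes, and causedBy is a hypothetical helper built on the standard Throwable.getCause().

```java
class RootCauseSketch {
    // Local stand-ins named after the exceptions discussed in this thread.
    // These are NOT the Kafka classes.
    static class MessageSizeTooLargeException extends RuntimeException {
        MessageSizeTooLargeException(String message) {
            super(message);
        }
    }

    static class FailedToSendMessageException extends RuntimeException {
        FailedToSendMessageException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical helper: walks the cause chain of t (excluding t itself)
    // and reports whether any cause is an instance of the given type.
    static boolean causedBy(Throwable t, Class<? extends Throwable> type) {
        for (Throwable c = t.getCause(); c != null; c = c.getCause()) {
            if (type.isInstance(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable root = new MessageSizeTooLargeException("message exceeds size limit");
        Throwable ftsme = new FailedToSendMessageException("Failed to send messages", root);
        // With the cause attached, the caller can tell this failure apart.
        System.out.println(causedBy(ftsme, MessageSizeTooLargeException.class));
    }
}
```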



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-26 Thread Jason Rosenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778894#comment-13778894
 ] 

Jason Rosenberg commented on KAFKA-998:
---

Jun,

Yeah, sorry, I forgot I had filed KAFKA-1025 to address my concerns about 
exposing recoverability.

Thanks,

Jason



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-26 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778883#comment-13778883
 ] 

Jun Rao commented on KAFKA-998:
---

Jason,

This patch just won't retry sending the data when hitting a 
MessageSizeTooLargeException. It doesn't really address your main concern, 
which is that the caller doesn't know the real cause of the failure. Addressing 
this issue completely will need some more thought in the producer logic, and 
the changes required may be non-trivial. So, I am not sure if we should do this 
in 0.8.



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-26 Thread Jason Rosenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778822#comment-13778822
 ] 

Jason Rosenberg commented on KAFKA-998:
---

[~fancyrao]

I'd love to see this in 0.8.  A message-too-large error is currently returned 
to the producer client as a FailedToSendMessageException, which makes it 
indistinguishable from any other kind of failure, including those for which 
sub-dividing the batch and retrying are not viable options.

The workaround described is not workable in practice, since the FTSME does not 
include any root cause information (a simple causedBy() method might help in 
that regard)!



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-12 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765633#comment-13765633
 ] 

Jun Rao commented on KAFKA-998:
---

Thanks for the patch. Looks good overall. Just one comment:

10. ErrorMapping.fatalException(): Should we rename it to 
unrecoverableException? MessageSizeTooLarge doesn't seem like a fatal 
exception. 

I am not sure if it's worth patching this in 0.8. The workaround is to reduce 
the batch size, as well as reducing retry times and retry intervals.
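
As a rough illustration of that workaround, the sketch below builds producer properties with a smaller batch and fewer, shorter retries. The property names are the 0.8 producer settings as I understand them (batch.num.messages applies to async mode); the values are arbitrary placeholders, so check both against the configuration docs for your version.

```java
import java.util.Properties;

class ProducerWorkaroundConfig {
    // Builds producer properties for the workaround described above.
    // Property names are assumed to match the 0.8 producer configuration;
    // the values are illustrative, not recommendations.
    static Properties build() {
        Properties props = new Properties();
        // Fewer messages per batch, so a batch is less likely to hit the size limit.
        props.setProperty("batch.num.messages", "50");
        // Fewer retries, so a pointless retry costs less time.
        props.setProperty("message.send.max.retries", "1");
        // Shorter pause between retries.
        props.setProperty("retry.backoff.ms", "50");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```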



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-07 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761100#comment-13761100
 ] 

Neha Narkhede commented on KAFKA-998:
-

>> Do people think this is small and important enough to apply to 0.8?

+1.

Guozhang, do you mind submitting a reviewboard ?



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-05 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759257#comment-13759257
 ] 

Joel Koshy commented on KAFKA-998:
--

Oh, I thought this was for 0.8 - it does apply on trunk.  Do people think this 
is small and important enough to apply to 0.8?

Another comment after thinking about the patch: in dispatchSerializedData, 
would it be better to just drop data that has hit the message size limit?  That 
way, there is no need to return needRetry, so the dispatchSerializedData 
signature remains the same. The disadvantage is that we won't propagate a 
FailedToSendMessageException for such messages to the caller. For the producer 
in async mode that is probably fine (since right now the caller can't really do 
much with that exception). In sync mode the caller could perhaps decide to send 
fewer messages at once, but even in that case we don't really say which 
topics/messages hit the message size limit, so I think it is fine in that case 
as well. Furthermore, this would be covered by KAFKA-1026 to a large degree.



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-04 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758659#comment-13758659
 ] 

Guozhang Wang commented on KAFKA-998:
-

Thanks for the patch Joel. Do you mean rebase on 0.8 (it was originally on 
trunk)?



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-09-04 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758450#comment-13758450
 ] 

Joel Koshy commented on KAFKA-998:
--

Apologies for the late review. A couple of comments:
* I think this could reset needRetry back to false if subsequent partitions in 
the iteration do need a retry:
needRetry = needRetry && !fatalException(topicPartitionAndError._2)
The logic is actually a bit confusing. Instead, it might be clearer to just do: 
failedTopicPartitions.exists()
* Can you enhance the logging a bit to indicate that there were fatal sends 
that will not be retried? e.g., "Dropping messages to topic x due to message 
size limit.." or something like that.
* Can you rebase?
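
To illustrate the first point, here is a small sketch (not the patch itself) contrasting a running && over the failed partitions with an "exists" style check. The enum, the isFatal helper, and the predicate used for the exists check (at least one failure is not fatal) are assumptions made for the example.

```java
import java.util.Map;

class NeedRetrySketch {
    // Hypothetical error kinds for illustration only.
    enum SendError { LEADER_NOT_AVAILABLE, MESSAGE_SIZE_TOO_LARGE }

    // Assumed classification: only the size error is treated as fatal here.
    static boolean isFatal(SendError error) {
        return error == SendError.MESSAGE_SIZE_TOO_LARGE;
    }

    // Running conjunction: a single fatal partition turns needRetry off
    // for every partition, including ones that could still be retried.
    static boolean needRetryFolded(Map<String, SendError> failed) {
        boolean needRetry = true;
        for (SendError error : failed.values()) {
            needRetry = needRetry && !isFatal(error);
        }
        return needRetry;
    }

    // "Exists" style: retry if at least one failed partition is retriable.
    static boolean needRetryExists(Map<String, SendError> failed) {
        return failed.values().stream().anyMatch(error -> !isFatal(error));
    }

    public static void main(String[] args) {
        Map<String, SendError> failed = Map.of(
            "topicA-0", SendError.MESSAGE_SIZE_TOO_LARGE,
            "topicB-0", SendError.LEADER_NOT_AVAILABLE);
        System.out.println(needRetryFolded(failed)); // false
        System.out.println(needRetryExists(failed)); // true
    }
}
```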




[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-08-26 Thread Jason Rosenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750847#comment-13750847
 ] 

Jason Rosenberg commented on KAFKA-998:
---

Ok, I filed KAFKA-1025 to track the issue for reasoning about whether a failed 
send should be recoverable.



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-08-26 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750789#comment-13750789
 ] 

Guozhang Wang commented on KAFKA-998:
-

Hello Jason,

I think what you need is to return more information to the caller of 
Producer.send(), since currently it only returns a FailedToSendMessageException:

"Failed to send messages after #, tries."

For this case I think it is better for you to create a separate JIRA. As for 
dynamically adjusting the batch size upon receiving a 
MessageSizeTooLargeException, I will file a separate JIRA for this.
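
For readers wondering what dynamically adjusting the batch size could look like, below is one possible sketch that splits a rejected batch in half and tries each half again. It is not taken from any Kafka patch; the tooLarge predicate stands in for a send attempt that comes back with a size error, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class BatchSplitSketch {
    // Tries to send a batch. If the batch is rejected as too large, splits it
    // in half and tries each half. Returns the messages that were rejected
    // even when sent on their own. tooLarge is a stand-in for a real send
    // that fails with a size error.
    static <T> List<T> sendShrinking(List<T> batch, Predicate<List<T>> tooLarge) {
        List<T> dropped = new ArrayList<>();
        if (batch.isEmpty() || !tooLarge.test(batch)) {
            return dropped; // nothing to send, or the batch was accepted
        }
        if (batch.size() == 1) {
            dropped.add(batch.get(0)); // cannot shrink any further
            return dropped;
        }
        int mid = batch.size() / 2;
        dropped.addAll(sendShrinking(batch.subList(0, mid), tooLarge));
        dropped.addAll(sendShrinking(batch.subList(mid, batch.size()), tooLarge));
        return dropped;
    }

    public static void main(String[] args) {
        // Pretend the limit is 5 characters per batch.
        Predicate<List<String>> tooLarge =
            b -> b.stream().mapToInt(String::length).sum() > 5;
        List<String> batch = List.of("aa", "bbb", "cccccc", "d");
        System.out.println(sendShrinking(batch, tooLarge)); // [cccccc]
    }
}
```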



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-08-26 Thread Jason Rosenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750676#comment-13750676
 ] 

Jason Rosenberg commented on KAFKA-998:
---

Also, a producer can throw QueueFullException.  From the client's point of 
view, it would make sense for this to be a retryable situation as well 
(depending on load).  Thus, it might make sense for QueueFullException to be a 
sub-class of FailedToSendException (and more likely a sub-class of 
RetriesExhaustedFailedToSendException, or whatever better name that might be 
given).



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-08-26 Thread Jason Rosenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749876#comment-13749876
 ] 

Jason Rosenberg commented on KAFKA-998:
---

The caller of Producer.send() should also have the ability to know whether a 
send failure is recoverable (that might succeed with more retries). It may be 
hard for the client developer to guess the right number of 
message.send.max.retries, otherwise (since a transient error, like a restarting 
broker, could take an unknown amount of time).  If I want to implement 
guaranteed semantics, then the client needs to be able to have information on 
whether to continue retrying a message, or else give up.

This could be done by having Producer.send() throw different exception types 
(e.g. different versions of FailedToSendMessageException), e.g. 
UnrecoverableFailedToSendException or RetriesExhaustedFailedToSendException 
(perhaps shorter names for these exceptions).  These could both be sub-classes 
of FailedToSendException.

Another approach might be to have the FailedToSendException return information 
such as how many retries were attempted and whether or not the message might be 
recoverable with more retries. It should also wrap the root cause, so that 
debugging is possible.
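
A minimal sketch of how the two ideas above could combine is shown here. The class names come from this comment and are proposals only; none of them exist in Kafka, and the fields, accessors, and messages are assumptions for illustration.

```java
class SendExceptionDemo {
    // Caller-side decision based on the proposed exception information.
    static String decide(FailedToSendException e) {
        return e.isRecoverable() ? "retry later" : "give up";
    }

    public static void main(String[] args) {
        Throwable root = new IllegalStateException("stand-in root cause");
        System.out.println(decide(new RetriesExhaustedFailedToSendException(3, root)));
        System.out.println(decide(new UnrecoverableFailedToSendException(0, root)));
    }
}

// Proposed base type: carries the retry count, a recoverability hint,
// and the wrapped root cause. Hypothetical, not a Kafka class.
class FailedToSendException extends RuntimeException {
    private final int retriesAttempted;
    private final boolean recoverable;

    FailedToSendException(String message, int retriesAttempted,
                          boolean recoverable, Throwable cause) {
        super(message, cause);
        this.retriesAttempted = retriesAttempted;
        this.recoverable = recoverable;
    }

    int getRetriesAttempted() {
        return retriesAttempted;
    }

    boolean isRecoverable() {
        return recoverable;
    }
}

// Thrown when more retries cannot help (for example, a size error).
class UnrecoverableFailedToSendException extends FailedToSendException {
    UnrecoverableFailedToSendException(int retriesAttempted, Throwable cause) {
        super("Send failed with an unrecoverable error", retriesAttempted, false, cause);
    }
}

// Thrown when retries ran out but a later attempt might still succeed.
class RetriesExhaustedFailedToSendException extends FailedToSendException {
    RetriesExhaustedFailedToSendException(int retriesAttempted, Throwable cause) {
        super("Send failed after " + retriesAttempted + " retries", retriesAttempted, true, cause);
    }
}
```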



[jira] [Commented] (KAFKA-998) Producer should not retry on non-recoverable error codes

2013-08-05 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729681#comment-13729681
 ] 

Guozhang Wang commented on KAFKA-998:
-

Approach proposal:

1. Pass the errorCode from send to dispatchSerializedData, to be handled along 
with the outstandingProduceRequests.

2. When outstandingProduceRequests.size > 0, check the corresponding error 
code. If the error code indicates a non-recoverable error such as 
MessageSizeTooLarge, then break out of the while loop immediately.

Doing so would also make it easier in the future if we want to make a smarter 
producer client by dynamically shrinking the batch size.
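
The two steps above can be sketched roughly as follows. This is a simplified model, not the producer code: the error code constants are made-up values, sendOnce stands in for one send attempt that reports an error code, and the loop only shows the early exit.

```java
import java.util.function.IntSupplier;

class RetryLoopSketch {
    // Made-up codes for illustration; not the values used by Kafka.
    static final int NO_ERROR = 0;
    static final int LEADER_NOT_AVAILABLE = 1;
    static final int MESSAGE_SIZE_TOO_LARGE = 2;

    // Assumed classification of which codes cannot be fixed by retrying.
    static boolean isUnrecoverable(int errorCode) {
        return errorCode == MESSAGE_SIZE_TOO_LARGE;
    }

    // Makes one initial attempt plus up to maxRetries retries.
    // Stops early on success or on an unrecoverable error code.
    // Returns the number of send attempts made.
    static int sendWithRetries(IntSupplier sendOnce, int maxRetries) {
        int attempts = 0;
        while (attempts <= maxRetries) {
            int errorCode = sendOnce.getAsInt();
            attempts++;
            if (errorCode == NO_ERROR || isUnrecoverable(errorCode)) {
                break; // retrying would not change the outcome
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(sendWithRetries(() -> MESSAGE_SIZE_TOO_LARGE, 3)); // 1
        System.out.println(sendWithRetries(() -> LEADER_NOT_AVAILABLE, 3));   // 4
    }
}
```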
