[jira] [Created] (KAFKA-10461) The config of closing heartbeat is invalid.

2020-09-03 Thread jiwei (Jira)
jiwei created KAFKA-10461:
-

 Summary: The config of closing heartbeat is invalid.
 Key: KAFKA-10461
 URL: https://issues.apache.org/jira/browse/KAFKA-10461
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 2.6.0
Reporter: jiwei
 Attachments: image-2020-09-04-11-29-58-624.png

public static final String EMIT_HEARTBEATS_ENABLED = EMIT_HEARTBEATS + ENABLED_SUFFIX;
private static final String EMIT_HEARTBEATS_ENABLED_DOC = "Whether to emit heartbeats to target cluster.";

When I set it to "false", it doesn't work!

!image-2020-09-04-11-29-58-624.png|width=448,height=260!

When the value of interval is "-1", the method stopped.await(-1) returns 
false immediately. MirrorHeartbeatTask then emits heartbeats endlessly, one after another.
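
A minimal standalone sketch of the failure mode (illustrative only, not the actual MirrorMaker 2 source; the latch and loop here are assumptions): CountDownLatch.await with a negative timeout returns false immediately, so a loop gated only on that return value never pauses.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class NegativeAwaitSketch {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch stopped = new CountDownLatch(1);
        long intervalMs = -1L; // "-1" is meant to signal "heartbeats disabled"
        for (int i = 0; i < 3; i++) {
            // With a negative timeout, await() returns false without blocking,
            // so the task falls through and emits another heartbeat right away.
            boolean stop = stopped.await(intervalMs, TimeUnit.MILLISECONDS);
            if (!stop) {
                System.out.println("emit heartbeat " + i);
            }
        }
    }
}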



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka-site] ableegoldman opened a new pull request #301: Redo 2.6 docs commit

2020-09-03 Thread GitBox


ableegoldman opened a new pull request #301:
URL: https://github.com/apache/kafka-site/pull/301


   Just noticed that all of the docs I added just before 2.6 aren't actually 
showing up on the site. Seems like they were accidentally reverted by [this 
commit](https://github.com/apache/kafka-site/commit/35d3b804b86c249aace9a729c5abd9be4525ef3d).
   
   So, this PR just redoes everything that was originally in 
https://github.com/apache/kafka-site/commit/4c19eab4f6315c91549be0bff08ad46173be4f64



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Build failed in Jenkins: Kafka » kafka-trunk-jdk15 #44

2020-09-03 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10355: Throw error when source topic was deleted (#9191)

[github] MINOR: Record all poll invocations (#9234)


--
[...truncated 6.56 MB...]

(Remaining log: org.apache.kafka.streams.test.OutputVerifierTest results; all tests shown passed until the message was cut off mid-output.)

Build failed in Jenkins: Kafka » kafka-trunk-jdk8 #43

2020-09-03 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10355: Throw error when source topic was deleted (#9191)

[github] MINOR: Record all poll invocations (#9234)


--
[...truncated 6.50 MB...]

(Remaining log: org.apache.kafka.streams.TopologyTestDriverTest results with [Eos enabled = false]; all tests shown in the excerpt passed.)

Re: [DISCUSS] KIP-406: GlobalStreamThread should honor custom reset policy

2020-09-03 Thread Sophie Blee-Goldman
Thanks Matthias, that sounds like what I was thinking. I think we should
always be
able to figure out what to do in various scenarios as outlined in the
previous email.

>  For the same reason, I wouldn't want to combine global restoring and
normal restoring
> because then it would make all the restorings independent but we don't
want that. We
> want global stores to be available before any processing starts on the
active tasks.

I'm not sure I follow this specific point, but I don't think I did a good
job of explaining my
proposal so it's probably my own fault. When I say that we should merge
RESTORING
and GLOBAL_RESTORING, I just mean that we should provide a single
user-facing
state to encompass any ongoing restoration. The point of the KafkaStreams
RESTORING
state is to alert users that their state may be unavailable for IQ, and
active tasks may be
idle. This is true for both global and non-global restoration. I think the
ultimate question
is whether as a user, I would react any differently to a GLOBAL_RESTORING
state vs
the regular RESTORING. My take is "no", in which case we should just
provide a single
unified state for the minimal public API. But if anyone can think of a
reason for the user
to need to distinguish between different types of restoration, that would
be a good
argument to keep them separate.

Internally, we do need to keep track of a "global restore" flag to
determine the course
of action -- for example if a StreamThread transitions to RUNNING but sees
that the
KafkaStreams state is RESTORING, should it start processing or not? The
answer
depends on whether the state is RESTORING due to any global stores. But the
KafkaStreams state is a public interface, not an internal bookkeeper, so we
shouldn't
try to push our internal logic into the user-facing API.
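
To make that internal flag concrete, here is a minimal sketch of the bookkeeping described above (illustrative only, not actual Streams code; class and method names are assumptions):

import java.util.concurrent.atomic.AtomicBoolean;

class GlobalRestoreGateSketch {
    // Set before the stream threads start; unset by the global thread.
    private final AtomicBoolean globalRestoreInProgress = new AtomicBoolean(true);

    // A stream thread that has reached RUNNING still consults this flag
    // before processing active tasks; standby/restore work may continue.
    boolean mayProcessActiveTasks(boolean threadIsRunning) {
        return threadIsRunning && !globalRestoreInProgress.get();
    }

    // Called by the global thread once all global stores are restored.
    void globalRestoreFinished() {
        globalRestoreInProgress.set(false);
    }
}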


On Thu, Sep 3, 2020 at 7:36 AM Matthias J. Sax  wrote:

> I think this issue can actually be resolved.
>
>  - We need a flag on the stream-threads if global-restore is in
> progress; for this case, the stream-thread may go into RUNNING state,
> but it's not allowed to actually process data -- it will be allowed to
> update standby-task though.
>
>  - If a stream-thread restores, its own state is RESTORING and it does
> not need to care about the "global restore flag".
>
>  - The global-thread just does what we discussed, including using state
> RESTORING.
>
>  - The KafkaStreams client state is in RESTORING, if at least one thread
> (stream-thread or global-thread) is in state RESTORING.
>
>  - On startup, if there is a global-thread, then just set the
> global-restore flag upfront before we start the stream-threads (we can
> actually still do the rebalance and potential restore in stream-thread
> in parallel to global restore) and rely on the global-thread to unset
> the flag.
>
>  - The tricky thing is to "stop" processing in stream-threads if we
> need to wipe the global-store and rebuild it. For this, we should set
> the "global restore flag" on the stream-threads, but we also need to
> "lock down" the global store in question and throw an exception if the
> stream-thread tries to access it; if the stream-thread gets this
> exception, it needs to clean itself up and wait until the "global restore
> flag" is unset before it can continue.
>
>
> Do we think this would work? -- Of course, the devil is in the details
> but it seems to become a PR discussion, and there is no reason to make
> it part of the KIP.
>
>
> -Matthias
>
> On 9/3/20 3:41 AM, Navinder Brar wrote:
> > Hi,
> >
> > Thanks, John, Matthias and Sophie for great feedback.
> >
> > On the point raised by Sophie that maybe we should allow normal
> restoring during GLOBAL_RESTORING, I think it makes sense but the challenge
> would be what happens when normal restoring (on actives) has finished but
> GLOBAL_RESTORING is still going on. Currently, all restorings are
> independent of each other, i.e. restoring happening on one task/thread
> doesn't affect another. But if we do go ahead with allowing normal
> restoring during GLOBAL_RESTORING then we will still have to pause the
> active tasks from going to RUNNING if GLOBAL_RESTORING has not finished and
> normal restorings have finished. For the same reason, I wouldn't want to
> combine global restoring and normal restoring because then it would make
> all the restorings independent but we don't want that. We want global
> stores to be available before any processing starts on the active tasks.
> >
> > Although I think restoring of replicas can still take place while global
> stores are restoring because in replicas there is no danger of them starting
> processing.
> >
> > Also, one point to bring up is that currently during application startup
> global stores restore first and then normal stream threads start.
> >
> > Regards, Navinder
> >
> > On Thursday, 3 September, 2020, 06:58:40 am IST, Matthias J. Sax <
> mj...@apache.org> wrote:
> >
> >  Thanks for the input Sophie. Those are all good points and I fully agree
> > with 

Build failed in Jenkins: Kafka » kafka-trunk-jdk11 #45

2020-09-03 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10355: Throw error when source topic was deleted (#9191)

[github] MINOR: Record all poll invocations (#9234)


--
[...truncated 3.29 MB...]

(Remaining log: org.apache.kafka.streams.TestTopicsTest and org.apache.kafka.streams.internals.WindowStoreFacadeTest results, all passed, followed by :streams:upgrade-system-tests Gradle task output before the message was cut off.)

[VOTE] KIP-654 Aborting Transaction with non-flushed data should throw a non-fatal Exception

2020-09-03 Thread Gokul Srinivas

Hi,

I would like to call a vote on the following KIP.

*KIP* - 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-654:+Aborted+transaction+with+non-flushed+data+should+throw+a+non-fatal+exception



TL;DR: This KIP proposes to throw a new, non-fatal exception whilst 
aborting transactions with non-flushed data. This will help users 
distinguish non-fatal errors and potentially retry the batch.


Thanks,
-Gokul
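
A minimal sketch of how an application could handle the proposed exception, assuming it lands as org.apache.kafka.common.errors.TransactionAbortedException (the name discussed in the thread); the producer setup and batch handling are illustrative assumptions, not part of the KIP:

import java.util.List;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.TransactionAbortedException;

class AbortHandlingSketch {
    static void sendBatch(KafkaProducer<String, String> producer,
                          List<ProducerRecord<String, String>> batch) {
        try {
            producer.beginTransaction();
            for (ProducerRecord<String, String> record : batch) {
                producer.send(record);
            }
            producer.commitTransaction();
        } catch (TransactionAbortedException e) {
            // Non-fatal per this KIP: the transaction was already being
            // aborted, so abort cleanly and let the caller retry the batch.
            producer.abortTransaction();
        } catch (KafkaException e) {
            // Anything else remains fatal for this producer.
            producer.close();
            throw e;
        }
    }
}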



Re: [DISCUSS] KIP-654 Aborting Transaction with non-flushed data should throw a non-fatal Exception

2020-09-03 Thread Gokul Srinivas

Appreciate the help!

On 04-09-2020 00:46, Sophie Blee-Goldman wrote:

Yep, you can go ahead and call for a vote on the KIP

On Thu, Sep 3, 2020 at 12:09 PM Gokul Srinivas  wrote:

Sophie,

That sounds fair. I've updated the KIP to state the same message for
backward compatibility to existing (albeit hacky) solutions.

As this is my first ever contribution - is the next step to
initiate the
voting on this KIP?

-Gokul

On 04-09-2020 00:34, Sophie Blee-Goldman wrote:
> I think the current proposal looks good to me. One minor
suggestion I have
> is to consider keeping the same error message:
>
> Failing batch since transaction was aborted
>
>
> When we were running into this issue in Streams and accidentally
rethrowing
> the KafkaException as fatal, we ended up checking the specific
error message
> of the KafkaException and swallowing the exception if it was
equivalent to
> the
> above. Obviously this was pretty hacky (hence the motivation for
this KIP)
> and
> luckily we found a way around this, but it makes me wonder if any
> applications
> out there might be doing the same. So maybe we should reuse the
old error
> message just in case?
>
> Besides that, this KIP LGTM
>
> On Thu, Sep 3, 2020 at 5:23 AM Gokul Srinivas
 wrote:
>
>> All,
>>
>> Gentle reminder - any comments on the line of thinking I
mentioned in
>> the last email? I've updated the Exception to be named
>> "TransactionAbortedException" on the KIP confluence page.
>>
>> -Gokul
>>
>> On 01-09-2020 18:34, Gokul Srinivas wrote:
>>> Matthias, Sophie, Jason,
>>>
>>> Took another pass at understanding the internals and it seems
to me
>>> like we should be extending the `ApiException` rather than the
>>> `RetriableException`.
>>>
>>> The check in question
>>> =
>>>
>>> Do we abort any undrained batches that are present on this
transaction
>>> if the transaction is in an aborting state? And, if we do,
what would
>>> be the reason sent back to the user for aborting these batches?
>>>
>>> Logic for this
>>> ==
>>>
>>> If the transaction `isAborting` and `hasAbortableError` and the
>>> `lastError()` is not empty -> then there has been some error which
>>> will cause / has caused the transaction to abort and this *is* a
>>> runtime exception. This same exception should be sent back to
the user.
>>>
>>> If the transaction `isAborting` and `lastError()` is empty ->
then for
>>> some unknown reason (maybe even user initiated, according to the
>>> tests), the transaction manager has started to abort the
transaction.
>>> In this case, the newly proposed exception should be sent back
to the
>>> user.
>>>
>>> Reasoning
>>> =
>>>
>>> Prima facie - I do not think this is a `RetriableException`.
>>>
>>> If the user has chosen to abort this transaction, then it
would be up
>>> to the user to choose whether to retry the exception, in which
case it
>>> is /*not*/ a `RetriableException`.
>>>
>>> If there is a case where the transaction manager has no error,
but has
>>> started to abort the transaction, we still do not retry the
transaction,
>>> rather we abort any undrained batches - in which case, it is
/*still
>>> not*/ a `RetriableException`.
>>>
>>> Does that sound right?
>>>
>>> -Gokul
>>>
>>> On 29-08-2020 01:17, Jason Gustafson wrote:
 Hi Gokul,

 Thanks, I think it makes sense to use a separate exception
type. +1 on
 Sophie's suggestion for `TransactionAbortedException`.

 Extending from `RetriableException` seems reasonable as well.
I guess
 the
 only question is whether it's safe to catch it as a
`RetriableException`
 and apply common retry logic. For a transactional producer, my
 expectation
 is that the application would abort the transaction and retry it.
 However,
 if the transaction is already being aborted, maybe it would
be better to
 skip the abort. It might be helpful to have an example which
shows
 how we
 expect applications to handle this.

 Thanks,
 Jason






 On Thu, Aug 27, 2020 at 6:25 PM Sophie Blee-Goldman
<sop...@confluent.io>
 wrote:

> Hey Gokul, thanks for taking up this KIP!
>
> I agree with Matthias that directly extending KafkaException
may not be
> ideal,
> and we should instead extend APIException or
RetriableException. Of the
> two,
> I think APIException would be 

Re: [DISCUSS] KIP-654 Aborting Transaction with non-flushed data should throw a non-fatal Exception

2020-09-03 Thread Sophie Blee-Goldman
Yep, you can go ahead and call for a vote on the KIP

On Thu, Sep 3, 2020 at 12:09 PM Gokul Srinivas  wrote:

> Sophie,
>
> That sounds fair. I've updated the KIP to state the same message for
> backward compatibility to existing (albeit hacky) solutions.
>
> As this is my first ever contribution - is the next step to initiate the
> voting on this KIP?
>
> -Gokul
>
> On 04-09-2020 00:34, Sophie Blee-Goldman wrote:
> > I think the current proposal looks good to me. One minor suggestion I
> have
> > is to consider keeping the same error message:
> >
> > Failing batch since transaction was aborted
> >
> >
> > When we were running into this issue in Streams and accidentally
> rethrowing
> > the KafkaException as fatal, we ended up checking the specific error
> message
> > of the KafkaException and swallowing the exception if it was equivalent
> to
> > the
> > above. Obviously this was pretty hacky (hence the motivation for this
> KIP)
> > and
> > luckily we found a way around this, but it makes me wonder if any
> > applications
> > out there might be doing the same. So maybe we should reuse the old error
> > message just in case?
> >
> > Besides that, this KIP LGTM
> >
> > On Thu, Sep 3, 2020 at 5:23 AM Gokul Srinivas  wrote:
> >
> >> All,
> >>
> >> Gentle reminder - any comments on the line of thinking I mentioned in
> >> the last email? I've updated the Exception to be named
> >> "TransactionAbortedException" on the KIP confluence page.
> >>
> >> -Gokul
> >>
> >> On 01-09-2020 18:34, Gokul Srinivas wrote:
> >>> Matthias, Sophie, Jason,
> >>>
> >>> Took another pass at understanding the internals and it seems to me
> >>> like we should be extending the `ApiException` rather than the
> >>> `RetriableException`.
> >>>
> >>> The check in question
> >>> =
> >>>
> >>> Do we abort any undrained batches that are present on this transaction
> >>> if the transaction is in an aborting state? And, if we do, what would
> >>> be the reason sent back to the user for aborting these batches?
> >>>
> >>> Logic for this
> >>> ==
> >>>
> >>> If the transaction `isAborting` and `hasAbortableError` and the
> >>> `lastError()` is not empty -> then there has been some error which
> >>> will cause / has caused the transaction to abort and this *is* a
> >>> runtime exception. This same exception should be sent back to the user.
> >>>
> >>> If the transaction `isAborting` and `lastError()` is empty -> then for
> >>> some unknown reason (maybe even user initiated, according to the
> >>> tests), the transaction manager has started to abort the transaction.
> >>> In this case, the newly proposed exception should be sent back to the
> >>> user.
> >>>
> >>> Reasoning
> >>> =
> >>>
> >>> Prima facie - I do not think this is a `RetriableException`.
> >>>
> >>> If the user has chosen to abort this transaction, then it would be up
> >>> to the user to choose whether to retry the exception, in which case it
> >>> is /*not*/ a `RetriableException`.
> >>>
> >>> If there is a case where the transaction manager has no error, but has
> >>> started to abort the transaction, we still do not retry the transaction,
> >>> rather we abort any undrained batches - in which case, it is /*still
> >>> not*/ a `RetriableException`.
> >>>
> >>> Does that sound right?
> >>>
> >>> -Gokul
> >>>
> >>> On 29-08-2020 01:17, Jason Gustafson wrote:
>  Hi Gokul,
> 
>  Thanks, I think it makes sense to use a separate exception type. +1 on
>  Sophie's suggestion for `TransactionAbortedException`.
> 
>  Extending from `RetriableException` seems reasonable as well. I guess
>  the
>  only question is whether it's safe to catch it as a
> `RetriableException`
>  and apply common retry logic. For a transactional producer, my
>  expectation
>  is that the application would abort the transaction and retry it.
>  However,
>  if the transaction is already being aborted, maybe it would be better
> to
>  skip the abort. It might be helpful to have an example which shows
>  how we
>  expect applications to handle this.
> 
>  Thanks,
>  Jason
> 
> 
> 
> 
> 
> 
>  On Thu, Aug 27, 2020 at 6:25 PM Sophie Blee-Goldman
>  
>  wrote:
> 
> > Hey Gokul, thanks for taking up this KIP!
> >
> > I agree with Matthias that directly extending KafkaException may not
> be
> > ideal,
> > and we should instead extend APIException or RetriableException. Of
> the
> > two,
> > I think APIException would be more appropriate. My understanding is
> > that
> > RetriableException is generally reserved for internally retriable
> > exceptions
> > whereas APIException is used for pseudo-fatal exceptions that
> > require some
> > user input as to how to proceed (eg ProducerFencedException)
> >
> > I also agree that the name could be a bit more concise. My personal
> > vote
> > would be for 

Re: [VOTE] KIP-663: API to Start and Shut Down Stream Threads and to Request Closing of Kafka Streams Clients

2020-09-03 Thread Sophie Blee-Goldman
Hey, sorry for the late reply, I just have one minor suggestion. Since we
don't
make any guarantees about which thread gets removed or allow the user to
specify, I think we should return either the index or full name of the
thread
that does get removed by removeThread().

I know you just updated the KIP to return true/false if there are/aren't any
threads to be removed, but I think this would be more appropriate as an
exception than as a return type. I think it's reasonable to expect users to
have some sense to how many threads are remaining, and not try to remove
a thread when there is none left. To me, that indicates something wrong
with the user application code and should be treated as an exceptional case.
I don't think the same code clarity argument applies here as to the
addStreamThread() case, as there's no reason for an application to be
looping and retrying removeStreamThread(), since if that fails, it's because
there are no threads left and thus it will continue to always fail. And if
the
user actually wants to shut down all threads, they should just close the
whole application rather than call removeStreamThread() in a loop.

While I generally think it should be straightforward for users to track how
many stream threads they have running, maybe it would be nice to add
a small utility method that does this for them. Something like

// Returns the number of currently alive threads
int runningStreamThreads();
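
A minimal sketch of these semantics (an illustrative stand-in data structure, not the KIP's final API): removeStreamThread() returns the removed thread's name, throws when none remain, and a counter utility reports live threads.

import java.util.ArrayDeque;
import java.util.Deque;

class StreamThreadPoolSketch {
    private final Deque<Thread> threads = new ArrayDeque<>();

    void addStreamThread() {
        threads.push(new Thread(() -> { }, "StreamThread-" + (threads.size() + 1)));
    }

    // Returns the name of the thread that was removed; throws instead of
    // returning false when no thread is left, treating that as a bug in
    // the caller's bookkeeping.
    String removeStreamThread() {
        Thread t = threads.poll();
        if (t == null) {
            throw new IllegalStateException("no stream threads left to remove");
        }
        return t.getName();
    }

    // The utility suggested above: how many threads are currently alive.
    int runningStreamThreads() {
        return threads.size();
    }
}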

On Thu, Sep 3, 2020 at 7:41 AM Matthias J. Sax  wrote:

> +1 (binding)
>
> On 9/3/20 6:16 AM, Bruno Cadonna wrote:
> > Hi,
> >
> > I would like to start the voting on KIP-663 that proposes to add methods
> > to the Kafka Streams client to add and remove stream threads during
> > execution.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads
> >
> >
> > Best,
> > Bruno
>
>


Re: [DISCUSS] KIP-654 Aborting Transaction with non-flushed data should throw a non-fatal Exception

2020-09-03 Thread Gokul Srinivas

Sophie,

That sounds fair. I've updated the KIP to state the same message for 
backward compatibility to existing (albeit hacky) solutions.


As this is my first ever contribution - is the next step to initiate the 
voting on this KIP?


-Gokul

On 04-09-2020 00:34, Sophie Blee-Goldman wrote:

I think the current proposal looks good to me. One minor suggestion I have
is to consider keeping the same error message:

Failing batch since transaction was aborted


When we were running into this issue in Streams and accidentally rethrowing
the KafkaException as fatal, we ended up checking the specific error message
of the KafkaException and swallowing the exception if it was equivalent to
the
above. Obviously this was pretty hacky (hence the motivation for this KIP)
and
luckily we found a way around this, but it makes me wonder if any
applications
out there might be doing the same. So maybe we should reuse the old error
message just in case?

Besides that, this KIP LGTM

On Thu, Sep 3, 2020 at 5:23 AM Gokul Srinivas  wrote:


All,

Gentle reminder - any comments on the line of thinking I mentioned in
the last email? I've updated the Exception to be named
"TransactionAbortedException" on the KIP confluence page.

-Gokul

On 01-09-2020 18:34, Gokul Srinivas wrote:

Matthias, Sophie, Jason,

Took another pass at understanding the internals and it seems to me
like we should be extending the `ApiException` rather than the
`RetriableException`.

The check in question
=

Do we abort any undrained batches that are present on this transaction
if the transaction is in an aborting state? And, if we do, what would
be the reason sent back to the user for aborting these batches?

Logic for this
==

If the transaction `isAborting` and `hasAbortableError` and the
`lastError()` is not empty -> then there has been some error which
will cause / has caused the transaction to abort and this *is* a
runtime exception. This same exception should be sent back to the user.

If the transaction `isAborting` and `lastError()` is empty -> then for
some unknown reason (maybe even user initiated, according to the
tests), the transaction manager has started to abort the transaction.
In this case, the newly proposed exception should be sent back to the
user.

Reasoning
=

Prima facie - I do not think this is a `RetriableException`.

If the user has chosen to abort this transaction, then it would be up
to the user to choose whether to retry the exception, in which case it
is /*not*/ a `RetriableException`.

If there is a case where the transaction manager has no error, but has
started to abort the transaction, we still do not retry the transaction,
rather we abort any undrained batches - in which case, it is /*still
not*/ a `RetriableException`.

Does that sound right?

-Gokul

On 29-08-2020 01:17, Jason Gustafson wrote:

Hi Gokul,

Thanks, I think it makes sense to use a separate exception type. +1 on
Sophie's suggestion for `TransactionAbortedException`.

Extending from `RetriableException` seems reasonable as well. I guess
the
only question is whether it's safe to catch it as a `RetriableException`
and apply common retry logic. For a transactional producer, my
expectation
is that the application would abort the transaction and retry it.
However,
if the transaction is already being aborted, maybe it would be better to
skip the abort. It might be helpful to have an example which shows
how we
expect applications to handle this.

Thanks,
Jason






On Thu, Aug 27, 2020 at 6:25 PM Sophie Blee-Goldman

wrote:


Hey Gokul, thanks for taking up this KIP!

I agree with Matthias that directly extending KafkaException may not be
ideal,
and we should instead extend APIException or RetriableException. Of the
two,
I think APIException would be more appropriate. My understanding is
that
RetriableException is generally reserved for internally retriable
exceptions
whereas APIException is used for pseudo-fatal exceptions that
require some
user input as to how to proceed (eg ProducerFencedException)

I also agree that the name could be a bit more concise. My personal
vote
would be for "TransactionAbortedException" which seems a bit more
grammatically aligned with the other exceptions in Kafka.

Cheers,
Sophie

On Thu, Aug 27, 2020 at 6:01 PM Matthias J. Sax 
wrote:


Thanks for the KIP. Looks good overall.

However, I am wondering if the new exception should extend
`KafkaException`? It seems, extending `ApiException` or maybe even
`RetriableException` might be better?

About the name itself. I would prefer something simpler like
`AbortedTransactionException`.

Thoughts?


-Matthias


On 8/27/20 10:24 AM, Gokul Srinivas wrote:

Hello all,

I would like to propose the following KIP to throw a new non-fatal
exception whilst aborting transactions with non-flushed data. This
will
help users distinguish non-fatal errors and potentially retry the

batch.

*Issue *- https://issues.apache.org/jira/browse/KAFKA-10186

Re: [DISCUSS] KIP-654 Aborting Transaction with non-flushed data should throw a non-fatal Exception

2020-09-03 Thread Sophie Blee-Goldman
I think the current proposal looks good to me. One minor suggestion I have
is to consider keeping the same error message:

Failing batch since transaction was aborted


When we were running into this issue in Streams and accidentally rethrowing
the KafkaException as fatal, we ended up checking the specific error message
of the KafkaException and swallowing the exception if it was equivalent to
the
above. Obviously this was pretty hacky (hence the motivation for this KIP)
and
luckily we found a way around this, but it makes me wonder if any
applications
out there might be doing the same. So maybe we should reuse the old error
message just in case?

Besides that, this KIP LGTM
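
For illustration, a minimal sketch of that message-matching hack (an assumption of what such application code might look like, and exactly what a dedicated exception type would make unnecessary):

import org.apache.kafka.common.KafkaException;

class MessageMatchingWorkaroundSketch {
    // Swallow the abort case by string-matching the error message. This is
    // brittle, which is why reusing the old message matters for compatibility.
    static void handle(KafkaException e) {
        if ("Failing batch since transaction was aborted".equals(e.getMessage())) {
            // The transaction was already being aborted: not fatal, ignore.
        } else {
            throw e;
        }
    }
}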

On Thu, Sep 3, 2020 at 5:23 AM Gokul Srinivas  wrote:

> All,
>
> Gentle reminder - any comments on the line of thinking I mentioned in
> the last email? I've updated the Exception to be named
> "TransactionAbortedException" on the KIP confluence page.
>
> -Gokul
>
> On 01-09-2020 18:34, Gokul Srinivas wrote:
> > Matthias, Sophie, Jason,
> >
> > Took another pass at understanding the internals and it seems to me
> > like we should be extending the `ApiException` rather than the
> > `RetriableException`.
> >
> > The check in question
> > =
> >
> > Do we abort any undrained batches that are present on this transaction
> > if the transaction is in an aborting state? And, if we do, what would
> > be the reason sent back to the user for aborting these batches?
> >
> > Logic for this
> > ==
> >
> > If the transaction `isAborting` and `hasAbortableError` and the
> > `lastError()` is not empty -> then there has been some error which
> > will cause / has caused the transaction to abort and this *is* a
> > runtime exception. This same exception should be sent back to the user.
> >
> > If the transaction `isAborting` and `lastError()` is empty -> then for
> > some unknown reason (maybe even user initiated, according to the
> > tests), the transaction manager has started to abort the transaction.
> > In this case, the newly proposed exception should be sent back to the
> > user.
> >
> > Reasoning
> > =
> >
> > Prima facie - I do not think this is a `RetriableException`.
> >
> > If the user has chosen to abort this transaction, then it would be up
> > to the user to choose whether to retry the exception, in which case it
> > is /*not*/ a `RetriableException`.
> >
> > If there is a case where the transaction manager has no error, but has
> > started to abort the transaction, we still do not retry the transaction,
> > rather we abort any undrained batches - in which case, it is /*still
> > not*/ a `RetriableException`.
> >
> > Does that sound right?
> >
> > -Gokul
> >
> > On 29-08-2020 01:17, Jason Gustafson wrote:
> >> Hi Gokul,
> >>
> >> Thanks, I think it makes sense to use a separate exception type. +1 on
> >> Sophie's suggestion for `TransactionAbortedException`.
> >>
> >> Extending from `RetriableException` seems reasonable as well. I guess
> >> the
> >> only question is whether it's safe to catch it as a `RetriableException`
> >> and apply common retry logic. For a transactional producer, my
> >> expectation
> >> is that the application would abort the transaction and retry it.
> >> However,
> >> if the transaction is already being aborted, maybe it would be better to
> >> skip the abort. It might be helpful to have an example which shows
> >> how we
> >> expect applications to handle this.
> >>
> >> Thanks,
> >> Jason
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Aug 27, 2020 at 6:25 PM Sophie Blee-Goldman
> >> 
> >> wrote:
> >>
> >>> Hey Gokul, thanks for taking up this KIP!
> >>>
> >>> I agree with Matthias that directly extending KafkaException may not be
> >>> ideal,
> >>> and we should instead extend APIException or RetriableException. Of the
> >>> two,
> >>> I think APIException would be more appropriate. My understanding is
> >>> that
> >>> RetriableException is generally reserved for internally retriable
> >>> exceptions
> >>> whereas APIException is used for pseudo-fatal exceptions that
> >>> require some
> >>> user input as to how to proceed (eg ProducerFencedException)
> >>>
> >>> I also agree that the name could be a bit more concise. My personal
> >>> vote
> >>> would be for "TransactionAbortedException" which seems a bit more
> >>> grammatically aligned with the other exceptions in Kafka.
> >>>
> >>> Cheers,
> >>> Sophie
> >>>
> >>> On Thu, Aug 27, 2020 at 6:01 PM Matthias J. Sax 
> >>> wrote:
> >>>
>  Thanks for the KIP. Looks good overall.
> 
>  However, I am wondering if the new exception should extend
>  `KafkaException`? It seems, extending `ApiException` or maybe even
>  `RetriableException` might be better?
> 
>  About the name itself. I would prefer something simpler like
>  `AbortedTransactionException`.
> 
>  Thoughts?
> 
> 
>  -Matthias
> 
> 
>  On 8/27/20 10:24 AM, Gokul Srinivas wrote:
> > Hello all,
> >
> 

Re: [DISCUSS] KIP-656: MirrorMaker2 Exactly-once Semantics

2020-09-03 Thread Ning Zhang
Hi Mickael,

Definitely we can include a Sink connector for checkpoints and even heartbeats, 
but I am thinking that if the existing 3 connectors, MirrorSourceConnector, 
MirrorCheckpointConnector and MirrorHeartbeatConnector, are managed separately, 
we could reduce the footprint of introducing EOS while maintaining the 
correctness of Checkpoint and Heartbeat, given that the checkpoint and 
heartbeat connectors are very lightweight in terms of traffic load and logic. 
Maybe Ryanne can chime in and share his thoughts. 

For any reason, if we need to include Sink connector for checkpoints and 
heartbeats, to make this KIP approved, I plan to create a PR to extract the 
common functions of data mirroring task, Checkpoint and Heartbeat from existing 

- MirrorSourceConnector.java
- MirrorSourceTask.java, 
- MirrorCheckpointConnector.java
- MirrorCheckpointTask.java
- MirrorHeartbeatConnector.java
- MirrorHeartbeatTask.java

into some "common" files, so that the common functions could be re-used by 
MirrorSinkConnector, MirrorSinkTask, etc. (based on Sink 
Connector and Sink Task)

- MirrorCheckpointConnCommon.java
- MirrorCheckpointTaskCommon.java
- MirrorConnectorCommon.java
- MirrorTaskCommon.java
- MirrorHeartbeatConnCommon.java
- MirrorHeartbeatTaskCommon.java

Thoughts?
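
For context, a minimal sketch of the per-put() transactional pattern debated in the quoted thread below (class, field, and helper names are assumptions, not the KIP's code):

import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

class MirrorSinkTaskSketch {
    private KafkaProducer<byte[], byte[]> producer; // transactional.id unique per task
    private Map<TopicPartition, OffsetAndMetadata> currentOffsets;
    private String consumerGroupId;

    // One transaction per batch: mirrored records and consumer offsets are
    // committed or aborted together, which is what yields exactly-once.
    void put(Collection<ProducerRecord<byte[], byte[]>> records) {
        producer.beginTransaction();
        try {
            for (ProducerRecord<byte[], byte[]> record : records) {
                producer.send(record);
            }
            producer.sendOffsetsToTransaction(currentOffsets, consumerGroupId);
            producer.commitTransaction();
        } catch (RuntimeException e) {
            producer.abortTransaction();
            throw e;
        }
    }
}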

On 2020/09/03 15:06:00, Mickael Maison  wrote: 
> Hi Ning,
> 
> Thanks for the updates.
> 
> 1. If you have to run a Sink (the new MirrorSinkConnector) and Source
> (MirrorCheckpoint) connector for MM2 you will need 2 Connect runtimes.
> So this does not work well for users of Connect. I've not really
> looked into it yet but I wonder if we should include a Sink connector
> for checkpoints too
> 
> On Thu, Sep 3, 2020 at 6:51 AM Ning Zhang  wrote:
> >
> > bump for another potential more discussion
> >
> > On 2020/08/27 23:31:38, Ning Zhang  wrote:
> > > Hello Mickael,
> > >
> > > > 1. How does offset translation work with this new sink connector?
> > > > Should we also include a CheckpointSinkConnector?
> > >
> > > CheckpointSourceConnector will be re-used as the same as current. When 
> > > EOS is enabled, we will run 3 connectors:
> > >
> > > MirrorSinkConnector (based on SinkConnector)
> > > MirrorCheckpointConnector (based on SourceConnector)
> > > MirrorHeartbeatConnector (based on SourceConnector)
> > >
> > > For the last two connectors (checkpoint, heartbeat), if we do not 
> > > strictly require EOS, it is probably OK to use current implementation on 
> > > SourceConnector.
> > >
> > > I will update the KIP to clarify this, if it sounds acceptable.
> > >
> > > > 2. Migrating to this new connector could be tricky as effectively the
> > > > Connect runtime needs to point to the other cluster, so its state
> > > > (stored in the __connect topics) is lost. Unfortunately there is no
> > > > easy way today to prime Connect with offsets. Not necessarily a
> > > > blocking issue but this should be described as I think the current
> > > > Migration section looks really optimistic at the moment
> > >
> > > totally agree. I will update the migration part with notes about 
> > > potential service interruption, without careful planning.
> > >
> > > > 3. We can probably find a better name than "transaction.producer".
> > > Maybe we can follow a similar pattern to Streams (which uses
> > > > "processing.guarantee")?
> > >
> > > "processing.guarantee" sounds better
> > >
> > > > 4. Transactional Ids used by the producer are generated based on the
> > > > task assignments. If there's a single task, if it crashes and restarts
> > > > it would still get the same id. Can this be an issue?
> > >
> > > From https://tgrez.github.io/posts/2019-04-13-kafka-transactions.html, 
> > > the author suggests to postfix transaction.id with :
> > >
> > > "To avoid handling an external store we will use a static encoding 
> > > similarly as in spring-kafka:
> > > The transactional.id is now the transactionIdPrefix appended with 
> > > ..."
> > >
> > > I think as long as no more than one producer uses the same 
> > > "transaction.id" at the same time, it is OK.
> > >
> > > Also from my tests, this "transaction.id" assignment works fine with 
> > > failures. To tighten it up, I also tested to use  "connector task id" in 
> > > "transaction.id". The "connector task id" is typically composed of 
> > > connector_name and task_id, which is also unique across all connectors in 
> > > a KC cluster.
> > >
> > >  > 5. The logic in the KIP creates a new transaction every time put() is
> > > > called. Is there a performance impact?
> > >
> > > It could be a performance hit if the transaction batch is too small under 
> > > high ingestion rate. The batch size depends on how many messages the 
> > > consumer polls each time. Maybe we could increase "max.poll.records" to 
> > > have larger batch size.
> > >
> > > Overall, thanks so much for the valuable feedback. If the responses 
> > > sound good, I will do a cleanup of the KIP.
> > >
> > > On 

Re: Contributor Access to JIRA

2020-09-03 Thread Bill Bejeck
Done.

Thanks for your interest in Apache Kafka!

-Bill

On Thu, Sep 3, 2020 at 11:23 AM Nag Y  wrote:

> Here it is
>
> username : nag9s
>
> On Thu, Sep 3, 2020 at 8:36 PM Matthias J. Sax  wrote:
>
> > Please create an account (self-service) and share your account info here,
> > so we can add you.
> >
> > On 9/3/20 6:49 AM, Kp k wrote:
> > > Hi,
> > >
> > > Can you please provide me Contributor access to Kafka JIRA, as I am
> > > interested in contributing.
> > >
> > > Thanks,
> > > Kalpitha
> > >
> >
> >
>


Re: Discussion on KIP-670 : Add ConsumerGroupCommand to delete static members

2020-09-03 Thread Walker Carlson
Hello Sandeep,

Reading through your kip it seems like a good idea and pretty straight
forward. So I have no problems with this proposal.

Thanks for the Kip,

Walker


On Thu, Sep 3, 2020 at 8:28 AM Sandeep Kumar  wrote:

> Hi All,
>
> I am new to the Kafka contribution community. I have picked up a jira
> ticket https://issues.apache.org/jira/browse/KAFKA-9440 which requires
> KIP.
>
> I have submitted KIP for it
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-670%3A+Add+ConsumerGroupCommand+to+delete+static+members
>
> I am proposing that we add a new option (--remove-members) from consumer
> group via CLI.
>
> I'd really appreciate your feedback on the proposal.
>
> Thanks and Regards,
> Sandeep
>


Re: [VOTE] KIP-614: Add Prefix Scan support for State Stores

2020-09-03 Thread Sagar
Hi John,

Thank you! I have marked the KIP as Accepted :)

Regarding the point on InMemoryKeyValueStore, in the PR I had added the
implementation for InMemoryKeyValueStore as well. I hadn't mentioned about
it in the KIP which I have done now as you suggested.

Thanks!
Sagar.
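
For reference, a short usage sketch of the new method (the store handle and serde here are assumptions; the signature follows the KIP page):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

class PrefixScanSketch {
    // Iterate every entry whose key starts with "user-"; the store handle
    // would come from an interactive query or a processor context.
    static void dumpUserKeys(ReadOnlyKeyValueStore<String, Long> store) {
        try (KeyValueIterator<String, Long> iter =
                 store.prefixScan("user-", Serdes.String().serializer())) {
            while (iter.hasNext()) {
                KeyValue<String, Long> entry = iter.next();
                System.out.println(entry.key + " -> " + entry.value);
            }
        }
    }
}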

On Thu, Sep 3, 2020 at 8:10 PM John Roesler  wrote:

> Hi Sagar,
>
> Yes! Congratulations :)
>
> Now, you can mark the status of the KIP as "Accepted" and we
> can move on to reviewing your PRs.
>
> One quick note: Matthias didn't have time to review the KIP
> in full, but he did point out to me that there's a lot of
> information about the RocksDB implementation and no mention
> of the InMemory store. We both agree that we should
> implement the new method also for the InMemory store.
> Assuming you agree, note that we don't need to discuss any
> implementation details, so you could just update the KIP
> document to also mention, "We will also implement the new
> method in the InMemoryKeyValueStore."
>
> Thanks for your contribution to Apache Kafka!
> -John
>
> On Thu, 2020-09-03 at 09:30 +0530, Sagar wrote:
> > Thanks All!
> >
> > I see 3 binding +1 votes and 2 non-binding +1s. Does it mean this KIP has
> > gained a lazy majority?
> >
> > Thanks!
> > Sagar.
> >
> > On Thu, Sep 3, 2020 at 6:51 AM Guozhang Wang  wrote:
> >
> > > Thanks for the KIP Sagar. I'm +1 (binding) too.
> > >
> > >
> > > Guozhang
> > >
> > > On Tue, Sep 1, 2020 at 1:24 PM Bill Bejeck  wrote:
> > >
> > > > Thanks for the KIP! This is a great addition to the streams API.
> > > >
> > > > +1 (binding)
> > > >
> > > > -Bill
> > > >
> > > > On Tue, Sep 1, 2020 at 12:33 PM Sagar 
> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Bumping the thread again !
> > > > >
> > > > > Thanks!
> > > > > Sagar.
> > > > >
> > > > > On Wed, Aug 5, 2020 at 12:08 AM Sophie Blee-Goldman <
> > > sop...@confluent.io
> > > > > wrote:
> > > > >
> > > > > > Thanks Sagar! +1 (non-binding)
> > > > > >
> > > > > > Sophie
> > > > > >
> > > > > > On Sun, Aug 2, 2020 at 11:37 PM Sagar  >
> > > > wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > Just thought of bumping this voting thread again to see if we
> can
> > > > form
> > > > > > any
> > > > > > > consensus around this.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Sagar.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jul 20, 2020 at 4:21 AM Adam Bellemare <
> > > > > adam.bellem...@gmail.com
> > > > > > > wrote:
> > > > > > >
> > > > > > > > LGTM
> > > > > > > > +1 non-binding
> > > > > > > >
> > > > > > > > On Sun, Jul 19, 2020 at 4:13 AM Sagar <
> sagarmeansoc...@gmail.com
> > > > > > wrote:
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > Bumping this thread to see if there are any feedbacks.
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > Sagar.
> > > > > > > > >
> > > > > > > > > On Tue, Jul 14, 2020 at 9:49 AM John Roesler <
> > > > vvcep...@apache.org>
> > > > > > > > wrote:
> > > > > > > > > > Thanks for the KIP, Sagar!
> > > > > > > > > >
> > > > > > > > > > I’m +1 (binding)
> > > > > > > > > >
> > > > > > > > > > -John
> > > > > > > > > >
> > > > > > > > > > On Sun, Jul 12, 2020, at 02:05, Sagar wrote:
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > > I would like to start a new voting thread for the
> below KIP
> > > > to
> > > > > > add
> > > > > > > > > prefix
> > > > > > > > > > > scan support to state stores:
> > > > > > > > > > >
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > > 614%3A+Add+Prefix+Scan+support+for+State+Stores
> > > > > > > > > > > <
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-614%3A+Add+Prefix+Scan+support+for+State+Stores
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > > Sagar.
> > > > > > > > > > >
> > >
> > > --
> > > -- Guozhang
> > >
>
>


Re: [DISCUSS] KIP-660: Pluggable ReplicaAssignor

2020-09-03 Thread Mickael Maison
Thanks Robert and Ryanne for the feedback.

ReplicaAssignor implementations can throw an exception to indicate an
assignment can't be computed. This is already what the current round
robin assignor does. Unfortunately at the moment, there are no generic
error codes if it fails, it's either INVALID_PARTITIONS,
INVALID_REPLICATION_FACTOR or worse UNKNOWN_SERVER_ERROR.

So I think it would be nice to introduce a new Exception/Error code to
cover any failures in the assignor and avoid UNKNOWN_SERVER_ERROR.

I've updated the KIP accordingly, let me know if you have more questions.
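
A minimal sketch of what the pluggable interface and its failure signal could look like (method and exception names are assumptions drawn from this discussion, not the KIP's final API):

import java.util.List;
import java.util.Map;

interface ReplicaAssignorSketch {
    // Map each partition id to the ordered list of broker ids that should
    // host its replicas; throw to signal that no valid assignment exists,
    // to be surfaced to clients as the new error code mentioned above.
    Map<Integer, List<Integer>> assignReplicasToBrokers(String topic,
                                                        int partitions,
                                                        short replicationFactor,
                                                        List<Integer> liveBrokers)
            throws ReplicaAssignorException;

    class ReplicaAssignorException extends RuntimeException {
        ReplicaAssignorException(String message) {
            super(message);
        }
    }
}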

On Fri, Aug 28, 2020 at 4:49 PM Ryanne Dolan  wrote:
>
> Thanks Mickael, the KIP makes sense to me, esp for cases where an external
> system (like cruise control or an operator) knows more about the target
> cluster state than the broker does.
>
> Ryanne
>
> On Thu, Aug 20, 2020, 10:46 AM Mickael Maison 
> wrote:
>
> > Hi,
> >
> > I've created KIP-660 to make the replica assignment logic pluggable.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaAssignor
> >
> > Please take a look and let me know if you have any feedback.
> >
> > Thanks
> >


Discussion on KIP-670 : Add ConsumerGroupCommand to delete static members

2020-09-03 Thread Sandeep Kumar
Hi All,

I am new to the Kafka contribution community. I have picked up a jira
ticket https://issues.apache.org/jira/browse/KAFKA-9440 which requires KIP.

I have submitted KIP for it
https://cwiki.apache.org/confluence/display/KAFKA/KIP-670%3A+Add+ConsumerGroupCommand+to+delete+static+members

I am proposing that we add a new option (--remove-members) from consumer
group via CLI.

I'd really appreciate your feedback on the proposal.

Thanks and Regards,
Sandeep
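
The proposed CLI option would presumably wrap the existing Admin API for removing static members; a minimal sketch of that underlying call (group name, instance id, and bootstrap server are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.MemberToRemove;
import org.apache.kafka.clients.admin.RemoveMembersFromConsumerGroupOptions;

public class RemoveStaticMemberSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Remove the static member registered with group.instance.id "instance-1".
            RemoveMembersFromConsumerGroupOptions options =
                new RemoveMembersFromConsumerGroupOptions(
                    Collections.singleton(new MemberToRemove("instance-1")));
            admin.removeMembersFromConsumerGroup("my-group", options).all().get();
        }
    }
}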


Re: Contributor Access to JIRA

2020-09-03 Thread Nag Y
Here it is

username : nag9s

On Thu, Sep 3, 2020 at 8:36 PM Matthias J. Sax  wrote:

> Please create an account (self-service) and share your account info here,
> so we can add you.
>
> On 9/3/20 6:49 AM, Kp k wrote:
> > Hi,
> >
> > Can you please provide me Contributor access to Kafka JIRA, as I am
> > interested in contributing.
> >
> > Thanks,
> > Kalpitha
> >
>
>


Re: Contributor Access to JIRA

2020-09-03 Thread Matthias J. Sax
Please create an account (self-service) and share your account info here,
so we can add you.

On 9/3/20 6:49 AM, Kp k wrote:
> Hi,
> 
> Can you please provide me Contributor access to Kafka JIRA, as I am
> interested in contributing.
> 
> Thanks,
> Kalpitha
> 





Re: [DISCUSS] KIP-656: MirrorMaker2 Exactly-once Semantics

2020-09-03 Thread Mickael Maison
Hi Ning,

Thanks for the updates.

1. If you have to run a Sink (the new MirrorSinkConnector) and a Source
(MirrorCheckpoint) connector for MM2, you will need 2 Connect runtimes.
So this does not work well for users of Connect. I've not really
looked into it yet, but I wonder if we should include a Sink connector
for checkpoints too.
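
For context, a minimal sketch of the transaction-per-batch pattern discussed
below, assuming one transactional producer per sink task; the transactional.id
scheme and the method shape are illustrative, not the KIP's final API:

import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

// The producer is created once with transactional.id = connectorName + "-" + taskId
// (unique per task in the Connect cluster), followed by producer.initTransactions().
static void putBatch(KafkaProducer<byte[], byte[]> producer,
                     Collection<ProducerRecord<byte[], byte[]>> records,
                     Map<TopicPartition, OffsetAndMetadata> offsets,
                     String consumerGroupId) {
    producer.beginTransaction();
    try {
        records.forEach(producer::send);
        // commit the consumed offsets within the same transaction
        producer.sendOffsetsToTransaction(offsets, consumerGroupId);
        producer.commitTransaction();
    } catch (RuntimeException e) {
        producer.abortTransaction();  // the batch is redelivered and retried
        throw e;
    }
}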

On Thu, Sep 3, 2020 at 6:51 AM Ning Zhang  wrote:
>
> bump for another potential more discussion
>
> On 2020/08/27 23:31:38, Ning Zhang  wrote:
> > Hello Mickael,
> >
> > > 1. How does offset translation work with this new sink connector?
> > > Should we also include a CheckpointSinkConnector?
> >
> > The CheckpointSourceConnector will be reused as-is. When EOS
> > is enabled, we will run 3 connectors:
> >
> > MirrorSinkConnector (based on SinkConnector)
> > MirrorCheckpointConnector (based on SourceConnector)
> > MirrorHeartbeatConnector (based on SourceConnector)
> >
> > For the last two connectors (checkpoint, heartbeat), if we do not strictly
> > require EOS, it is probably OK to keep the current SourceConnector-based
> > implementation.
> >
> > I will update the KIP to clarify this, if it sounds acceptable.
> >
> > > 2. Migrating to this new connector could be tricky as effectively the
> > > Connect runtime needs to point to the other cluster, so its state
> > > (stored in the __connect topics) is lost. Unfortunately there is no
> > > easy way today to prime Connect with offsets. Not necessarily a
> > > blocking issue but this should be described as I think the current
> > > Migration section looks really optimistic at the moment
> >
> > Totally agree. I will update the migration section with notes about the
> > potential for service interruption if the migration is not planned carefully.
> >
> > > 3. We can probably find a better name than "transaction.producer".
> > > Maybe we can follow a similar pattern to Streams (which uses
> > > "processing.guarantee")?
> >
> > "processing.guarantee" sounds better
> >
> > > 4. Transactional Ids used by the producer are generated based on the
> > > task assignments. If there's a single task, if it crashes and restarts
> > > it would still get the same id. Can this be an issue?
> >
> > From https://tgrez.github.io/posts/2019-04-13-kafka-transactions.html, the
> > author suggests postfixing the transaction.id with :
> >
> > "To avoid handling an external store we will use a static encoding 
> > similarly as in spring-kafka:
> > The transactional.id is now the transactionIdPrefix appended with 
> > ..."
> >
> > I think as long as no more than one producer uses the same
> > "transaction.id" at the same time, it is OK.
> >
> > Also, from my tests, this "transaction.id" assignment works fine under
> > failures. To tighten it up, I also tested using the "connector task id" in
> > "transaction.id". The "connector task id" is typically composed of
> > connector_name and task_id, which is also unique across all connectors in a
> > Kafka Connect cluster.
> >
> >  > 5. The logic in the KIP creates a new transaction every time put() is
> > > called. Is there a performance impact?
> >
> > It could be a performance hit if the transaction batch is too small under a
> > high ingestion rate. The batch size depends on how many messages the
> > consumer polls each time. Maybe we could increase "max.poll.records" to get
> > a larger batch size.
> >
> > Overall, thanks so much for the valuable feedback. If the responses sound
> > good, I will do a cleanup of the KIP.
> >
> > On 2020/08/27 09:59:57, Mickael Maison  wrote:
> > > Thanks Ning for the KIP. Having stronger guarantees when mirroring
> > > data would be a nice improvement!
> > >
> > > A few comments:
> > > 1. How does offset translation work with this new sink connector?
> > > Should we also include a CheckpointSinkConnector?
> > >
> > > 2. Migrating to this new connector could be tricky as effectively the
> > > Connect runtime needs to point to the other cluster, so its state
> > > (stored in the __connect topics) is lost. Unfortunately there is no
> > > easy way today to prime Connect with offsets. Not necessarily a
> > > blocking issue but this should be described as I think the current
> > > Migration section looks really optimistic at the moment
> > >
> > > 3. We can probably find a better name than "transaction.producer".
> > > Maybe we can follow a similar pattern to Streams (which uses
> > > "processing.guarantee")?
> > >
> > > 4. Transactional Ids used by the producer are generated based on the
> > > task assignments. If there's a single task, if it crashes and restarts
> > > it would still get the same id. Can this be an issue?
> > >
> > > 5. The logic in the KIP creates a new transaction every time put() is
> > > called. Is there a performance impact?
> > >
> > > On Fri, Aug 21, 2020 at 4:58 PM Ryanne Dolan  
> > > wrote:
> > > >
> > > > Awesome, this will be a huge advancement. I also want to point out that
> > > > this KIP implements MirrorSinkConnector as well, finally, which is a 
> > > > very
> > > > often 

Re: Contributor Access to JIRA

2020-09-03 Thread Nag Y
Hi,

Please approve Contributor access to Kafka JIRA, as I am
interested in contributing.

Thanks,
Nag


Contributor Access to JIRA

2020-09-03 Thread Kp k
Hi,

Can you please provide me Contributor access to Kafka JIRA, as I am
interested in contributing.

Thanks,
Kalpitha


Re: [VOTE] KIP-663: API to Start and Shut Down Stream Threads and to Request Closing of Kafka Streams Clients

2020-09-03 Thread Matthias J. Sax
+1 (binding)

On 9/3/20 6:16 AM, Bruno Cadonna wrote:
> Hi,
> 
> I would like to start the voting on KIP-663 that proposes to add methods
> to the Kafka Streams client to add and remove stream threads during
> execution.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads
> 
> 
> Best,
> Bruno





Re: [DISCUSS] KIP-663: API to Start and Shut Down Stream Threads and to Request Closing of Kafka Streams Clients

2020-09-03 Thread Matthias J. Sax
Thanks! SGTM.

-Matthias

On 9/3/20 3:17 AM, Bruno Cadonna wrote:
> Hi Matthias,
> 
> I replied inline.
> 
> Best,
> Bruno
> 
> On 02.09.20 22:06, Matthias J. Sax wrote:
>> Thanks for updating the KIP.
>>
>> Why do you propose to return `boolean` from addStreamThread() if the
>> thread could not be started? As an alternative, we could also throw an
>> exception if the client is not in state RUNNING? -- I guess both are
>> valid options: just want to see what the pros/cons of each approach
>> would be?
>>
> 
> I prefer to return a boolean because it is nothing exceptional if a
> stream thread cannot be added due to an inappropriate state. State
> changes are expected in Streams. Furthermore, users should not be forced
> to control their program flow by catching exceptions. Let me give you
> some examples for returning a boolean and throwing an exception:
> 
> returning a boolean
> 
> while (!kafkaStreams.addStreamThread() &&
>    kafkaStreams.state() != State.NOT_RUNNING &&
>    kafkaStreams.state() != State.ERROR) {
> }
> 
> 
> throwing an exception
> 
> boolean added = false;
> while (!added &&
>        kafkaStreams.state() != State.NOT_RUNNING &&
>        kafkaStreams.state() != State.ERROR) {
>     try {
>         kafkaStreams.addStreamThread();
>         added = true;
>     } catch (final Exception ex) {
>         // do nothing
>     }
> }
> 
> IMO the first example is more readable than the second.
> 
> 
>> Btw: should we allow to add a new thread if the state is REBALANCING,
>> too? I actually don't see a reason why we should not allow this?
>>
> 
> I guess you are right. I will update the KIP and include REBALANCING.
> 
> 
>> For removeStreamThread(), might it be worth to actually guarantee that
>> the thread with the largest index is stopped instead of leaving if
>> unspecified? It does not seem to be a big burden on the implementation
>> and given that we plan to reused indices of died threads, it might be
>> nice to have a contract? Or would there be any negative impact if we
>> guarantee it?
>>
> 
> I left unspecified which stream thread is removed since I could not find
> any good reason for a guarantee. Also in your comment, I do not see what
> advantage we would have if we guaranteed that the stream thread with
> the largest index is stopped. It would not guarantee that the next added
> stream thread would get the largest index, because another stream thread
> with a lower index could have failed in the meanwhile and now two
> indices are up for grabs.
> Leaving unspecified which stream thread is removed also gives us the
> possibility to choose the stream thread to remove according to other
> aspects like for example the one with the least local state.
> 
> 
>> Another thought: should we add a parameter `numberOfThreads` to each
>> method to allow users to start/stop multiple threads at once?
>>
> 
> I would keep it simple for now and add overloads if users request them.
> 
> 
>> What happens if there is zero running threads and one calls
>> removeStreamThread()? Should we also add a `boolean` flag and return
>> `false` for this case (or throw an exception)?
>>
> 
> Yeah, I think this is a good idea for the programmatical removal of all
> threads. However, I would not throw an exception for the reasons I
> pointed out above.
> 
> 
>>
>> For the metric name, I would prefer "failed" over "crashed". Thoughts?
>>
> 
> I think I like "failed" more than "crashed" and it is also more
> consistent with other parts of the code like the
> ProductionExceptionHandlerResponse.FAIL.
> 
> 
>>
>> Side remark for the PR: can we make sure to update the description of
>> `num.stream.threads` to explain that it's the _initial_ number of
>> threads on startup?
>>
> 
> Good point!
> 
>>
>> -Matthias
>>
>>
>> On 9/1/20 2:01 PM, Walker Carlson wrote:
>>> Hi Bruno,
>>>
>>> I read through your updated KIP and it looks good to me. I agree with
>>> adding the metric to keep track of crashed stream threads in place of a
>>> list of dead stream threads.
>>>
>>> best,
>>> Walker :)
>>>
>>> On Tue, Sep 1, 2020 at 1:05 PM Bruno Cadonna  wrote:
>>>
 Hi John,

 your proposal makes sense! I will update the KIP.

 Best,
 Bruno

 On 01.09.20 17:31, John Roesler wrote:
> Hello Bruno,
>
> Thanks for the update! The KIP looks good to me; I only have
> a grammatical complaint about the proposed metric name.
>
> "Died" is a verb, the past tense of "to die", but in the
> expression,"x stream threads", x should be an adjective. To
> be fair, "died" is also the past participle of "to die", and
> participles can usually be used as adjectives. Maybe it
> sounds wrong to me because there's already a specifically
> adjectival form: "dead". So "dead-stream-threads" seems more
> natural.
>
> However, I'm not sure if that captures the specific meaning
> you're shooting for, namely that the metric counts only the
> threads that died exceptionally, vs. 

Re: [VOTE] KIP-614: Add Prefix Scan support for State Stores

2020-09-03 Thread John Roesler
Hi Sagar,

Yes! Congratulations :)

Now, you can mark the status of the KIP as "Accepted" and we
can move on to reviewing your PRs.

One quick note: Matthias didn't have time to review the KIP
in full, but he did point out to me that there's a lot of
information about the RocksDB implementation and no mention
of the InMemory store. We both agree that we should
implement the new method also for the InMemory store.
Assuming you agree, note that we don't need to discuss any
implementation details, so you could just update the KIP
document to also mention, "We will also implement the new
method in the InMemoryKeyValueStore."
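
For illustration, a hedged usage sketch, assuming the method lands on
ReadOnlyKeyValueStore with the signature proposed in the KIP (the store
name and types below are made up):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

static void printUserCounts(KafkaStreams streams) {
    ReadOnlyKeyValueStore<String, Long> store = streams.store(
        StoreQueryParameters.fromNameAndType("counts", QueryableStoreTypes.keyValueStore()));
    try (KeyValueIterator<String, Long> iter =
             store.prefixScan("user-", Serdes.String().serializer())) {
        while (iter.hasNext()) {
            System.out.println(iter.next());  // all entries whose key starts with "user-"
        }
    }
}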

Thanks for your contribution to Apache Kafka!
-John

On Thu, 2020-09-03 at 09:30 +0530, Sagar wrote:
> Thanks All!
> 
> I see 3 binding +1 votes and 2 non-binding +1s. Does it mean this KIP has
> gained a lazy majority?
> 
> Thanks!
> Sagar.
> 
> On Thu, Sep 3, 2020 at 6:51 AM Guozhang Wang  wrote:
> 
> > Thanks for the KIP Sagar. I'm +1 (binding) too.
> > 
> > 
> > Guozhang
> > 
> > On Tue, Sep 1, 2020 at 1:24 PM Bill Bejeck  wrote:
> > 
> > > Thanks for the KIP! This is a great addition to the streams API.
> > > 
> > > +1 (binding)
> > > 
> > > -Bill
> > > 
> > > On Tue, Sep 1, 2020 at 12:33 PM Sagar  wrote:
> > > 
> > > > Hi All,
> > > > 
> > > > Bumping the thread again !
> > > > 
> > > > Thanks!
> > > > Sagar.
> > > > 
> > > > On Wed, Aug 5, 2020 at 12:08 AM Sophie Blee-Goldman <
> > sop...@confluent.io
> > > > wrote:
> > > > 
> > > > > Thanks Sagar! +1 (non-binding)
> > > > > 
> > > > > Sophie
> > > > > 
> > > > > On Sun, Aug 2, 2020 at 11:37 PM Sagar 
> > > wrote:
> > > > > > Hi All,
> > > > > > 
> > > > > > Just thought of bumping this voting thread again to see if we can
> > > form
> > > > > any
> > > > > > consensus around this.
> > > > > > 
> > > > > > Thanks!
> > > > > > Sagar.
> > > > > > 
> > > > > > 
> > > > > > On Mon, Jul 20, 2020 at 4:21 AM Adam Bellemare <
> > > > adam.bellem...@gmail.com
> > > > > > wrote:
> > > > > > 
> > > > > > > LGTM
> > > > > > > +1 non-binding
> > > > > > > 
> > > > > > > On Sun, Jul 19, 2020 at 4:13 AM Sagar  > > > > wrote:
> > > > > > > > Hi All,
> > > > > > > > 
> > > > > > > > Bumping this thread to see if there are any feedbacks.
> > > > > > > > 
> > > > > > > > Thanks!
> > > > > > > > Sagar.
> > > > > > > > 
> > > > > > > > On Tue, Jul 14, 2020 at 9:49 AM John Roesler <
> > > vvcep...@apache.org>
> > > > > > > wrote:
> > > > > > > > > Thanks for the KIP, Sagar!
> > > > > > > > > 
> > > > > > > > > I’m +1 (binding)
> > > > > > > > > 
> > > > > > > > > -John
> > > > > > > > > 
> > > > > > > > > On Sun, Jul 12, 2020, at 02:05, Sagar wrote:
> > > > > > > > > > Hi All,
> > > > > > > > > > 
> > > > > > > > > > I would like to start a new voting thread for the below KIP
> > > to
> > > > > add
> > > > > > > > prefix
> > > > > > > > > > scan support to state stores:
> > > > > > > > > > 
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > 614%3A+Add+Prefix+Scan+support+for+State+Stores
> > > > > > > > > > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-614%3A+Add+Prefix+Scan+support+for+State+Stores
> > > > > > > > > > 
> > > > > > > > > > Thanks!
> > > > > > > > > > Sagar.
> > > > > > > > > > 
> > 
> > --
> > -- Guozhang
> > 



Re: [DISCUSS] KIP-406: GlobalStreamThread should honor custom reset policy

2020-09-03 Thread Matthias J. Sax
I think this issue can actually be resolved.

 - We need a flag on the stream-threads indicating whether global-restore is in
progress; in this case, a stream-thread may go into RUNNING state,
but it's not allowed to actually process data -- it will be allowed to
update standby-tasks though.

 - If a stream-thread restores, its own state is RESTORING and it does
not need to care about the "global restore flag".

 - The global-thread just does what we discussed, including using state
RESTORING.

 - The KafkaStreams client state is in RESTORING, if at least one thread
(stream-thread or global-thread) is in state RESTORING.

 - On startup, if there is a global-thread, we just set the
global-restore flag upfront before we start the stream-threads (we can
actually still do the rebalance and potential restore in the stream-threads
in parallel to the global restore) and rely on the global-thread to unset
the flag.

 - The tricky thing is to "stop" processing in stream-threads if we
need to wipe the global-store and rebuild it. For this, we should set
the "global restore flag" on the stream-threads, but we also need to
"lock down" the global store in question and throw an exception if a
stream-thread tries to access it; if the stream-thread gets this
exception, it needs to clean up itself and wait until the "global restore
flag" is unset before it can continue.


Do we think this would work? -- Of course, the devil is in the details
but it seems to become a PR discussion, and there is no reason to make
it part of the KIP.
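
For concreteness, a minimal sketch of the flag-based coordination described
above (class and method names are purely illustrative, not actual Streams
internals):

// Set by the global-thread around (re)building the global store; checked by
// the stream-threads before they process active tasks.
class GlobalRestoreFlag {
    private volatile boolean globalRestoreInProgress = false;

    void globalRestoreStarted() { globalRestoreInProgress = true; }
    void globalRestoreFinished() { globalRestoreInProgress = false; }

    // Stream-threads may still update standby tasks while this returns false,
    // but must not process active tasks.
    boolean mayProcessActiveTasks() { return !globalRestoreInProgress; }
}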


-Matthias

On 9/3/20 3:41 AM, Navinder Brar wrote:
> Hi,
> 
> Thanks, John, Matthias and Sophie for great feedback.
> 
> On the point raised by Sophie that maybe we should allow normal restoring
> during GLOBAL_RESTORING, I think it makes sense but the challenge would be
> what happens when normal restoring (on actives) has finished but
> GLOBAL_RESTORING is still going on. Currently, all restorings are independent
> of each other, i.e. restoring happening on one task/thread doesn't affect
> another. But if we do go ahead with allowing normal restoring during
> GLOBAL_RESTORING then we will still have to pause the active tasks from going
> to RUNNING if GLOBAL_RESTORING has not finished and normal restorings have
> finished. For the same reason, I wouldn't want to combine global restoring
> and normal restoring because then it would make all the restorings
> independent but we don't want that. We want global stores to be available
> before any processing starts on the active tasks.
> 
> Although I think restoring of replicas can still take place while global
> stores are restoring because in replicas there is no danger of them starting
> processing.
> 
> Also, one point to bring up is that currently during application startup 
> global stores restore first and then normal stream threads start.
> 
> Regards,
> Navinder
> 
> On Thursday, 3 September, 2020, 06:58:40 am IST, Matthias J. Sax 
>  wrote:  
>  
>  Thanks for the input Sophie. Those are all good points and I fully agree
> with them.
> 
> When saying "pausing the processing threads" I only considered them in
> `RUNNING` and thought we figure out the detail on the PR... Excellent catch!
> 
> Changing state transitions is to some extent backward incompatible, but
> I think (IIRC) we did it in the past and I personally tend to find it
> ok. That's why we cover those changes in a KIP.
> 
> -Matthias
> 
> On 9/2/20 6:18 PM, Sophie Blee-Goldman wrote:
>> If we're going to add a new GLOBAL_RESTORING state to the KafkaStreams FSM,
>> maybe it would make sense to add a new plain RESTORING state that we
>> transition
>> to when restoring non-global state stores following a rebalance. Right now
>> all restoration
>> occurs within the REBALANCING state, which is pretty misleading.
>> Applications that
>> have large amounts of state to restore will appear to be stuck rebalancing
>> according to
>> the state listener, when in fact the rebalance has completed long ago.
>> Given that there
>> are very much real scenarios where you actually *are *stuck rebalancing, it
>> seems useful to
>> distinguish plain restoration from more insidious cases that may require
>> investigation and/or
>> intervention.
>>
>> I don't mean to hijack this KIP, I just think it would be odd to introduce
>> GLOBAL_RESTORING
>> when there is no other kind of RESTORING state. One question this brings
>> up, and I
>> apologize if this has already been addressed, is what to do when we are
>> restoring
>> both normal and global state stores? It sounds like we plan to pause the
>> StreamThreads
>> entirely, but there doesn't seem to be any reason not to allow regular
>> state restoration -- or
>> even standby processing -- while the global state is restoring.Given the
>> current effort to move
>> restoration & standbys to a separate thread, allowing them to continue
>> while pausing
>> only the StreamThread seems quite natural.
>>
>> Assuming that we actually do allow both types of restoration 

Re: Need contributor access to Kafka Improvement Proposals

2020-09-03 Thread Matthias J. Sax
Done.

On 9/3/20 5:08 AM, Sandeep Kumar wrote:
> Hi Matthias,
> 
> Can you please grant me contributor access to create KIP ?
> 
>  UserId : sndp2693
>  EmailId : sndp2...@gmail.com 
> 
> 
> Regards,
> Sandeep
>  
> 
> On Thu, Sep 3, 2020 at 1:26 PM Sandeep Kumar  > wrote:
> 
> HI,
> 
> Can you please grant me access to create KIP?
> 
> Thanks,
> Sandeep
> 





[VOTE] KIP-663: API to Start and Shut Down Stream Threads and to Request Closing of Kafka Streams Clients

2020-09-03 Thread Bruno Cadonna

Hi,

I would like to start the voting on KIP-663 that proposes to add methods 
to the Kafka Streams client to add and remove stream threads during 
execution.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads

Best,
Bruno


Re: [DISCUSS] KIP-654 Aborting Transaction with non-flushed data should throw a non-fatal Exception

2020-09-03 Thread Gokul Srinivas

All,

Gentle reminder - any comments on the line of thinking I mentioned in 
the last email? I've updated the Exception to be named 
"TransactionAbortedException" on the KIP confluence page.


-Gokul

On 01-09-2020 18:34, Gokul Srinivas wrote:

Matthias, Sophie, Jason,

Took another pass at understanding the internals and it seems to me 
like we should be extending the `ApiException` rather than the 
`RetriableException`.


The check in question
=

Do we abort any undrained batches that are present on this transaction 
if the transaction is in an aborting state? And, if we do, what would 
be the reason sent back to the user for aborting these batches?


Logic for this
==

If the transaction `isAborting` and `hasAbortableError` and the 
`lastError()` is not empty -> then there has been some error which 
will cause / has caused the transaction to abort and this *is* a 
runtime exception. This same exception should be sent back to the user.


If the transaction `isAborting` and `lastError()` is empty -> then for 
some unknown reason (maybe even user initiated, according to the 
tests), the transaction manager has started to abort the transaction. 
In this case, the newly proposed exception should be sent back to the 
user.
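
In code, this branching could look roughly like the following (the state
accessors are hypothetical stand-ins for the producer's internal transaction
manager, not its actual API; the exception name is the one proposed in this
thread):

interface TxnState {  // hypothetical view of the transaction manager
    boolean isAborting();
    boolean hasAbortableError();
    RuntimeException lastError();  // null if no error has been recorded
}

static RuntimeException abortReasonFor(TxnState txn) {
    if (txn.isAborting() && txn.hasAbortableError() && txn.lastError() != null) {
        // a real error caused (or will cause) the abort: surface that error
        return txn.lastError();
    }
    // aborting without a recorded error (e.g. user initiated):
    // use the newly proposed exception
    return new TransactionAbortedException("Failing batch since transaction was aborted");
}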


Reasoning
=

Prima facie - I do not think this is a `RetriableException`.

If the user has chosen to abort this transaction, then it would be up
to the user to choose whether to retry the transaction, in which case it
is /*not*/ a `RetriableException`.


If there is a case where the transaction manager has no error, but has
started to abort the transaction, we still do not retry the transaction;
rather we abort any undrained batches - in which case, it is /*still
not*/ a `RetriableException`.


Does that sound right?

-Gokul

On 29-08-2020 01:17, Jason Gustafson wrote:

Hi Gokul,

Thanks, I think it makes sense to use a separate exception type. +1 on
Sophie's suggestion for `TransactionAbortedException`.

Extending from `RetriableException` seems reasonable as well. I guess the
only question is whether it's safe to catch it as a `RetriableException`
and apply common retry logic. For a transactional producer, my expectation
is that the application would abort the transaction and retry it. However,
if the transaction is already being aborted, maybe it would be better to
skip the abort. It might be helpful to have an example which shows how we
expect applications to handle this.

Thanks,
Jason






On Thu, Aug 27, 2020 at 6:25 PM Sophie Blee-Goldman wrote:


Hey Gokul, thanks for taking up this KIP!

I agree with Matthias that directly extending KafkaException may not be ideal,
and we should instead extend APIException or RetriableException. Of the two,
I think APIException would be more appropriate. My understanding is that
RetriableException is generally reserved for internally retriable exceptions,
whereas APIException is used for pseudo-fatal exceptions that require some
user input as to how to proceed (eg ProducerFencedException).

I also agree that the name could be a bit more concise. My personal vote
would be for "TransactionAbortedException", which seems a bit more
grammatically aligned with the other exceptions in Kafka.

Cheers,
Sophie

On Thu, Aug 27, 2020 at 6:01 PM Matthias J. Sax wrote:



Thanks for the KIP. Looks good overall.

However, I am wondering if the new exception should extend
`KafkaException`? It seems, extending `ApiException` or maybe even
`RetriableException` might be better?

About the name itself. I would prefer something simpler like
`AbortedTransactionException`.

Thoughts?


-Matthias


On 8/27/20 10:24 AM, Gokul Srinivas wrote:

Hello all,

I would like to propose the following KIP to throw a new non-fatal
exception whilst aborting transactions with non-flushed data. This will
help users distinguish non-fatal errors and potentially retry the batch.

*Issue *- https://issues.apache.org/jira/browse/KAFKA-10186


*KIP *-
https://cwiki.apache.org/confluence/display/KAFKA/KIP-654:+Aborted+transaction+with+non-flushed+data+should+throw+a+non-fatal+exception

Please let me know how best we can proceed with this.

-Gokul








Re: Need contributor access to Kafka Improvement Proposals

2020-09-03 Thread Sandeep Kumar
Hi Matthias,

Can you please grant me contributor access to create KIP ?

 UserId : sndp2693
 EmailId : sndp2...@gmail.com


Regards,
Sandeep


On Thu, Sep 3, 2020 at 1:26 PM Sandeep Kumar  wrote:

> HI,
>
> Can you please grant me access to create KIP?
>
> Thanks,
> Sandeep
>


[jira] [Created] (KAFKA-10460) ReplicaListValidator format checking is incomplete

2020-09-03 Thread Robin Palotai (Jira)
Robin Palotai created KAFKA-10460:
-

 Summary: ReplicaListValidator format checking is incomplete
 Key: KAFKA-10460
 URL: https://issues.apache.org/jira/browse/KAFKA-10460
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.4.1
Reporter: Robin Palotai


See 
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ConfigHandler.scala#L220]
 . The logic is supposed to accept only two cases:
 * list of k:v pairs
 * a single '*'

But in practice, since the disjunction's second part only checks that the head
is '*', the case where a k:v list is headed by '*' is also accepted (and then
the broker later dies at startup, refusing the value).

This happened in practice due to a CruiseControl bug (will link the related
issue later).

Observed on 2.4, but seems to be present in HEAD's source as well.
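
A hedged Java rendering of the flawed check (the real code is Scala, in
kafka.server.ConfigHandler; 'entries' stands for the comma-split, trimmed
config value):

import java.util.List;

static boolean isValid(List<String> entries) {
    return entries.stream().allMatch(e -> e.matches("([0-9]+:[0-9]+)?"))
            || "*".equals(entries.get(0));  // BUG: "*,1:1" is also accepted
}

// a stricter second disjunct rejects the mixed case:
static boolean isValidFixed(List<String> entries) {
    return entries.stream().allMatch(e -> e.matches("([0-9]+:[0-9]+)?"))
            || (entries.size() == 1 && "*".equals(entries.get(0)));
}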



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-406: GlobalStreamThread should honor custom reset policy

2020-09-03 Thread Navinder Brar
Hi,

Thanks, John, Matthias and Sophie for great feedback.

On the point raised by Sophie that maybe we should allow normal restoring
during GLOBAL_RESTORING, I think it makes sense but the challenge would be what
happens when normal restoring (on actives) has finished but GLOBAL_RESTORING is
still going on. Currently, all restorings are independent of each other, i.e.
restoring happening on one task/thread doesn't affect another. But if we do go
ahead with allowing normal restoring during GLOBAL_RESTORING then we will still
have to pause the active tasks from going to RUNNING if GLOBAL_RESTORING has
not finished and normal restorings have finished. For the same reason, I
wouldn't want to combine global restoring and normal restoring because then it
would make all the restorings independent but we don't want that. We want
global stores to be available before any processing starts on the active tasks.

Although I think restoring of replicas can still take place while global stores
are restoring because in replicas there is no danger of them starting
processing.

Also, one point to bring up is that currently during application startup global 
stores restore first and then normal stream threads start.

Regards,
Navinder

On Thursday, 3 September, 2020, 06:58:40 am IST, Matthias J. Sax 
 wrote:  
 
 Thanks for the input Sophie. Those are all good points and I fully agree
with them.

When saying "pausing the processing threads" I only considered them in
`RUNNING` and thought we figure out the detail on the PR... Excellent catch!

Changing state transitions is to some extent backward incompatible, but
I think (IIRC) we did it in the past and I personally tend to find it
ok. That's why we cover those changes in a KIP.

-Matthias

On 9/2/20 6:18 PM, Sophie Blee-Goldman wrote:
> If we're going to add a new GLOBAL_RESTORING state to the KafkaStreams FSM,
> maybe it would make sense to add a new plain RESTORING state that we
> transition
> to when restoring non-global state stores following a rebalance. Right now
> all restoration
> occurs within the REBALANCING state, which is pretty misleading.
> Applications that
> have large amounts of state to restore will appear to be stuck rebalancing
> according to
> the state listener, when in fact the rebalance has completed long ago.
> Given that there
> are very much real scenarios where you actually *are *stuck rebalancing, it
> seems useful to
> distinguish plain restoration from more insidious cases that may require
> investigation and/or
> intervention.
> 
> I don't mean to hijack this KIP, I just think it would be odd to introduce
> GLOBAL_RESTORING
> when there is no other kind of RESTORING state. One question this brings
> up, and I
> apologize if this has already been addressed, is what to do when we are
> restoring
> both normal and global state stores? It sounds like we plan to pause the
> StreamThreads
> entirely, but there doesn't seem to be any reason not to allow regular
> state restoration -- or
> even standby processing -- while the global state is restoring. Given the
> current effort to move
> restoration & standbys to a separate thread, allowing them to continue
> while pausing
> only the StreamThread seems quite natural.
> 
> Assuming that we actually do allow both types of restoration to occur at
> the same time,
> and if we did add a plain RESTORING state as well, which state should we
> end up in?
> AFAICT the main reason for having a distinct {GLOBAL_}RESTORING state is to
> alert
> users of the non-progress of their active tasks. In both cases, the active
> task is unable
> to continue until restoration has complete, so why distinguish between the
> two at all?
> Would it make sense to avoid a special GLOBAL_RESTORING state and just
> introduce
> a single unified RESTORING state to cover both the regular and global case?
> Just a thought
> 
> My only concern is that this might be considered a breaking change: users
> might be
> looking for the REBALANCING -> RUNNING transition specifically in order to
> alert when
> the application has started up, and we would no long go directly from
> REBALANCING to
>  RUNNING. I think we actually did/do this ourselves in a number of
> integration tests and
> possibly in some examples. That said, it seems more appropriate to just
> listen for
> the RUNNING state rather than for a specific transition, and we should
> encourage users
> to do so rather than go out of our way to support transition-type state
> listeners.
> 
> Cheers,
> Sophie
> 
> On Wed, Sep 2, 2020 at 5:53 PM Matthias J. Sax  wrote:
> 
>> I think this makes sense.
>>
>> When we introduce this new state, we might also tackle the jira I
>> mentioned. If there is a global thread, on startup of a `KafkaStreams`
>> client we should not transit to `REBALANCING` but to the new state, and
>> maybe also make the "bootstrapping" non-blocking.
>>
>> I guess it's worth to mention this in the KIP.
>>
>> Btw: The new state for KafkaStreams should 

Re: [DISCUSS] KIP-663: API to Start and Shut Down Stream Threads and to Request Closing of Kafka Streams Clients

2020-09-03 Thread Bruno Cadonna

Hi Matthias,

I replied inline.

Best,
Bruno

On 02.09.20 22:06, Matthias J. Sax wrote:

Thanks for updating the KIP.

Why do you propose to return `boolean` from addStreamThread() if the
thread could not be started? As an alternative, we could also throw an
exception if the client is not in state RUNNING? -- I guess both are
valid options: just want to see what the pros/cons of each approach
would be?



I prefer to return a boolean because it is nothing exceptional if a 
stream thread cannot be added due to an inappropriate state. State 
changes are expected in Streams. Furthermore, users should not be forced 
to control their program flow by catching exceptions. Let me give you 
some examples for returning a boolean and throwing an exception:


returning a boolean

while (!kafkaStreams.addStreamThread() &&
   kafkaStreams.state() != State.NOT_RUNNING &&
   kafkaStreams.state() != State.ERROR) {
}


throwing an exception

boolean added = false;
while (!added &&
       kafkaStreams.state() != State.NOT_RUNNING &&
       kafkaStreams.state() != State.ERROR) {
    try {
        kafkaStreams.addStreamThread();
        added = true;
    } catch (final Exception ex) {
        // do nothing
    }
}

IMO the first example is more readable than the second.



Btw: should we allow to add a new thread if the state is REBALANCING,
too? I actually don't see a reason why we should not allow this?



I guess you are right. I will update the KIP and include REBALANCING.



For removeStreamThread(), might it be worth to actually guarantee that
the thread with the largest index is stopped instead of leaving if
unspecified? It does not seem to be a big burden on the implementation
and given that we plan to reused indices of died threads, it might be
nice to have a contract? Or would there be any negative impact if we
guarantee it?



I left unspecified which stream thread is removed since I could not find 
any good reason for a guarantee. Also in your comment, I do not see what
advantage we would have if we guaranteed that the stream thread with
the largest index is stopped. It would not guarantee that the next added 
stream thread would get the largest index, because another stream thread 
with a lower index could have failed in the meanwhile and now two 
indices are up for grabs.
Leaving unspecified which stream thread is removed also gives us the 
possibility to choose the stream thread to remove according to other 
aspects like for example the one with the least local state.




Another thought: should we add a parameter `numberOfThreads` to each
method to allow users to start/stop multiple threads at once?



I would keep it simple for now and add overloads if users request them.



What happens if there is zero running threads and one calls
removeStreamThread()? Should we also add a `boolean` flag and return
`false` for this case (or throw an exception)?



Yeah, I think this is a good idea for the programmatical removal of all 
threads. However, I would not throw an exception for the reasons I 
pointed out above.





For the metric name, I would prefer "failed" over "crashed". Thoughts?



I think I like "failed" more than "crashed" and it is also more 
consistent with other parts of the code like the 
ProductionExceptionHandlerResponse.FAIL.





Side remark for the PR: can we make sure to update the description of
`num.stream.threads` to explain that it's the _initial_ number of
threads on startup?



Good point!



-Matthias


On 9/1/20 2:01 PM, Walker Carlson wrote:

Hi Bruno,

I read through your updated KIP and it looks good to me. I agree with
adding the metric to keep track of crashed stream threads in place of a list of
dead stream threads.

best,
Walker :)

On Tue, Sep 1, 2020 at 1:05 PM Bruno Cadonna  wrote:


Hi John,

your proposal makes sense! I will update the KIP.

Best,
Bruno

On 01.09.20 17:31, John Roesler wrote:

Hello Bruno,

Thanks for the update! The KIP looks good to me; I only have
a grammatical complaint about the proposed metric name.

"Died" is a verb, the past tense of "to die", but in the
expression,"x stream threads", x should be an adjective. To
be fair, "died" is also the past participle of "to die", and
participles can usually be used as adjectives. Maybe it
sounds wrong to me because there's already a specifically
adjectival form: "dead". So "dead-stream-threads" seems more
natural.

However, I'm not sure if that captures the specific meaning
you're shooting for, namely that the metric counts only the
threads that died exceptionally, vs. from calling
"removeStreamThread()". What do you think of "crashed-
stream-threads" instead?

Thanks,
-John

On Tue, 2020-09-01 at 11:30 +0200, Bruno Cadonna wrote:

Hi,

I updated the KIP with the feedback so far. I removed the API to close
the Kafka Streams client asynchronously, since it should be possible to
avoid the deadlock with the existing method and without a KIP.

Please have a look at the updated KIP and let me know what you think.



[jira] [Created] (KAFKA-10459) Document IQ APIs where order does not hold between stores

2020-09-03 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-10459:


 Summary: Document IQ APIs where order does not hold between stores
 Key: KAFKA-10459
 URL: https://issues.apache.org/jira/browse/KAFKA-10459
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Jorge Esteban Quilcate Otoya


From [https://github.com/apache/kafka/pull/9138#discussion_r480469688]:

This is out of the scope of this PR, but I'd like to point out that the current
IQ does not actually obey the ordering when there are multiple local stores
hosted on that instance. For example, if there are two stores from two tasks
hosting keys {1, 3} and {2, 4}, then a range query of keys [1, 4] would return
in the order of {{1,3,2,4}} but not {{1,2,3,4}}, since it is looping over the
stores only. This would be the case for either forward or backward fetches on
range-key-range-time.

For a single-key time range fetch, of course, there's no such issue.

I think it is worth documenting this for now until we have a fix (and actually
we are going to propose something soon).
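
A hedged sketch that reproduces the caveat (store name and types are made up;
the ordering comment assumes the two-task layout from the example above):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

static void printRange(KafkaStreams streams) {
    ReadOnlyKeyValueStore<Integer, String> store = streams.store(
        StoreQueryParameters.fromNameAndType("my-store", QueryableStoreTypes.keyValueStore()));
    try (KeyValueIterator<Integer, String> iter = store.range(1, 4)) {
        // with task A holding {1, 3} and task B holding {2, 4}, keys may come
        // back as 1, 3, 2, 4 because the composite store iterates store by store
        iter.forEachRemaining(kv -> System.out.println(kv.key));
    }
}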



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSSION] Upgrade system tests to python 3

2020-09-03 Thread Nikolay Izhikov
Hello! 

Just a friendly reminder.

The patch to resolve a piece of technical debt (Python 2 in the system tests) is ready!
Can someone please take a look?

https://github.com/apache/kafka/pull/9196

> 28 авг. 2020 г., в 11:19, Nikolay Izhikov  написал(а):
> 
> Hello!
> 
> Any feedback on this?
> What I should additionally do to prepare system tests migration?
> 
>> 24 авг. 2020 г., в 11:17, Nikolay Izhikov  
>> написал(а):
>> 
>> Hello.
>> 
>> PR [1] is ready.
>> Please, review.
>> 
>> But, I need help with the two following questions:
>> 
>> 1. We need a new release of ducktape which includes fixes [2], [3] for 
>> python3.
>> I created the issue in ducktape repo [4].
>> Can someone help me with the release?
>> 
>> 2. I know that some companies run system tests for the trunk on a regular
>> basis.
>> Can someone show me some results of these runs?
>> So, I can compare failures in my PR and in the trunk.
>> 
>> Results [5] of run all for my PR available in the ticket [6]
>> 
>> ```
>> SESSION REPORT (ALL TESTS)
>> ducktape version: 0.8.0
>> session_id:   2020-08-23--002
>> run time: 1010 minutes 46.483 seconds
>> tests run:684
>> passed:   505
>> failed:   9
>> ignored:  170
>> ```
>> 
>> [1] https://github.com/apache/kafka/pull/9196
>> [2] 
>> https://github.com/confluentinc/ducktape/commit/23bd5ab53802e3a1e1da1ddf3630934f33b02305
>> [3] 
>> https://github.com/confluentinc/ducktape/commit/bfe53712f83b025832d29a43cde3de3d7803106f
>> [4] https://github.com/confluentinc/ducktape/issues/245
>> [5] https://issues.apache.org/jira/secure/attachment/13010366/report.txt
>> [6] https://issues.apache.org/jira/browse/KAFKA-10402
>> 
>>> 14 авг. 2020 г., в 21:26, Ismael Juma  написал(а):
>>> 
>>> +1
>>> 
>>> On Fri, Aug 14, 2020 at 7:42 AM John Roesler  wrote:
>>> 
 Thanks Nikolay,
 
 No objection. This would be very nice to have.
 
 Thanks,
 John
 
 On Fri, Aug 14, 2020, at 09:18, Nikolay Izhikov wrote:
> Hello.
> 
>> If anyone's interested in porting it to Python 3 it would be a good
 change.
> 
> I’ve created a ticket [1] to upgrade system tests to python3.
> Does someone have any additional inputs or objections for this change?
> 
> [1] https://issues.apache.org/jira/browse/KAFKA-10402
> 
> 
>> 1 июля 2020 г., в 00:26, Gokul Ramanan Subramanian <
 gokul24...@gmail.com> написал(а):
>> 
>> Thanks Colin.
>> 
>> While on the subject of system tests, there are a few times I have seen
>> tests time out (even on a large machine such as m5.4xlarge EC2 with Linux).
>> Are there any knobs that system tests provide to control timeouts /
>> throughputs across all tests?
>> Thanks.
>> 
>> On Tue, Jun 30, 2020 at 6:32 PM Colin McCabe 
 wrote:
>> 
>>> Ducktape runs on Python 2.  You can't use it with Python 3, as you are
>>> trying to do here.
>>> 
>>> If anyone's interested in porting it to Python 3 it would be a good
 change.
>>> 
>>> Otherwise, using docker as suggested here seems to be the best way to
 go.
>>> 
>>> best,
>>> Colin
>>> 
>>> On Mon, Jun 29, 2020, at 02:14, Gokul Ramanan Subramanian wrote:
 Hi.
 
 Has anyone had luck running Kafka system tests on a Mac? I have a
 MacOS Mojave 10.14.6. I got Python 3.6.9 using pyenv. However, the command
 *ducktape tests/kafkatest/tests* yields the following error, making it
 look like some Python incompatibility issue.
 
 $ ducktape tests/kafkatest/tests
 Traceback (most recent call last):
 File "/Users/gokusubr/.pyenv/versions/3.6.9/bin/ducktape", line 11,
 in
 
 load_entry_point('ducktape', 'console_scripts', 'ducktape')()
 File
 
>>> 
 "/Users/gokusubr/.pyenv/versions/3.6.9/lib/python3.6/site-packages/pkg_resources/__init__.py",
 line 487, in load_entry_point
 return get_distribution(dist).load_entry_point(group, name)
 File
 
>>> 
 "/Users/gokusubr/.pyenv/versions/3.6.9/lib/python3.6/site-packages/pkg_resources/__init__.py",
 line 2728, in load_entry_point
 return ep.load()
 File
 
>>> 
 "/Users/gokusubr/.pyenv/versions/3.6.9/lib/python3.6/site-packages/pkg_resources/__init__.py",
 line 2346, in load
 return self.resolve()
 File
 
>>> 
 "/Users/gokusubr/.pyenv/versions/3.6.9/lib/python3.6/site-packages/pkg_resources/__init__.py",
 line 2352, in resolve
 module = __import__(self.module_name, fromlist=['__name__'],
 level=0)
 File
 
>>> 
 "/Users/gokusubr/.pyenv/versions/3.6.9/lib/python3.6/site-packages/ducktape-0.7.6-py3.6.egg/ducktape/command_line/main.py",
 line 127
 print "parameters are not valid json: " + str(e.message)

Need contributor access to Kafka Improvement Proposals

2020-09-03 Thread Sandeep Kumar
HI,

Can you please grant me access to create KIP?

Thanks,
Sandeep


Build failed in Jenkins: Kafka » kafka-trunk-jdk8 #42

2020-09-03 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-9924: Add remaining property-based RocksDB metrics as described 
in KIP-607 (#9232)


--
[...truncated 6.50 MB...]

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldEnqueueLaterOutputsAfterEarlierOnes[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldEnqueueLaterOutputsAfterEarlierOnes[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializersDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializersDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfEvenTimeAdvances[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfEvenTimeAdvances[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowNoSuchElementExceptionForUnusedOutputTopicWithDynamicRouting[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowNoSuchElementExceptionForUnusedOutputTopicWithDynamicRouting[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldInitProcessor[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldInitProcessor[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowForUnknownTopic[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowForUnknownTopic[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnStreamsTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnStreamsTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureGlobalTopicNameIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureGlobalTopicNameIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest >