Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1995

2023-07-12 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 387053 lines...]
Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerOuter[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerOuter[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerLeft[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerLeft[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterInner[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterInner[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterOuter[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testOuterOuter[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithRightVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithRightVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testLeftWithLeftVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testLeftWithLeftVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithLeftVersionedOnly[caching enabled = false] STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TableTableJoinIntegrationTest > [caching enabled = false] > 
testInnerWithLeftVersionedOnly[caching enabled = false] PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskAssignorIntegrationTest > shouldProperlyConfigureTheAssignor STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskAssignorIntegrationTest > shouldProperlyConfigureTheAssignor PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskMetadataIntegrationTest > shouldReportCorrectEndOffsetInformation STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskMetadataIntegrationTest > shouldReportCorrectEndOffsetInformation PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskMetadataIntegrationTest > shouldReportCorrectCommittedOffsetInformation 
STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
TaskMetadataIntegrationTest > shouldReportCorrectCommittedOffsetInformation 
PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
EmitOnChangeIntegrationTest > shouldEmitSameRecordAfterFailover() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
EmitOnChangeIntegrationTest > shouldEmitSameRecordAfterFailover() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndPersistentStores(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndPersistentStores(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndInMemoryStores(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndInMemoryStores(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamAggregationDedupIntegrationTest > shouldReduce(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 182 > 
KStreamAggregationDedupIntegrationTest > 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1994

2023-07-12 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-13295) Long restoration times for new tasks can lead to transaction timeouts

2023-07-12 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-13295.
-
Resolution: Fixed

With the new restore-thread, this issue should be resolved implicitly.

> Long restoration times for new tasks can lead to transaction timeouts
> -
>
> Key: KAFKA-13295
> URL: https://issues.apache.org/jira/browse/KAFKA-13295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Sagar Rao
>Priority: Critical
>  Labels: eos, new-streams-runtime-should-fix
>
> In some EOS applications with relatively long restoration times we've noticed 
> a series of ProducerFencedExceptions occurring during/immediately after 
> restoration. The broker logs were able to confirm these were due to 
> transactions timing out.
> In Streams, it turns out we automatically begin a new txn when calling 
> {{send}} (if there isn’t already one in flight). A {{send}} occurs often 
> outside a commit during active processing (eg writing to the changelog), 
> leaving the txn open until the next commit. And if a StreamThread has been 
> actively processing when a rebalance results in a new stateful task without 
> revoking any existing tasks, the thread won’t actually commit this open txn 
> before it goes back into the restoration phase while it builds up state for 
> the new task. So the in-flight transaction is left open during restoration, 
> during which the StreamThread only consumes from the changelog without 
> committing, leaving it vulnerable to timing out when restoration times exceed 
> the configured transaction.timeout.ms for the producer client.
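
To make the failure mode concrete, here is a minimal sketch (an illustration under
assumed settings such as a local broker and existing topic, not the actual Streams
internals) of how an EOS producer's transaction can be left open past
transaction.timeout.ms when no commit happens during a long restore:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OpenTxnDuringRestoreSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-txn-id");
            props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 60_000);          // transaction.timeout.ms
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                producer.beginTransaction();   // Streams begins a txn implicitly on the first send
                producer.send(new ProducerRecord<>("changelog-topic", "key", "value"));
                // If the thread now spends longer than transaction.timeout.ms restoring state
                // (only reading the changelog, never committing), the broker expires the txn
                // and the eventual commit fails, e.g. with a ProducerFencedException.
                Thread.sleep(70_000);          // stand-in for a long restoration phase
                producer.commitTransaction();  // too late: the transaction already timed out
            }
        }
    }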



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15184) New consumer internals refactoring and clean up

2023-07-12 Thread Kirk True (Jira)
Kirk True created KAFKA-15184:
-

 Summary: New consumer internals refactoring and clean up
 Key: KAFKA-15184
 URL: https://issues.apache.org/jira/browse/KAFKA-15184
 Project: Kafka
  Issue Type: Task
  Components: clients, consumer
Reporter: Kirk True
Assignee: Kirk True


Minor refactoring of the new consumer internals, including the introduction of the 
RequestManagers class to hold the various RequestManager instances.
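
As an illustration only (the class shape below is an assumption for this sketch, not
the actual consumer internals), such a holder groups the per-concern managers so the
background thread can drive them from one place:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.Optional;

    // Hypothetical sketch: a minimal RequestManager abstraction and a container for them.
    interface RequestManager {
        void poll(long currentTimeMs);   // each manager decides what requests it wants to send
    }

    final class RequestManagers {
        private final List<Optional<RequestManager>> entries;

        RequestManagers(Optional<RequestManager> coordinatorManager,
                        Optional<RequestManager> commitManager) {
            this.entries = Collections.unmodifiableList(
                    Arrays.asList(coordinatorManager, commitManager));
        }

        // The background thread iterates all managers in one place.
        void pollAll(long currentTimeMs) {
            entries.forEach(opt -> opt.ifPresent(m -> m.poll(currentTimeMs)));
        }
    }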



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.5 #39

2023-07-12 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 561189 lines...]

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testStickyTaskAssignorManyStandbys STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testStickyTaskAssignorManyStandbys PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testStickyTaskAssignorManyThreadsPerClient STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testStickyTaskAssignorManyThreadsPerClient PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testFallbackPriorTaskAssignorManyThreadsPerClient 
STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testFallbackPriorTaskAssignorManyThreadsPerClient 
PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testFallbackPriorTaskAssignorLargePartitionCount 
STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testFallbackPriorTaskAssignorLargePartitionCount 
PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testStickyTaskAssignorLargePartitionCount STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testStickyTaskAssignorLargePartitionCount PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testFallbackPriorTaskAssignorManyStandbys STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testFallbackPriorTaskAssignorManyStandbys PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testHighAvailabilityTaskAssignorManyStandbys 
STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testHighAvailabilityTaskAssignorManyStandbys PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testFallbackPriorTaskAssignorLargeNumConsumers 
STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testFallbackPriorTaskAssignorLargeNumConsumers 
PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testStickyTaskAssignorLargeNumConsumers STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StreamsAssignmentScaleTest > testStickyTaskAssignorLargeNumConsumers PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
EmitOnChangeIntegrationTest > shouldEmitSameRecordAfterFailover() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
EmitOnChangeIntegrationTest > shouldEmitSameRecordAfterFailover() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndPersistentStores(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndPersistentStores(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndInMemoryStores(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
HighAvailabilityTaskAssignorIntegrationTest > 
shouldScaleOutWithWarmupTasksAndInMemoryStores(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
KStreamAggregationDedupIntegrationTest > shouldReduce(TestInfo) STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
KStreamAggregationDedupIntegrationTest > shouldReduce(TestInfo) PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
KStreamAggregationDedupIntegrationTest > 

Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.4 #149

2023-07-12 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 528149 lines...]

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testReassignPartitionsInProgress() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testReassignPartitionsInProgress() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testChrootExistsAndRootIsLocked() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testChrootExistsAndRootIsLocked() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testCreateTopLevelPaths() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testCreateTopLevelPaths() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetAllTopicsInClusterDoesNotTriggerWatch() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetAllTopicsInClusterDoesNotTriggerWatch() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testIsrChangeNotificationGetters() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testIsrChangeNotificationGetters() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testLogDirEventNotificationsDeletion() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testLogDirEventNotificationsDeletion() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetLogConfigs() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetLogConfigs() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testBrokerSequenceIdMethods() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testBrokerSequenceIdMethods() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testAclMethods() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testAclMethods() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testCreateSequentialPersistentPath() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testCreateSequentialPersistentPath() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testConditionalUpdatePath() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testConditionalUpdatePath() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetAllTopicsInClusterTriggersWatch() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetAllTopicsInClusterTriggersWatch() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testDeleteTopicZNode() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testDeleteTopicZNode() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testDeletePath() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testDeletePath() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetBrokerMethods() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetBrokerMethods() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testJuteMaxBufffer() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testJuteMaxBufffer() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testCreateTokenChangeNotification() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testCreateTokenChangeNotification() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetTopicsAndPartitions() STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testGetTopicsAndPartitions() PASSED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > testChroot(boolean) > 
kafka.zk.KafkaZkClientTest.testChroot(boolean)[1] STARTED

Gradle Test Run :core:integrationTest > Gradle Test Executor 171 > 
KafkaZkClientTest > 

Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-12 Thread Erik van Oosten

Hi Colin, Philip,

I have added a section to KIP-944 to address your concerns around 
memory consistency over multiple threads.


You can read it here: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-944%3A+Support+async+runtimes+in+consumer#KIP944:Supportasyncruntimesinconsumer-Threadsafety


Kind regards,
    Erik.


Op 12-07-2023 om 13:24 schreef Erik van Oosten:

Thanks Philip,

> I think this can be demonstrated via diagrams and some code in the KIP.

There are some diagrams in KIP-944. How can they be improved?

I will add some code to address the concerns around memory barriers.


> We are in-process of re-writing the KafkaConsumer

Nice! I will read the KIP. Hopefully we don't need complex logic in 
callbacks after the rewrite.


Kind regards,
    Erik.


Op 11-07-2023 om 19:33 schreef Philip Nee:

Hey Erik,

Sorry for holding up this email for a few days since Colin's response
includes some of my concerns.  I'm in favor of this KIP, and I think your
approach seems safe.  Of course, I probably missed something therefore I
think this KIP needs to cover different use cases to demonstrate it doesn't
cause any unsafe access. I think this can be demonstrated via diagrams and
some code in the KIP.

Thanks,
P

On Sat, Jul 8, 2023 at 12:28 PM Erik van Oosten
 wrote:


Hello Colin,

  >> In KIP-944, the callback thread can only delegate to another thread
after reading from and writing to a threadlocal variable, providing the
barriers right there.

  > I don't see any documentation that accessing thread local variables
provides a total store or load barrier. Do you have such documentation?
It seems like if this were the case, we could eliminate volatile
variables from most of the code base.

Now I was imprecise. The thread-locals are only somewhat involved. In
the KIP proposal the callback thread reads an access key from a
thread-local variable. It then needs to pass that access key to another
thread, which then can set it on its own thread-local variable. The act
of passing a value from one thread to another implies that a memory
barrier needs to be passed. However, this is all not so relevant since
there is no need to pass the access key back when the other thread 
is done.


But now I think about it a bit more, the locking mechanism runs in a
synchronized block. If I remember correctly this should be enough to
pass read and write barriers.
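
A minimal sketch of the hand-off being described (my own illustration, not the KIP-944
implementation): the callback thread reads the access key from its thread-local,
publishes it under a lock, and the helper thread installs it in its own thread-local
before touching the consumer. The synchronized hand-off supplies the memory barriers,
and the key never needs to travel back.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AccessKeyHandOffSketch {
        private static final ThreadLocal<Long> ACCESS_KEY = new ThreadLocal<>();

        private Long sharedKey;                                   // guarded by 'this'
        private synchronized void publishKey(Long key) { this.sharedKey = key; }
        private synchronized Long readKey() { return sharedKey; }

        public void runCallbackWork() {
            ACCESS_KEY.set(42L);                                  // callback thread owns the key
            publishKey(ACCESS_KEY.get());                         // write barrier on monitor exit

            ExecutorService helper = Executors.newSingleThreadExecutor();
            helper.submit(() -> {
                ACCESS_KEY.set(readKey());                        // read barrier on monitor enter
                // ... the helper may now call the consumer under its own access key
            });
            helper.shutdown();
        }

        public static void main(String[] args) {
            new AccessKeyHandOffSketch().runCallbackWork();
        }
    }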

  >> In the current implementation the consumer is also invoked from
random threads. If it works now, it should continue to work.
  > I'm not sure what you're referring to. Can you expand on this?

Any invocation of the consumer (e.g. method poll) is not from a thread
managed by the consumer. This is what I was assuming you meant with the
term 'random thread'.

  > Hmm, not sure what you mean by "cooperate with blocking code." If you
have 10 green threads you're multiplexing on to one CPU thread, and that
CPU thread gets blocked because of what one green thread is doing, the
other 9 green threads are blocked too, right? I guess it's "just" a
performance problem, but it still seems like it could be a serious one.

There are several ways to deal with this. All async runtimes I know
(Akka, Zio, Cats-effects) support this by letting you mark a task as
blocking. The runtime will then either schedule it to another
thread-pool, or it will grow the thread-pool to accommodate. In any case
'the other 9 green threads' will simply be scheduled to another real
thread. In addition, some of these runtimes detect long running tasks
and will reschedule waiting tasks to another thread. This is all a bit
off topic though.

  > I don't see why this has to be "inherently multi-threaded." Why can't
we have the other threads report back what messages they've processed to
the worker thread. Then it will be able to handle these callbacks
without involving the other threads.

Please consider the context which is that we are running inside the
callback of the rebalance listener. The only way to execute something
and also have a timeout on it is to run the something on another 
thread.
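
As an aside, a hedged sketch of that pattern (an illustration, not KIP-944 code): run
the user work on a separate thread from inside the rebalance callback and wait for it
with a timeout.

    import java.util.Collection;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.common.TopicPartition;

    public class RebalanceCallbackWithTimeout implements ConsumerRebalanceListener {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();

        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            Future<?> result = worker.submit(() -> {
                // user work, e.g. flushing in-flight records for the revoked partitions
            });
            try {
                result.get(5, TimeUnit.SECONDS);   // bounded wait instead of blocking forever
            } catch (TimeoutException e) {
                result.cancel(true);               // give up rather than stall the rebalance
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            // nothing to do in this sketch
        }
    }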


Kind regards,
  Erik.


Op 08-07-2023 om 19:17 schreef Colin McCabe:

On Sat, Jul 8, 2023, at 02:41, Erik van Oosten wrote:

Hi Colin,

Thanks for your thoughts and taking the time to reply.

Let me take away your concerns. None of your worries are an issue with
the algorithm described in KIP-944. Here it goes:

   > It's not clear to me that it's safe to access the Kafka consumer or
producer concurrently from different threads.

Concurrent access is /not/ a design goal of KIP-944. In fact, it goes
to great lengths to make sure that this cannot happen.

*The only design goal is to allow callbacks to call the consumer from
another thread.*

To make sure there are no more misunderstandings about this, I have
added this goal to the KIP.


Hi Erik,

Sorry, I spoke imprecisely. My concern is not concurrent access, but
multithreaded access in general. Basically cache line visibility issues.

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1993

2023-07-12 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-15183) Add more controller, loader, snapshot emitter metrics

2023-07-12 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-15183:


 Summary: Add more controller, loader, snapshot emitter metrics
 Key: KAFKA-15183
 URL: https://issues.apache.org/jira/browse/KAFKA-15183
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe


Add the controller, loader, and snapshot emitter metrics from KIP-938.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15182) Normalize offsets before invoking SourceConnector::alterOffsets

2023-07-12 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15182:
--

 Summary: Normalize offsets before invoking 
SourceConnector::alterOffsets
 Key: KAFKA-15182
 URL: https://issues.apache.org/jira/browse/KAFKA-15182
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


See discussion 
[here|https://github.com/apache/kafka/pull/13945#discussion_r1260946148]

 

TLDR: When users attempt to externally modify source connector offsets via the 
{{PATCH /offsets}} endpoint (introduced in 
[KIP-875|https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect]),
 type mismatches can occur between offsets passed to 
{{SourceConnector::alterOffsets}} and the offsets that are retrieved by 
connectors / tasks via an instance of {{OffsetStorageReader}} after the offsets 
have been modified. In order to prevent this type mismatch, which could lead to 
subtle bugs in connectors, we could serialize + deserialize the offsets using 
the worker's internal JSON converter before invoking 
{{SourceConnector::alterOffsets}}.
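
As a hedged illustration of the proposed normalization (not the final Connect change),
the round trip through the worker's JSON converter would look roughly like this, with
the topic name and offset key below being stand-ins:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.connect.data.SchemaAndValue;
    import org.apache.kafka.connect.json.JsonConverter;

    public class OffsetNormalizationSketch {
        public static void main(String[] args) {
            JsonConverter converter = new JsonConverter();
            Map<String, Object> converterConfig = new HashMap<>();
            converterConfig.put("schemas.enable", "false");   // the internal converter disables schemas
            converter.configure(converterConfig, false);      // false = configure as a value converter

            Map<String, Object> requestedOffset = new HashMap<>();
            requestedOffset.put("position", 42);              // user sent an Integer via PATCH /offsets

            byte[] serialized = converter.fromConnectData("connect-offsets", null, requestedOffset);
            SchemaAndValue normalized = converter.toConnectData("connect-offsets", serialized);

            // After the round trip the map should carry the same types a connector would later
            // see from an OffsetStorageReader (e.g. integral numbers widened), avoiding the
            // mismatch described above.
            System.out.println(normalized.value());
        }
    }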



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » 3.5 #38

2023-07-12 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-949: Add flag to enable the usage of topic separator in MM2 DefaultReplicationPolicy

2023-07-12 Thread Chris Egerton
Hi Omnia,

Thanks for changing the default, LGTM 

As far as backporting goes, we probably won't be doing another release for
3.1, and possibly not for 3.2 either; however, if we can make the
implementation focused enough (which I don't think would be too difficult,
but correct me if I'm wrong), then we can still backport through 3.1. Even
if we don't do another release it can make life easier for people who are
maintaining parallel forks. Obviously this shouldn't be taken as a blanket
precedent but in this case it seems like the benefits may outweigh the
costs. What are your thoughts?

Cheers,

Chris

On Wed, Jul 12, 2023 at 9:05 AM Omnia Ibrahim 
wrote:

> Hi Chris, thanks for the feedback.
> 1. regarding the default value I had the same conflict of which version to
> break the backward compatibility with. We can just say that this KIP gives
> the release Pre KIP-690 the ability to keep the old behaviour with one
> config and keep the backwards compatibility from post-KIP-690 the same so
> we don't break at least the last 3 versions. I will update the KIP to
> switch the default value to true.
> 2. For the backporting, which versions can we backport these to? Usually,
> Kafka supports bugfix releases as needed for the last 3 releases. Now we @
> 3.5 so the last 3 are 3.4, 3.3 and 3.2 is this correct?
> 3. I'll add a Jira for updating the docs for this KIP so we don't forget
> about it.
>
> Thanks
> Omnia
>
>
> On Mon, Jul 10, 2023 at 5:33 PM Chris Egerton 
> wrote:
>
> > Hi Omnia,
> >
> > Thanks for taking this on! I have some thoughts but the general approach
> > looks good.
> >
> > 1. Default value
> >
> > One thing I'm wrestling with is what the default value of the new
> property
> > should be. I know on the Jira ticket I proposed that it should be false,
> > but I'm having second thoughts. Technically we'd preserve backward
> > compatibility with pre-KIP-690 releases by defaulting to false, but at
> the
> > same time, we'd break compatibility with post-KIP-690 releases. And if we
> > default to true, the opposite would be true: compatibility would be
> broken
> > with pre-KIP-690 releases, but preserved with post-KIP-690 releases.
> >
> > One argument against defaulting to false (which, again, would preserve
> the
> > behavior of MM2 before we accidentally broke compatibility with KIP-690)
> is
> > that this change could possibly cause a single MM2 setup to break
> > twice--once when upgrading from a pre-KIP-690 release to an existing
> > release, and again when upgrading from that existing release to a version
> > that reverted (by default) to pre-KIP-690 behavior. On the other hand, if
> > we default to true (which would preserve the existing behavior that
> breaks
> > compatibility with pre-KIP-690 releases), then any given setup will only
> be
> > broken once.
> >
> > In addition, if we default to true right now, then we don't have to worry
> > about changing that default in 4.0 to a more intuitive value (I hope we
> can
> > all agree that, for new clusters, it makes sense to set this property to
> > true and not to distinguish between internal and non-internal topics).
> >
> > With that in mind, I'm now leaning more towards defaulting to true, but
> > would be interested in your thoughts.
> >
> >
> > 2. Backport?
> >
> > It's highly unlikely to backport changes for a KIP, but given the impact
> of
> > the compatibility break that we're trying to address here, and the
> > extremely low risk of the proposed changes, I think we should consider
> > backporting the proposed fix to all affected release branches (i.e., 3.1
> > through 3.5).
> >
> >
> > 3. Extra steps
> >
> > I also think we can take these additional steps to try to help prevent
> > users from being bitten by this change:
> >
> > - Add a note to our upgrade instructions [1] for all affected versions
> that
> > instructs users on how to safely upgrade to a post-KIP-690 release, for
> > versions that both do and do not include the changes from this KIP
> > - Log a warning message on MM2 startup if the config contains an explicit
> > value for "replication.policy.separator" but does not contain an explicit
> > value for "replication.policy.internal.topic.separator.enabled"
> >
> > These details don't necessarily have to be codified in the KIP, but
> they're
> > worth taking into account when considering how to design any functional
> > changes in order to better try to gauge how well this could go for our
> > users.
> >
> > [1] - https://kafka.apache.org/documentation.html#upgrade
> >
> >
> > Thanks again for the KIP!
> >
> > Cheers,
> >
> > Chris
> >
> > On Fri, Jul 7, 2023 at 10:12 AM Omnia Ibrahim 
> > wrote:
> >
> > > Hi everyone,
> > > I want to start the discussion of the KIP-949. The proposal is here
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy
> > > <
> > >
> >
> 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1992

2023-07-12 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-15181) Race condition on partition assigned to TopicBasedRemoteLogMetadataManager

2023-07-12 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15181:


 Summary: Race condition on partition assigned to 
TopicBasedRemoteLogMetadataManager 
 Key: KAFKA-15181
 URL: https://issues.apache.org/jira/browse/KAFKA-15181
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


TopicBasedRemoteLogMetadataManager (TBRLMM) uses a cache so that it is prepared whenever 
partitions are assigned.

When partitions are assigned to the TBRLMM instance, a consumer is started to 
keep the cache up to date.

If the cache hasn't finished building, TBRLMM fails to return remote metadata 
about partitions that are stored on the backing topic. TBRLMM may not recover 
from this failing state.

A proposal to fix this issue would be to wait, after a partition is assigned, for 
the consumer to catch up. Similar logic is used at the moment when TBRLMM 
writes to the topic, where a send callback is used to wait for the consumer to catch up. 
This logic can be reused whenever a partition is assigned, so that when TBRLMM is 
marked as initialized, the cache is ready to serve requests.
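
A hedged sketch of that wait (the accessor names below are hypothetical stand-ins, not
real TBRLMM methods): after a metadata partition is assigned, block initialization until
the cache-building consumer has read up to the end offset captured at assignment time.

    import java.time.Duration;
    import java.util.concurrent.TimeoutException;

    public class WaitForCatchUpSketch {

        interface MetadataTopicView {
            long metadataPartitionEndOffset();   // end offset captured when the partition is assigned
            long metadataConsumerPosition();     // how far the cache-building consumer has read
        }

        static void awaitCatchUp(MetadataTopicView view, Duration timeout)
                throws InterruptedException, TimeoutException {
            long deadline = System.currentTimeMillis() + timeout.toMillis();
            long target = view.metadataPartitionEndOffset();
            while (view.metadataConsumerPosition() < target) {
                if (System.currentTimeMillis() > deadline) {
                    throw new TimeoutException("Cache did not catch up to offset " + target);
                }
                Thread.sleep(100);               // poll until the consumer reaches the target offset
            }
            // Only now mark the manager as initialized, so the cache can serve requests.
        }
    }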


Reference: https://github.com/aiven/kafka/issues/33



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-949: Add flag to enable the usage of topic separator in MM2 DefaultReplicationPolicy

2023-07-12 Thread Omnia Ibrahim
Hi Chris, thanks for the feedback.
1. regarding the default value I had the same conflict of which version to
break the backward compatibility with. We can just say that this KIP gives
the release Pre KIP-690 the ability to keep the old behaviour with one
config and keep the backwards compatibility from post-KIP-690 the same so
we don't break at least the last 3 versions. I will update the KIP to
switch the default value to true.
2. For the backporting, which versions can we backport these to? Usually,
Kafka supports bugfix releases as needed for the last 3 releases. Now we @
3.5 so the last 3 are 3.4, 3.3 and 3.2 is this correct?
3. I'll add a Jira for updating the docs for this KIP so we don't forget
about it.

Thanks
Omnia


On Mon, Jul 10, 2023 at 5:33 PM Chris Egerton 
wrote:

> Hi Omnia,
>
> Thanks for taking this on! I have some thoughts but the general approach
> looks good.
>
> 1. Default value
>
> One thing I'm wrestling with is what the default value of the new property
> should be. I know on the Jira ticket I proposed that it should be false,
> but I'm having second thoughts. Technically we'd preserve backward
> compatibility with pre-KIP-690 releases by defaulting to false, but at the
> same time, we'd break compatibility with post-KIP-690 releases. And if we
> default to true, the opposite would be true: compatibility would be broken
> with pre-KIP-690 releases, but preserved with post-KIP-690 releases.
>
> One argument against defaulting to false (which, again, would preserve the
> behavior of MM2 before we accidentally broke compatibility with KIP-690) is
> that this change could possibly cause a single MM2 setup to break
> twice--once when upgrading from a pre-KIP-690 release to an existing
> release, and again when upgrading from that existing release to a version
> that reverted (by default) to pre-KIP-690 behavior. On the other hand, if
> we default to true (which would preserve the existing behavior that breaks
> compatibility with pre-KIP-690 releases), then any given setup will only be
> broken once.
>
> In addition, if we default to true right now, then we don't have to worry
> about changing that default in 4.0 to a more intuitive value (I hope we can
> all agree that, for new clusters, it makes sense to set this property to
> true and not to distinguish between internal and non-internal topics).
>
> With that in mind, I'm now leaning more towards defaulting to true, but
> would be interested in your thoughts.
>
>
> 2. Backport?
>
> It's highly unlikely to backport changes for a KIP, but given the impact of
> the compatibility break that we're trying to address here, and the
> extremely low risk of the proposed changes, I think we should consider
> backporting the proposed fix to all affected release branches (i.e., 3.1
> through 3.5).
>
>
> 3. Extra steps
>
> I also think we can take these additional steps to try to help prevent
> users from being bitten by this change:
>
> - Add a note to our upgrade instructions [1] for all affected versions that
> instructs users on how to safely upgrade to a post-KIP-690 release, for
> versions that both do and do not include the changes from this KIP
> - Log a warning message on MM2 startup if the config contains an explicit
> value for "replication.policy.separator" but does not contain an explicit
> value for "replication.policy.internal.topic.separator.enabled"
>
> These details don't necessarily have to be codified in the KIP, but they're
> worth taking into account when considering how to design any functional
> changes in order to better try to gauge how well this could go for our
> users.
>
> [1] - https://kafka.apache.org/documentation.html#upgrade
>
>
> Thanks again for the KIP!
>
> Cheers,
>
> Chris
>
> On Fri, Jul 7, 2023 at 10:12 AM Omnia Ibrahim 
> wrote:
>
> > Hi everyone,
> > I want to start the discussion of the KIP-949. The proposal is here
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-949%3A+Add+flag+to+enable+the+usage+of+topic+separator+in+MM2+DefaultReplicationPolicy
> > >
> >
> > Thanks for your time and feedback.
> > Omnia
> >
>
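
For concreteness, the startup warning suggested in this thread could look roughly like
the following hedged sketch; the property names come from the discussion above, while
the surrounding class and method are hypothetical.

    import java.util.Map;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class SeparatorFlagWarning {
        private static final Logger log = LoggerFactory.getLogger(SeparatorFlagWarning.class);

        static void warnIfSeparatorFlagMissing(Map<String, String> props) {
            if (props.containsKey("replication.policy.separator")
                    && !props.containsKey("replication.policy.internal.topic.separator.enabled")) {
                log.warn("replication.policy.separator is set explicitly but "
                        + "replication.policy.internal.topic.separator.enabled is not; internal "
                        + "topic names will follow the flag's default, which may differ from "
                        + "pre-KIP-690 behaviour.");
            }
        }
    }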


Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2023-07-12 Thread Divij Vaidya
Jorge,
About API name: Good point. I have changed it to remoteLogSize instead of
getRemoteLogSize
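
For reference, the addition as I understand it boils down to a single aggregate-size
read on the metadata manager, roughly along these lines (a sketch only; the exact
signature and exception contract are assumptions, not the final interface):

    import java.io.IOException;
    import org.apache.kafka.common.TopicIdPartition;

    public interface RemoteLogSizeView {
        /**
         * Total size, in bytes, of the remote log segments for the given partition and
         * leader epoch, computed without listing every segment's metadata.
         */
        long remoteLogSize(TopicIdPartition topicIdPartition, int leaderEpoch) throws IOException;
    }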

About partition tag in the metric: We don't use partition tag across any of
the RemoteStorage metrics and I would like to keep this metric aligned with
the rest. I will change the metric though to type=BrokerTopicMetrics
instead of type=RemoteLogManager, since this is topic level information and
not specific to RemoteLogManager.


Satish,
Ah yes! Updated from "This would increase the broker start-up time." to
"This would increase the bootstrap time for the remote storage thread pool
before the first eligible segment is archived."

--
Divij Vaidya



On Mon, Jul 3, 2023 at 2:07 PM Satish Duggana 
wrote:

> Thanks Divij for taking the feedback and updating the motivation
> section in the KIP.
>
> One more comment on Alternative solution-3, The con is not valid as
> that will not affect the broker restart times as discussed in the
> earlier email in this thread. You may want to update that.
>
> ~Satish.
>
> On Sun, 2 Jul 2023 at 01:03, Divij Vaidya  wrote:
> >
> > Thank you folks for reviewing this KIP.
> >
> > Satish, I have modified the motivation to make it more clear. Now it
> says,
> > "Since the main feature of tiered storage is storing a large amount of
> > data, we expect num_remote_segments to be large. A frequent linear scan
> > (i.e. listing all segment metadata) could be expensive/slower because of
> > the underlying storage used by RemoteLogMetadataManager. This slowness to
> > list all segment metadata could result in the loss of availability"
> >
> > Jun, Kamal, Satish, if you don't have any further concerns, I would
> > appreciate a vote for this KIP in the voting thread -
> > https://lists.apache.org/thread/soz00990gvzodv7oyqj4ysvktrqy6xfk
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Sat, Jul 1, 2023 at 6:16 AM Kamal Chandraprakash <
> > kamal.chandraprak...@gmail.com> wrote:
> >
> > > Hi Divij,
> > >
> > > Thanks for the explanation. LGTM.
> > >
> > > --
> > > Kamal
> > >
> > > On Sat, Jul 1, 2023 at 7:28 AM Satish Duggana <
> satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Hi Divij,
> > > > I am fine with having an API to compute the size as I mentioned in my
> > > > earlier reply in this mail thread. But I have the below comment for
> > > > the motivation for this KIP.
> > > >
> > > > As you discussed offline, the main issue here is listing calls for
> > > > remote log segment metadata is slower because of the storage used for
> > > > RLMM. These can be avoided with this new API.
> > > >
> > > > Please add this in the motivation section as it is one of the main
> > > > motivations for the KIP.
> > > >
> > > > Thanks,
> > > > Satish.
> > > >
> > > > On Sat, 1 Jul 2023 at 01:43, Jun Rao 
> wrote:
> > > > >
> > > > > Hi, Divij,
> > > > >
> > > > > Sorry for the late reply.
> > > > >
> > > > > Given your explanation, the new API sounds reasonable to me. Is
> that
> > > > enough
> > > > > to build the external metadata layer for the remote segments or do
> you
> > > > need
> > > > > some additional API changes?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Jun 9, 2023 at 7:08 AM Divij Vaidya <
> divijvaidy...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Thank you for looking into this Kamal.
> > > > > >
> > > > > > You are right in saying that a cold start (i.e. leadership
> failover
> > > or
> > > > > > broker startup) does not impact the broker startup duration. But
> it
> > > > does
> > > > > > have the following impact:
> > > > > > 1. It leads to a burst of full-scan requests to RLMM in case
> multiple
> > > > > > leadership failovers occur at the same time. Even if the RLMM
> > > > > > implementation has the capability to serve the total size from an
> > > index
> > > > > > (and hence handle this burst), we wouldn't be able to use it
> since
> > > the
> > > > > > current API necessarily calls for a full scan.
> > > > > > 2. The archival (copying of data to tiered storage) process will
> > > have a
> > > > > > delayed start. The delayed start of archival could lead to local
> > > build
> > > > up
> > > > > > of data which may lead to disk full.
> > > > > >
> > > > > > The disadvantage of adding this new API is that every provider
> will
> > > > have to
> > > > > > implement it, agreed. But I believe that this tradeoff is
> worthwhile
> > > > since
> > > > > > the default implementation could be the same as you mentioned,
> i.e.
> > > > keeping
> > > > > > cumulative in-memory count.
> > > > > >
> > > > > > --
> > > > > > Divij Vaidya
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Jun 4, 2023 at 5:48 PM Kamal Chandraprakash <
> > > > > > kamal.chandraprak...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Divij,
> > > > > > >
> > > > > > > Thanks for the KIP! Sorry for the late reply.
> > > > > > >
> > > > > > > Can you explain the rejected alternative-3?
> > > > > > > Store the cumulative size of remote tier log in-memory at
> > > > > > 

Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.5 #37

2023-07-12 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 561694 lines...]
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
SmokeTestDriverIntegrationTest > shouldWorkWithRebalance(boolean) > 
org.apache.kafka.streams.integration.SmokeTestDriverIntegrationTest.shouldWorkWithRebalance(boolean)[2]
 PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > shouldQuerySpecificActivePartitionStores() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > shouldQuerySpecificActivePartitionStores() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > 
shouldFailWithIllegalArgumentExceptionWhenIQPartitionerReturnsMultiplePartitions()
 STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > 
shouldFailWithIllegalArgumentExceptionWhenIQPartitionerReturnsMultiplePartitions()
 PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > shouldQueryAllStalePartitionStores() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > shouldQueryAllStalePartitionStores() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStoresMultiStreamThreads() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStoresMultiStreamThreads() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > shouldQuerySpecificStalePartitionStores() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > shouldQuerySpecificStalePartitionStores() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > shouldQueryOnlyActivePartitionStoresByDefault() 
STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > shouldQueryOnlyActivePartitionStoresByDefault() 
PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > 
shouldQueryStoresAfterAddingAndRemovingStreamThread() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > 
shouldQueryStoresAfterAddingAndRemovingStreamThread() PASSED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStoresMultiStreamThreadsNamedTopology() STARTED

Gradle Test Run :streams:integrationTest > Gradle Test Executor 177 > 
StoreQueryIntegrationTest > 
shouldQuerySpecificStalePartitionStoresMultiStreamThreadsNamedTopology() PASSED
streams-5: SMOKE-TEST-CLIENT-CLOSED
streams-1: SMOKE-TEST-CLIENT-CLOSED
streams-1: SMOKE-TEST-CLIENT-CLOSED
streams-3: SMOKE-TEST-CLIENT-CLOSED
streams-5: SMOKE-TEST-CLIENT-CLOSED
streams-6: SMOKE-TEST-CLIENT-CLOSED
streams-2: SMOKE-TEST-CLIENT-CLOSED
streams-3: SMOKE-TEST-CLIENT-CLOSED
streams-2: SMOKE-TEST-CLIENT-CLOSED
streams-0: SMOKE-TEST-CLIENT-CLOSED
streams-4: SMOKE-TEST-CLIENT-CLOSED
streams-4: SMOKE-TEST-CLIENT-CLOSED
streams-0: SMOKE-TEST-CLIENT-CLOSED

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings 
and determine if they come from your own scripts or plugins.

See 
https://docs.gradle.org/8.0.2/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 3h 18m 51s
230 actionable tasks: 124 executed, 106 up-to-date

See the profiling report at: 
file:///home/jenkins/workspace/Kafka_kafka_3.5/build/reports/profile/profile-2023-07-12-08-39-02.html
A fine-grained performance profile is available: use the --scan option.
[Pipeline] junit
Recording test results
[Checks API] No suitable checks publisher found.
[Pipeline] echo
Verify that Kafka Streams archetype compiles
[Pipeline] sh
+ ./gradlew streams:publishToMavenLocal clients:publishToMavenLocal 
connect:json:publishToMavenLocal connect:api:publishToMavenLocal
To honour the JVM settings for this build a single-use Daemon process will be 
forked. See 
https://docs.gradle.org/8.0.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
Daemon will be stopped at the end of the build 

> Configure project :
Starting build with version 3.5.1 (commit id 82ccbe69) using Gradle 8.0.2, Java 
1.8 and Scala 2.13.10
Build properties: maxParallelForks=24, maxScalacThreads=8, 

[jira] [Resolved] (KAFKA-15148) Some integration tests are running as unit tests

2023-07-12 Thread Divij Vaidya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Divij Vaidya resolved KAFKA-15148.
--
  Reviewer: Divij Vaidya
Resolution: Fixed

> Some integration tests are running as unit tests
> 
>
> Key: KAFKA-15148
> URL: https://issues.apache.org/jira/browse/KAFKA-15148
> Project: Kafka
>  Issue Type: Test
>  Components: unit tests
>Reporter: Divij Vaidya
>Assignee: Ezio Xie
>Priority: Minor
>  Labels: newbie
>
> *This is a good item for a newcomer into Kafka code base to pick up*
>  
> When we run `./gradlew unitTest`, it is supposed to run all unit tests. 
> However, we are running some integration tests as part of it, which makes the 
> overall process of running unitTest take longer than expected.
> Example of such tests:
> > :streams:unitTest > Executing test 
> > org.apache...integration.NamedTopologyIntegrationTest
> > :streams:unitTest > Executing test 
> > org.apache...integration.QueryableStateIntegrationTest
> After this task, we should not run these tests as part of `./gradlew 
> unitTest`; instead, they should be run as part of `./gradlew integrationTest`.
> As part of the acceptance criteria, please add a snapshot of the generated HTML 
> summary to verify that these tests are indeed running as part of 
> integrationTest.
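
A hedged illustration of the kind of change involved (the test name and Gradle wiring
here are assumptions, not the actual fix): tag such tests as integration tests so a
unit-test task that filters on tags can exclude them.

    import org.junit.jupiter.api.Tag;
    import org.junit.jupiter.api.Test;

    // Illustration only: a tagged test class that a Gradle unitTest task configured to
    // exclude the "integration" tag would skip, while integrationTest would include it.
    @Tag("integration")
    class ExampleIntegrationTest {
        @Test
        void shouldDoSomethingEndToEnd() {
            // spins up an embedded cluster, so it belongs in integrationTest, not unitTest
        }
    }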



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1991

2023-07-12 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-944 Support async runtimes in consumer, votes needed!

2023-07-12 Thread Erik van Oosten

Thanks Philip,

> I think this can be demonstrated via diagrams and some code in the KIP.

There are some diagrams in KIP-944. How can they be improved?

I will add some code to address the concerns around memory barriers.


> We are in-process of re-writing the KafkaConsumer

Nice! I will read the KIP. Hopefully we don't need complex logic in 
callbacks after the rewrite.


Kind regards,
    Erik.


Op 11-07-2023 om 19:33 schreef Philip Nee:

Hey Erik,

Sorry for holding up this email for a few days since Colin's response
includes some of my concerns.  I'm in favor of this KIP, and I think your
approach seems safe.  Of course, I probably missed something therefore I
think this KIP needs to cover different use cases to demonstrate it doesn't
cause any unsafe access. I think this can be demonstrated via diagrams and
some code in the KIP.

Thanks,
P

On Sat, Jul 8, 2023 at 12:28 PM Erik van Oosten
 wrote:


Hello Colin,

  >> In KIP-944, the callback thread can only delegate to another thread
after reading from and writing to a threadlocal variable, providing the
barriers right there.

  > I don't see any documentation that accessing thread local variables
provides a total store or load barrier. Do you have such documentation?
It seems like if this were the case, we could eliminate volatile
variables from most of the code base.

Now I was imprecise. The thread-locals are only somewhat involved. In
the KIP proposal the callback thread reads an access key from a
thread-local variable. It then needs to pass that access key to another
thread, which then can set it on its own thread-local variable. The act
of passing a value from one thread to another implies that a memory
barrier needs to be passed. However, this is all not so relevant since
there is no need to pass the access key back when the other thread is done.

But now I think about it a bit more, the locking mechanism runs in a
synchronized block. If I remember correctly this should be enough to
pass read and write barriers.

  >> In the current implementation the consumer is also invoked from
random threads. If it works now, it should continue to work.
  > I'm not sure what you're referring to. Can you expand on this?

Any invocation of the consumer (e.g. method poll) is not from a thread
managed by the consumer. This is what I was assuming you meant with the
term 'random thread'.

  > Hmm, not sure what you mean by "cooperate with blocking code." If you
have 10 green threads you're multiplexing on to one CPU thread, and that
CPU thread gets blocked because of what one green thread is doing, the
other 9 green threads are blocked too, right? I guess it's "just" a
performance problem, but it still seems like it could be a serious one.

There are several ways to deal with this. All async runtimes I know
(Akka, Zio, Cats-effects) support this by letting you mark a task as
blocking. The runtime will then either schedule it to another
thread-pool, or it will grow the thread-pool to accommodate. In any case
'the other 9 green threads' will simply be scheduled to another real
thread. In addition, some of these runtimes detect long running tasks
and will reschedule waiting tasks to another thread. This is all a bit
off topic though.

  > I don't see why this has to be "inherently multi-threaded." Why can't
we have the other threads report back what messages they've processed to
the worker thread. Then it will be able to handle these callbacks
without involving the other threads.

Please consider the context which is that we are running inside the
callback of the rebalance listener. The only way to execute something
and also have a timeout on it is to run the something on another thread.

Kind regards,
  Erik.


Op 08-07-2023 om 19:17 schreef Colin McCabe:

On Sat, Jul 8, 2023, at 02:41, Erik van Oosten wrote:

Hi Colin,

Thanks for your thoughts and taking the time to reply.

Let me take away your concerns. None of your worries are an issue with
the algorithm described in KIP-944. Here it goes:

   > It's not clear to me that it's safe to access the Kafka consumer or

producer concurrently from different threads.

Concurrent access is /not/ a design goal of KIP-944. In fact, it goes
to great lengths to make sure that this cannot happen.

*The only design goal is to allow callbacks to call the consumer from
another thread.*

To make sure there are no more misunderstandings about this, I have
added this goal to the KIP.


Hi Erik,

Sorry, I spoke imprecisely. My concern is not concurrent access, but

multithreaded access in general. Basically cache line visibility issues.

   > This is true even if the accesses happen at different times, because

modern CPUs require memory barriers to guarantee inter-thread visibilty
of loads and stores.

In KIP-944, the callback thread can only delegate to another thread
after reading from and writing to a threadlocal variable, providing the
barriers right there.


I don't see any documentation that accessing thread 

Re: [VOTE] 3.5.1 RC0

2023-07-12 Thread Divij Vaidya
+ kafka-clie...@googlegroups.com

--
Divij Vaidya



On Wed, Jul 12, 2023 at 12:03 PM Divij Vaidya 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 3.5.1.
>
> This release is a security patch release. It upgrades the dependency,
> snappy-java, to a version which is not vulnerable to CVE-2023-34455. You
> can find more information about the CVE at Kafka CVE list
> .
>
> Additionally, this release fixes a regression introduced in 3.3.0, which
> caused security.protocol configuration values to be restricted to upper
> case only. With this release, security.protocol values are
> case insensitive. See KAFKA-15053
>  for details.
>
> Release notes for the 3.5.1 release:
> https://home.apache.org/~divijv/kafka-3.5.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, July 18, 9am PT
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~divijv/kafka-3.5.1-rc0/
>
> Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> Javadoc:
> https://home.apache.org/~divijv/kafka-3.5.1-rc0/javadoc/
>
> Tag to be voted upon (off 3.5 branch) is the 3.5.1 tag:
> https://github.com/apache/kafka/releases/tag/3.5.1-rc0
>
> Documentation:
> https://kafka.apache.org/35/documentation.html
> Please note that documentation will be updated with upgrade notes (
> https://github.com/apache/kafka/commit/4c78fd64454e25e3536e8c7ed5725d3fbe944a49)
> after the release is complete.
>
> Protocol:
> https://kafka.apache.org/35/protocol.html
>
> Unit/integration tests:
> https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/35/ (9
> failures). I am running another couple of runs to ensure that there are no
> consistently failing tests. I have verified that unit/integration tests on
> my local machine successfully pass.
>
> System tests:
> Not planning to run system tests since this is a patch release.
>
> Thank you.
>
> --
> Divij Vaidya
> Release Manager for Apache Kafka 3.5.1
>
>


[VOTE] 3.5.1 RC0

2023-07-12 Thread Divij Vaidya
Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 3.5.1.

This release is a security patch release. It upgrades the dependency,
snappy-java, to a version which is not vulnerable to CVE-2023-34455. You
can find more information about the CVE at Kafka CVE list
.

Additionally, this release fixes a regression introduced in 3.3.0, which
caused security.protocol configuration values to be restricted to upper
case only. With this release, security.protocol values are
case insensitive. See KAFKA-15053
 for details.

Release notes for the 3.5.1 release:
https://home.apache.org/~divijv/kafka-3.5.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Tuesday, July 18, 9am PT

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

Release artifacts to be voted upon (source and binary):
https://home.apache.org/~divijv/kafka-3.5.1-rc0/

Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

Javadoc:
https://home.apache.org/~divijv/kafka-3.5.1-rc0/javadoc/

Tag to be voted upon (off 3.5 branch) is the 3.5.1 tag:
https://github.com/apache/kafka/releases/tag/3.5.1-rc0

Documentation:
https://kafka.apache.org/35/documentation.html
Please note that documentation will be updated with upgrade notes (
https://github.com/apache/kafka/commit/4c78fd64454e25e3536e8c7ed5725d3fbe944a49)
after the release is complete.

Protocol:
https://kafka.apache.org/35/protocol.html

Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/35/ (9 failures).
I am running another couple of runs to ensure that there are no
consistently failing tests. I have verified that unit/integration tests on
my local machine successfully pass.

System tests:
Not planning to run system tests since this is a patch release.

Thank you.

--
Divij Vaidya
Release Manager for Apache Kafka 3.5.1


[GitHub] [kafka-site] mimaison merged pull request #521: KAFKA-14995: Automate asf.yaml collaborators refresh

2023-07-12 Thread via GitHub


mimaison merged PR #521:
URL: https://github.com/apache/kafka-site/pull/521


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org