Re: [DISCUSS] KIP-427: Add AtMinIsr topic partition category (new metric & TopicCommand option)

2019-02-27 Thread Dong Lin
Hey Kevin,

Thanks for the update.

The KIP suggests that AtMinIsr is better than UnderReplicatedPartition as an
indicator for alerting. However, in the most common case, where min_isr =
replica_set_size - 1, these two metrics are exactly the same, and planned
maintenance can easily cause a positive AtMinIsr value. In the other
scenario, for example min_isr = 1 and replica_set_size = 3, it is
still possible for planned maintenance (e.g. one broker restart +
partition reassignment) to cause the ISR size to drop to 1. Since AtMinIsr can
also produce false positives (i.e. the fact that AtMinIsr > 0 does not
necessarily need attention from the user), I am not sure it is worth adding
this metric.
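
To make the comparison concrete, here is a minimal illustrative sketch (not
from the KIP) of the predicates behind the two metrics; isrSize, replicaCount,
and minIsr are placeholders for the per-partition values the broker tracks:

final class PartitionAlerts {
    // UnderReplicatedPartitions fires whenever the ISR is smaller than the
    // full replica set, regardless of cause (planned or unplanned).
    static boolean underReplicated(int isrSize, int replicaCount) {
        return isrSize < replicaCount;
    }

    // AtMinIsr fires when exactly min.insync.replicas replicas remain, i.e.
    // one more failure would make acks=all producers fail.
    static boolean atMinIsr(int isrSize, int minIsr) {
        return isrSize == minIsr;
    }

    public static void main(String[] args) {
        // With 3 replicas and minIsr = 2, both predicates flip together when
        // one replica drops out (e.g. during a rolling restart).
        System.out.println(underReplicated(2, 3)); // true
        System.out.println(atMinIsr(2, 2));        // true
    }
}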

In the Usage section, it is mentioned that the user needs to manually check
whether there is ongoing maintenance after AtMinIsr is triggered. Could you
explain how this is different from the current approach, where we use
UnderReplicatedPartition to trigger alerts? More specifically, can we just
replace AtMinIsr with UnderReplicatedPartition in the Usage section?

Thanks,
Dong


On Tue, Feb 26, 2019 at 6:49 PM Kevin Lu  wrote:

> Hi Dong!
>
> Thanks for the feedback!
>
> You bring up a good point in that the AtMinIsr metric cannot be used to
> identify failure in the mentioned scenarios. I admit the motivation section
> placed too much emphasis on "identifying failure".
>
> I have modified the KIP to reflect the implementation: the AtMinIsr metric
> is intended to serve as a warning, since one more failure to a partition
> AtMinIsr will cause producers configured with acks=all to fail. It has an
> additional benefit when minIsr=1, as it warns us that the entire
> partition is at risk of going offline, but that is more of a side effect
> that only applies in that scenario (minIsr=1).
>
> Regards,
> Kevin
>
> On Tue, Feb 26, 2019 at 5:11 PM Dong Lin  wrote:
>
> > Hey Kevin,
> >
> > Thanks for the proposal!
> >
> > It seems that the proposed implementation does not match the motivation.
> > The motivation suggests that the operator wants to distinguish planned
> > maintenance (e.g. broker restart) from unplanned failure (e.g. network
> > failure). But the AtMinIsr metric does not really differentiate
> > between these causes of a reduced ISR. For example, an
> > unplanned failure can cause the ISR to drop from 3 to 2, yet it can still
> > be higher than the minIsr (say 1). And planned maintenance can cause the
> > ISR to drop from 3 to 2, which triggers the AtMinIsr metric if minIsr=2.
> > Can you update the design doc to fix or explain this issue?
> >
> > Thanks,
> > Dong
> >
> > On Tue, Feb 12, 2019 at 9:02 AM Kevin Lu  wrote:
> >
> > > Hi All,
> > >
> > > Getting the discussion thread started for KIP-427 in case anyone is
> free
> > > right now.
> > >
> > > I'd like to propose a new category of topic partitions, *AtMinIsr*, which
> > > are partitions that only have the minimum number of in-sync replicas left
> > > in the ISR set (as configured by min.insync.replicas).
> > >
> > > This would add two new metrics, *ReplicaManager.AtMinIsrPartitionCount* &
> > > *Partition.AtMinIsr*, and a new TopicCommand option
> > > *--at-min-isr-partitions* to help in monitoring and alerting.
> > >
> > > KIP link: KIP-427: Add AtMinIsr topic partition category (new metric &
> > > TopicCommand option)
> > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103089398>
> > >
> > > Please take a look and let me know what you think.
> > >
> > > Regards,
> > > Kevin
> > >
> >
>


Re: Speeding up integration tests

2019-02-27 Thread Ismael Juma
It's an idea that has come up before and is worth exploring eventually.
However, I'd first try to optimize the server startup/shutdown process. If
we measure where the time is going, maybe some opportunities will present
themselves.

Ismael

On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass 
wrote:

> Hi Folks,
>
> I've been observing lately that the unit tests usually take 2.5 hours to run,
> and a very big portion of these are the core tests, where a new cluster is
> spun up for every test. This takes most of the time. I ran a test class
> (TopicCommandWithAdminClient, with 38 tests inside) through the profiler, and
> it shows that running the whole class took 10 minutes
> and 37 seconds, of which the useful time was 5 minutes 18 seconds. That's
> 100% overhead. Without the profiler the whole class takes 7 minutes and 48
> seconds, so the useful time would be between 3-4 minutes. This is a bigger
> test class though; most of them won't take this much.
> There are 74 classes that implement KafkaServerTestHarness, and just running
> :core:integrationTest takes almost 2 hours.
>
> I think we could greatly speed up these integration tests by creating
> the cluster once per class and performing the tests in separate methods (see
> the sketch below). I know this contradicts the principle that tests should
> be independent a little, but recreating the cluster for each test is a very
> expensive operation. Also, if the tests act on different resources
> (different topics, etc.), then it might not hurt their independence. There
> will of course be cases where this is not possible, but I think there could
> be a lot where it is.
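
A minimal JUnit 4 sketch of the per-class pattern Viktor describes, with a
stand-in cluster class since the real harness (KafkaServerTestHarness) is a
Scala trait; all names here are illustrative only:

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class SharedClusterTest {
    // Stand-in for an embedded Kafka cluster; real tests would use the
    // project's KafkaServerTestHarness-style utilities instead.
    static final class FakeCluster {
        void start() { /* spin brokers up once */ }
        void stop() { /* tear down once, after all tests */ }
        void createTopic(String name) { /* per-test resource */ }
    }

    private static FakeCluster cluster;

    @BeforeClass
    public static void startCluster() {
        cluster = new FakeCluster(); // once per class, not once per test
        cluster.start();
    }

    @AfterClass
    public static void stopCluster() {
        cluster.stop();
    }

    @Test
    public void testCreate() { cluster.createTopic("create-" + System.nanoTime()); }

    @Test
    public void testDescribe() { cluster.createTopic("describe-" + System.nanoTime()); }
}
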
>
> In the optimal case we could cut the testing time back by approximately an
> hour. This would save resources and give quicker feedback for PR builds.
>
> What are your thoughts?
> Has anyone thought about this or were there any attempts made?
>
> Best,
> Viktor
>


[jira] [Resolved] (KAFKA-7962) StickyAssignor: throws NullPointerException during assignments if topic is deleted

2019-02-27 Thread Vahid Hashemian (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian resolved KAFKA-7962.

   Resolution: Fixed
 Reviewer: Vahid Hashemian
Fix Version/s: 2.3.0

> StickyAssignor: throws NullPointerException during assignments if topic is 
> deleted
> --
>
> Key: KAFKA-7962
> URL: https://issues.apache.org/jira/browse/KAFKA-7962
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.1.0
> Environment: 1. MacOS, com.salesforce.kafka.test.KafkaTestUtils (kind 
> of embedded kafka integration tests)
> 2. Linux, dockerised kafka and our service
>Reporter: Oleg Smirnov
>Assignee: huxihx
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: NPE-StickyAssignor-issues.apache.log
>
>
> Integration tests with com.salesforce.kafka.test.KafkaTestUtils, local
> setup, StickyAssignor used; local topics are created / removed. One topic is
> created at the beginning of the test and then deleted without unsubscribing
> from it. The same happens in a real environment.
>
>  # have a single "topic" with 1 partition
>  # a single consumer subscribed to this "topic" (StickyAssignor)
>  # delete "topic"
> =>
>  * rebalance starts, the topic partition(s) is revoked
>  * on assignment, StickyAssignor throws an exception (line 223), because
> partitionsPerTopic.get("topic") returns null in the for loop (topic deleted -
> no partitions are present)
>
> In the provided log part, tearDown() causes the topic deletion while the
> consumer is still running and tries to poll data from the topic.
> RangeAssignor works fine (revokes the partition, assigns an empty set).
> The problem has no workaround (like handling it in onPartitionsAssigned and
> removing the unsubscribed topic), because everything happens before the
> listener is called.
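
A minimal illustration of the failure and the null-guard class of fix
(hypothetical code, not the actual patch):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class DeletedTopicGuard {
    // partitionsPerTopic maps topic name -> partition count; a topic deleted
    // mid-rebalance is simply absent, so get() returns null.
    static int safePartitionCount(Map<String, Integer> partitionsPerTopic, String topic) {
        Integer count = partitionsPerTopic.get(topic);
        return count == null ? 0 : count; // deleted topic: no partitions to assign
    }

    public static void main(String[] args) {
        Map<String, Integer> partitionsPerTopic = new HashMap<>();
        partitionsPerTopic.put("topic", 1);
        partitionsPerTopic.remove("topic"); // topic deleted during rebalance
        System.out.println(safePartitionCount(partitionsPerTopic, "topic")); // prints 0, no NPE
    }
}
{code}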



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8014) Extend Connect integration tests to add and remove workers dynamically

2019-02-27 Thread Konstantine Karantasis (JIRA)
Konstantine Karantasis created KAFKA-8014:
-

 Summary: Extend Connect integration tests to add and remove 
workers dynamically
 Key: KAFKA-8014
 URL: https://issues.apache.org/jira/browse/KAFKA-8014
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Konstantine Karantasis
Assignee: Konstantine Karantasis


 

To allow for even more integration tests that can focus on testing the Connect
framework itself, it seems necessary to add the ability to add and remove
workers from within a test case.

The suggestion is to extend Connect's integration test harness
{{EmbeddedConnectCluster}} to include methods to add and remove workers, as well
as to return the workers that are online at any given point.
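
A rough sketch of the shape such additions could take (the ticket proposes
capabilities, not these exact signatures; all names below are illustrative):

{code:java}
import java.util.Set;

// Illustrative only: possible additions to the EmbeddedConnectCluster harness.
public interface DynamicWorkerOperations {
    void addWorker();              // bring one more worker online mid-test
    void removeWorker(String id);  // take a specific worker down mid-test
    Set<String> onlineWorkers();   // workers that are online at this point
}
{code}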



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-430 - Return Authorized Operations in Describe Responses

2019-02-27 Thread Manikumar
Hi All,

While implementing KIP-430, we have added a supportedOperations() method to
the kafka.security.auth.ResourceType public API.
This will be used to maintain the supported operations for each resource type.
I have updated the KIP with the new method details.
Please take note of this.
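
For illustration, a client-side sketch of consuming the new field through
AdminClient, following the KIP's AdminClient API Changes section (the option
and accessor names below come from the KIP draft and were not yet released
when this thread was written):

import java.util.Collections;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.DescribeTopicsOptions;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.acl.AclOperation;

public class AuthorizedOpsExample {
    public static void main(String[] args) throws Exception {
        try (AdminClient admin = AdminClient.create(
                Collections.singletonMap("bootstrap.servers", "localhost:9092"))) {
            // Opt in to authorized operations in the describe response.
            TopicDescription description = admin
                .describeTopics(Collections.singleton("my-topic"),
                    new DescribeTopicsOptions().includeAuthorizedOperations(true))
                .all().get().get("my-topic");
            // Returned using the existing AclOperation enum.
            Set<AclOperation> ops = description.authorizedOperations();
            System.out.println(ops);
        }
    }
}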

Thanks,
Manikumar

On Wed, Feb 20, 2019 at 6:42 PM Rajini Sivaram 
wrote:

> If there are no other concerns or suggestions, I will start vote on this
> KIP later today.
>
> Thanks,
>
> Rajini
>
> On Mon, Feb 18, 2019 at 10:09 AM Rajini Sivaram 
> wrote:
>
> > Hi Magnus,
> >
> > Have your concerns been addressed in the KIP?
> >
> > Thanks,
> >
> > Rajini
> >
> > On Wed, Feb 13, 2019 at 3:33 PM Satish Duggana  >
> > wrote:
> >
> >> Hi Rajini,
> >> That makes sense, thanks for the clarification.
> >>
> >> Satish.
> >>
> >> On Wed, Feb 13, 2019 at 7:30 PM Rajini Sivaram  >
> >> wrote:
> >> >
> >> > Thanks for the reviews!
> >> >
> >> > Hi Satish,
> >> >
> >> > The authorised operations returned will use the same values as the
> >> > operations returned by the existing DescribeAclsResponse. AdminClient
> >> will
> >> > return these using the existing enum AclOperation.
> >> >
> >> > Hi Magnus,
> >> >
> >> > The MetadataResponse contains these two lines:
> >> >
> >> >- Metadata Response => throttle_time_ms [brokers] cluster_id
> >> >  controller_id [topic_metadata] [authorized_operations]  <== ADDED authorized_operations
> >> >- topic_metadata => error_code topic is_internal [partition_metadata]
> >> >  [authorized_operations]  <== ADDED authorized_operations
> >> >
> >> > The first is for the cluster's authorized operations and the second
> for
> >> > each topic. Did I misunderstand your question? The full set of
> >> operations
> >> > for each resource type is included in the subsection `AdminClient API
> >> > Changes`.
> >> >
> >> > Under `Rejected Alternatives` I have included addition of a separate
> >> > request to get this information rather than extend an existing one.
> The
> >> > rationale for including all the information in one request is to
> enable
> >> > clients to get all relevant metadata using a single API rather than
> >> have to
> >> > send multiple requests, get responses and combine the two while
> >> resource or
> >> > ACLs may have changed in between. It seems neater to use a single API
> >> since
> >> > a user getting authorized operations is almost definitely going to do
> a
> >> > Describe first and access control for both of these is controlled
> using
> >> > Describe access. If we add new resource types with a corresponding
> >> > Describe, we would simply need to add `authorized_operations` for
> their
> >> > Describe.
> >> >
> >> > Hi Manikumar,
> >> >
> >> > Added IdempotentWrite for Cluster, thanks for pointing that out! I was
> >> > thinking that if the authorizer is not configured, we could return all
> >> > supported operations, since the user can perform all operations. Added a
> >> > note to the KIP.
> >> >
> >> > Regards,
> >> >
> >> > Rajini
> >> >
> >> >
> >> >
> >> > On Wed, Feb 13, 2019 at 11:07 AM Manikumar  >
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > Thanks for the KIP.
> >> > >
> >> > > 1. Can't we include IdempotentWrite/ClusterResource operations for the
> >> > > Cluster resource?
> >> > > 2. What will be the API behaviour when the authorizer is not
> >> > > configured? I assume we return an empty list.
> >> > >
> >> > > Thanks,
> >> > > Manikumar
> >> > >
> >> > > On Wed, Feb 13, 2019 at 12:33 AM Rajini Sivaram <
> >> rajinisiva...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > I have created a KIP to optionally request authorised operations
> on
> >> > > > resources when describing resources:
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-430+-+Return+Authorized+Operations+in+Describe+Responses
> >> > > >
> >> > > > This includes only information that users with Describe access can
> >> > > > obtain using other means, and hence is consistent with our security
> >> > > > model. It is intended to make it easier for clients to obtain this
> >> > > > information.
> >> > > >
> >> > > > Feedback and suggestions welcome.
> >> > > >
> >> > > > Thank you,
> >> > > >
> >> > > > Rajini
> >> > > >
> >> > >
> >>
> >
>


Build failed in Jenkins: kafka-trunk-jdk8 #3422

2019-02-27 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] Address flakiness of 
CustomQuotaCallbackTest#testCustomQuotaCallback

--
[...truncated 4.63 MB...]
kafka.zk.KafkaZkClientTest > testControllerManagementMethods STARTED

kafka.zk.KafkaZkClientTest > testControllerManagementMethods PASSED

kafka.zk.KafkaZkClientTest > testTopicAssignmentMethods STARTED

kafka.zk.KafkaZkClientTest > testTopicAssignmentMethods PASSED

kafka.zk.KafkaZkClientTest > testPropagateIsrChanges STARTED

kafka.zk.KafkaZkClientTest > testPropagateIsrChanges PASSED

kafka.zk.KafkaZkClientTest > testControllerEpochMethods STARTED

kafka.zk.KafkaZkClientTest > testControllerEpochMethods PASSED

kafka.zk.KafkaZkClientTest > testDeleteRecursive STARTED

kafka.zk.KafkaZkClientTest > testDeleteRecursive PASSED

kafka.zk.KafkaZkClientTest > testGetTopicPartitionStates STARTED

kafka.zk.KafkaZkClientTest > testGetTopicPartitionStates PASSED

kafka.zk.KafkaZkClientTest > testCreateConfigChangeNotification STARTED

kafka.zk.KafkaZkClientTest > testCreateConfigChangeNotification PASSED

kafka.zk.KafkaZkClientTest > testDelegationTokenMethods STARTED

kafka.zk.KafkaZkClientTest > testDelegationTokenMethods PASSED

kafka.zk.ReassignPartitionsZNodeTest > testDecodeInvalidJson STARTED

kafka.zk.ReassignPartitionsZNodeTest > testDecodeInvalidJson PASSED

kafka.zk.ReassignPartitionsZNodeTest > testEncode STARTED

kafka.zk.ReassignPartitionsZNodeTest > testEncode PASSED

kafka.zk.ReassignPartitionsZNodeTest > testDecodeValidJson STARTED

kafka.zk.ReassignPartitionsZNodeTest > testDecodeValidJson PASSED

kafka.zk.LiteralAclStoreTest > shouldHaveCorrectPaths STARTED

kafka.zk.LiteralAclStoreTest > shouldHaveCorrectPaths PASSED

kafka.zk.LiteralAclStoreTest > shouldRoundTripChangeNode STARTED

kafka.zk.LiteralAclStoreTest > shouldRoundTripChangeNode PASSED

kafka.zk.LiteralAclStoreTest > shouldThrowFromEncodeOnNoneLiteral STARTED

kafka.zk.LiteralAclStoreTest > shouldThrowFromEncodeOnNoneLiteral PASSED

kafka.zk.LiteralAclStoreTest > shouldWriteChangesToTheWritePath STARTED

kafka.zk.LiteralAclStoreTest > shouldWriteChangesToTheWritePath PASSED

kafka.zk.LiteralAclStoreTest > shouldHaveCorrectPatternType STARTED

kafka.zk.LiteralAclStoreTest > shouldHaveCorrectPatternType PASSED

kafka.zk.LiteralAclStoreTest > shouldDecodeResourceUsingTwoPartLogic STARTED

kafka.zk.LiteralAclStoreTest > shouldDecodeResourceUsingTwoPartLogic PASSED

kafka.cluster.BrokerEndPointTest > testEndpointFromUri STARTED

kafka.cluster.BrokerEndPointTest > testEndpointFromUri PASSED

kafka.cluster.BrokerEndPointTest > testHashAndEquals STARTED

kafka.cluster.BrokerEndPointTest > testHashAndEquals PASSED

kafka.cluster.BrokerEndPointTest > testFromJsonV4WithNoRack STARTED

kafka.cluster.BrokerEndPointTest > testFromJsonV4WithNoRack PASSED

kafka.cluster.BrokerEndPointTest > testFromJsonFutureVersion STARTED

kafka.cluster.BrokerEndPointTest > testFromJsonFutureVersion PASSED

kafka.cluster.BrokerEndPointTest > testFromJsonV4WithNullRack STARTED

kafka.cluster.BrokerEndPointTest > testFromJsonV4WithNullRack PASSED

kafka.cluster.BrokerEndPointTest > testBrokerEndpointFromUri STARTED

kafka.cluster.BrokerEndPointTest > testBrokerEndpointFromUri PASSED

kafka.cluster.BrokerEndPointTest > testFromJsonV1 STARTED

kafka.cluster.BrokerEndPointTest > testFromJsonV1 PASSED

kafka.cluster.BrokerEndPointTest > testFromJsonV2 STARTED

kafka.cluster.BrokerEndPointTest > testFromJsonV2 PASSED

kafka.cluster.BrokerEndPointTest > testFromJsonV3 STARTED

kafka.cluster.BrokerEndPointTest > testFromJsonV3 PASSED

kafka.cluster.PartitionTest > 
testMakeLeaderDoesNotUpdateEpochCacheForOldFormats STARTED

kafka.cluster.PartitionTest > 
testMakeLeaderDoesNotUpdateEpochCacheForOldFormats PASSED

kafka.cluster.PartitionTest > testReadRecordEpochValidationForLeader STARTED

kafka.cluster.PartitionTest > testReadRecordEpochValidationForLeader PASSED

kafka.cluster.PartitionTest > 
testFetchOffsetForTimestampEpochValidationForFollower STARTED

kafka.cluster.PartitionTest > 
testFetchOffsetForTimestampEpochValidationForFollower PASSED

kafka.cluster.PartitionTest > testListOffsetIsolationLevels STARTED

kafka.cluster.PartitionTest > testListOffsetIsolationLevels PASSED

kafka.cluster.PartitionTest > testAppendRecordsAsFollowerBelowLogStartOffset 
STARTED

kafka.cluster.PartitionTest > testAppendRecordsAsFollowerBelowLogStartOffset 
PASSED

kafka.cluster.PartitionTest > testFetchLatestOffsetIncludesLeaderEpoch STARTED

kafka.cluster.PartitionTest > testFetchLatestOffsetIncludesLeaderEpoch PASSED

kafka.cluster.PartitionTest > testFetchOffsetSnapshotEpochValidationForFollower 
STARTED

kafka.cluster.PartitionTest > testFetchOffsetSnapshotEpochValidationForFollower 
PASSED

kafka.cluster.PartitionTest > testMonotonicOffsetsAfterLeaderChange STARTED

kafka.cluster.PartitionTest > 

[jira] [Created] (KAFKA-8013) Avoid buffer underflow when reading a Struct from a partially correct buffer

2019-02-27 Thread Konstantine Karantasis (JIRA)
Konstantine Karantasis created KAFKA-8013:
-

 Summary: Avoid buffer underflow when reading a Struct from a 
partially correct buffer
 Key: KAFKA-8013
 URL: https://issues.apache.org/jira/browse/KAFKA-8013
 Project: Kafka
  Issue Type: Bug
Reporter: Konstantine Karantasis
Assignee: Konstantine Karantasis
 Fix For: 2.3.0


Protocol compatibility can be facilitated if a {{Struct}} that has been
defined as an extension of a previous {{Struct}} (by adding fields at the end of
the older version) can read the older version by ignoring the absence of the
missing new fields. Of course, this has to be allowed by the definition of these
fields (they have to be {{nullable}}).

For example, this should work: 
{code:java}
Schema oldSchema = new Schema(new Field("field1", Type.NULLABLE_STRING));
Schema newSchema = new Schema(new Field("field1", Type.NULLABLE_STRING),
                              new Field("field2", Type.NULLABLE_STRING));
String value = "foo bar baz";
Struct oldFormat = new Struct(oldSchema).set("field1", value);
ByteBuffer buffer = ByteBuffer.allocate(oldSchema.sizeOf(oldFormat));
oldFormat.writeTo(buffer);
buffer.flip();
// Read the old-format bytes with the new schema; the missing trailing
// nullable field should be treated as null rather than underflowing.
Struct newFormat = newSchema.read(buffer);
assertEquals(value, newFormat.get("field1"));
assertEquals(null, newFormat.get("field2"));
{code}
Currently it does not. 

A fix to the above is considered safe, because depending on buffer underflow to 
detect missing data at the end of a {{Struct}} is not an appropriate check. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7936) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7936.

Resolution: Duplicate

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7936
> URL: https://issues.apache.org/jira/browse/KAFKA-7936
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, unit tests
>Reporter: Matthias J. Sax
>Priority: Major
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3393/tests]
> {quote}java.util.concurrent.ExecutionException: Boxed Error
>  at scala.concurrent.impl.Promise$.resolver(Promise.scala:87)
>  at 
> scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79)
>  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>  at scala.concurrent.Promise.complete(Promise.scala:53)
>  at scala.concurrent.Promise.complete$(Promise.scala:52)
>  at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
>  at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.lang.AssertionError: Received 0, expected at least 68
>  at org.junit.Assert.fail(Assert.java:89)
>  at org.junit.Assert.assertTrue(Assert.java:42)
>  at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:562)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$5(ConsumerBounceTest.scala:347)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
>  at scala.util.Success.$anonfun$map$1(Try.scala:255)
>  at scala.util.Success.map(Try.scala:213)
>  at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>  at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>  ... 9 more
> {quote}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk11 #323

2019-02-27 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] Address flakiness of 
CustomQuotaCallbackTest#testCustomQuotaCallback

--
[...truncated 2.31 MB...]

org.apache.kafka.connect.runtime.WorkerConfigTransformerTest > 
testReplaceVariableWithTTL PASSED

org.apache.kafka.connect.runtime.WorkerConfigTransformerTest > 
testReplaceVariableWithTTLAndScheduleRestart STARTED

org.apache.kafka.connect.runtime.WorkerConfigTransformerTest > 
testReplaceVariableWithTTLAndScheduleRestart PASSED

org.apache.kafka.connect.runtime.WorkerConfigTransformerTest > 
testReplaceVariableWithTTLFirstCancelThenScheduleRestart STARTED

org.apache.kafka.connect.runtime.WorkerConfigTransformerTest > 
testReplaceVariableWithTTLFirstCancelThenScheduleRestart PASSED

org.apache.kafka.connect.runtime.WorkerConfigTransformerTest > 
testTransformNullConfiguration STARTED

org.apache.kafka.connect.runtime.WorkerConfigTransformerTest > 
testTransformNullConfiguration PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testPollsInBackground STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testPollsInBackground PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommit STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommit PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitFailure 
STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitFailure 
PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitSuccessFollowedByFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitSuccessFollowedByFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitConsumerFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitConsumerFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTimeout 
STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTimeout 
PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testAssignmentPauseResume STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testAssignmentPauseResume PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testRewind STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testRewind PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testRewindOnRebalanceDuringPoll STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testRewindOnRebalanceDuringPoll PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorFailedCustomValidation STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorFailedCustomValidation PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorAlreadyExists STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorAlreadyExists PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testDestroyConnector STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testDestroyConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartConnector STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartTask STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartTask PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testAccessors STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testAccessors PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorFailedBasicValidation STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorFailedBasicValidation PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnector STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testPutConnectorConfig STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testPutConnectorConfig PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinAssignment STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinAssignment PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance PASSED


Re: [DISCUSS] KIP-433: Provide client API version to authorizer

2019-02-27 Thread Harsha
Hi Colin,
I overlooked the IDEMPOTENT_WRITE ACL. This, along with
client.min.version, should solve the cases proposed in the KIP.
Can we turn this KIP into adding a min.client.version config to the broker? It
could be part of the dynamic config.
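
To illustrate, setting such a broker config dynamically would look roughly
like this; alterConfigs and the BROKER resource are the existing KIP-226
mechanism, while the config key is only the one proposed in this thread and
does not exist in Kafka:

import java.util.Collections;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class MinClientVersionExample {
    public static void main(String[] args) throws Exception {
        try (AdminClient admin = AdminClient.create(
                Collections.singletonMap("bootstrap.servers", "localhost:9092"))) {
            // An empty resource name targets the cluster-wide default broker config.
            ConfigResource brokers = new ConfigResource(ConfigResource.Type.BROKER, "");
            Config config = new Config(Collections.singleton(
                // Proposed in this thread; NOT an existing Kafka config.
                new ConfigEntry("client.min.api.version", "2.0")));
            admin.alterConfigs(Collections.singletonMap(brokers, config)).all().get();
        }
    }
}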

Thanks,
Harsha

On Wed, Feb 27, 2019, at 12:17 PM, Colin McCabe wrote:
> On Tue, Feb 26, 2019, at 16:33, Harsha wrote:
> > Hi Colin,
> >   
> > "> I think Ismael and Gwen here bring up a good point.  The version of the 
> > > request is a technical detail that isn't really related to 
> > > authorization.  There are a lot of other technical details like this 
> > > like the size of the request, the protocol it came in on, etc.  None of 
> > > them are passed to the authorizer-- they all have configuration knobs 
> > > to control how we handle them.  If we add this technical detail, 
> > > logically we'll have to start adding all the others, and the authorizer 
> > > API will get really bloated.  It's better to keep it focused on 
> > > authorization, I think."
> > 
> > probably my previous email is not clear but I am agreeing with Gwen's 
> > point. 
> > I am not in favor of extending authorizer to support this.
> > 
> > 
> > "> Another thing to consider is that if we add a new broker configuration 
> > > that lets us set a minimum client version which is allowed, that could 
> > > be useful to other users as well.  On the other hand, most users are 
> > > not likely to write a custom authorizer to try to take advantage of 
> > > version information being passed to the authorizer.  So, I think using a
> > > configuration is clearly the better way to go here.  Perhaps it can
> > > be a KIP-226 dynamic configuration to make this easier to deploy?"
> > 
> > Although minimum client version might help to a certain extent there 
> > are other cases where we want users to not start using transactions for 
> > example. My proposal in the previous thread was to introduce another 
> > module/interface, let's say
> > "SupportedAPIs" which will take in dynamic configuration to check which 
> > APIs are allowed. 
> > It can throw UnsupportedException just like we are throwing 
> > Authorization Exception.
> 
> Hi Harsha,
> 
> We can already prevent people from using transactions using ACLs, 
> right?  That's what the IDEMPOTENT_WRITE ACL was added for.
> 
> In general, I think users should be able to think of ACLs in terms of 
> "what can I do" rather than "how is it implemented."  For example, 
> maybe some day we will replace FetchRequest with GetStuffRequest.  But 
> users who have READ permission on a topic shouldn't have to change 
> anything.  So I think the Authorizer interface should not be aware of 
> individual RPC types or message versions.
> 
> best,
> Colin
> 
> 
> > 
> > 
> > Thanks,
> > Harsha
> > 
> > 
> > n Tue, Feb 26, 2019, at 10:04 AM, Colin McCabe wrote:
> > > Hi Harsha,
> > > 
> > > I think Ismael and Gwen here bring up a good point.  The version of the 
> > > request is a technical detail that isn't really related to 
> > > authorization.  There are a lot of other technical details like this 
> > > like the size of the request, the protocol it came in on, etc.  None of 
> > > them are passed to the authorizer-- they all have configuration knobs 
> > > to control how we handle them.  If we add this technical detail, 
> > > logically we'll have to start adding all the others, and the authorizer 
> > > API will get really bloated.  It's better to keep it focused on 
> > > authorization, I think.
> > > 
> > > Another thing to consider is that if we add a new broker configuration 
> > > that lets us set a minimum client version which is allowed, that could 
> > > be useful to other users as well.  On the other hand, most users are 
> > > not likely to write a custom authorizer to try to take advantage of 
> > > version information being passed to the authorizer.  So, I think  using 
> > > a configuration is clearly the better way to go here.  Perhaps it can 
> > > be a KIP-226 dynamic configuration to make this easier to deploy?
> > > 
> > > cheers,
> > > Colin
> > > 
> > > 
> > > On Mon, Feb 25, 2019, at 15:43, Harsha wrote:
> > > > Hi Ying,
> > > > I think the question is whether we can add a module in the core which
> > > > can take up the dynamic config and block certain APIs.  This
> > > > module will be called in each of the APIs like the authorizer does 
> > > > today to check if the API is supported for the client. 
> > > > Instead of throwing AuthorizationException like the authorizer does 
> > > > today it can throw UnsupportedException.
> > > > Benefits are,  we are keeping the authorizer interface as is and adding 
> > > > the flexibility based on dynamic configs without the need for 
> > > > categorizing broker APIs and it will be easy to extend to do additional 
> > > > options,  like turning off certain features which might be in interest 
> > > > to the service providers.
> > > > One drawback: it will introduce another call to check instead of
> > > > centralizing everything around Authorizer.

Re: [VOTE] KIP-307: Allow to define custom processor names with KStreams DSL

2019-02-27 Thread Bill Bejeck
Florian,

So we have had the re-vote and KIP-307 is accepted with the changes.  I've looked on
GitHub and I don't see the 5 separate PRs you mentioned on this thread.  Can
you push those PRs and ping us when they are ready for review?

Also, this is an important KIP we'd like to get in sooner rather than later, so
if you find the work is too much of a burden, we can finish up the last bits to
get this KIP merged; just let us know.

Thanks again for all the work you've put into this.

-Bill  
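
For readers catching up on what the KIP enables: a short sketch in the style
the KIP proposes, where DSL operators get explicit names instead of generated
ones (API names follow the KIP page; implementation details were still being
settled at this point):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Named;
import org.apache.kafka.streams.kstream.Produced;

public class NamedProcessorsExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input =
            builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()));
        // Without Named.as, this node gets a generated name like
        // KSTREAM-FILTER-0000000001; naming it keeps topologies stable and readable.
        input.filter((key, value) -> value != null, Named.as("drop-null-values"))
             .to("output", Produced.with(Serdes.String(), Serdes.String()));
        System.out.println(builder.build().describe());
    }
}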


On 2019/02/27 02:18:20, Guozhang Wang  wrote: 
> I think all the open questions have been addressed and updated in the
> DISCUSS thread, and thus I'm +1 on this KIP now.
> 
> 
> Guozhang
> 
> On Tue, Feb 26, 2019 at 6:01 PM Matthias J. Sax 
> wrote:
> 
> > +1 (binding)
> >
> >
> > -Matthias
> >
> > On 1/25/19 9:49 AM, Guozhang Wang wrote:
> > > Fair enough; could you summarize the open questions on the DISCUSS thread
> > > then?
> > >
> > > On Fri, Jan 25, 2019 at 9:46 AM Matthias J. Sax 
> > > wrote:
> > >
> > >> Guozhang,
> > >>
> > >> KIP deadline passed yesterday. The last comments on the DISCUSS thread
> > >> are not addressed yet. We need to move this to 2.3.
> > >>
> > >> -Matthias
> > >>
> > >> On 1/24/19 4:23 PM, Guozhang Wang wrote:
> > >>> Hello folks,
> > >>>
> > >>> I'd like to call out for another time on voting for this KIP, given the
> > >>> deadline is approaching now.
> > >>>
> > >>>
> > >>> Guozhang
> > >>>
> > >>> On Thu, Jan 17, 2019 at 3:11 PM John Roesler 
> > wrote:
> > >>>
> >  FWIW, I'm also +1 on the consistency changes to Joined.
> > 
> >  -John
> > 
> >  On Thu, Jan 17, 2019 at 5:04 PM Guozhang Wang 
> > >> wrote:
> > 
> > > Hello Florian,
> > >
> > > Thanks for the writeup! I made a pass on the wiki page itself and
> > left
> > >> a
> > > comment on it regarding the newly added APIs. Could you take a look?
> > If
> > > that makes sense could you update the wiki page to make sure we have
> > >> all
> > > the relevant public API changes.
> > >
> > >
> > > Guozhang
> > >
> > > On Thu, Jan 17, 2019 at 1:39 PM John Roesler 
> > >> wrote:
> > >
> > >> Yes, thanks Florian!
> > >>
> > >> +1 (nonbinding) from me.
> > >>
> > >> -John
> > >>
> > >> On Thu, Jan 17, 2019 at 3:36 PM Bill Bejeck 
> > >> wrote:
> > >>
> > >>> Thanks for the KIP Florian, it will be a useful addition.
> > >>>
> > >>> +1 on the KIP for me.
> > >>>
> > >>> -Bill
> > >>>
> > >>> On Thu, Jan 17, 2019 at 4:31 PM Florian Hussonnois <
> > >> fhussonn...@gmail.com>
> > >>> wrote:
> > >>>
> >  Hi folks,
> > 
> >  I would like to initiate a vote for the following KIP :
> > 
> > 
> > >>>
> > >>
> > >
> > 
> > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
> > 
> >  Note, there is still some minor discussions regarding the
> > >> implementation.
> > 
> >  Thanks
> >  --
> >  Florian HUSSONNOIS
> > 
> > >>>
> > >>
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> > 
> > >>>
> > >>>
> > >>
> > >>
> > >
> >
> >
> 
> -- 
> -- Guozhang
> 


Re: [DISCUSS] KIP-426: Persist Broker Id to Zookeeper

2019-02-27 Thread Harsha
Hi Colin,
  What we want is to preserve the broker.id so that we can do an
offline rebuild of a broker. In our case, bringing up a failed node through
online Kafka replication will put producer latencies at risk,
given the new broker will keep all the other leaders busy with its replication
requests. For an offline rebuild, we do not need to do a rebalance as long as we
can recover the broker.id.
  Overall, irrespective of this use case, we still want the ability to
retrieve the broker.id for an existing host. This will make swapping in new
hosts for failed hosts, while keeping the existing hostname, easier.

Thanks,
Harsha
On Wed, Feb 27, 2019, at 11:53 AM, Colin McCabe wrote:
> Hi Li,
> 
>  > The mechanism simplifies deployment because the same configuration can be 
>  > used across all brokers, however, in a large system where disk failure is 
>  > a norm, the meta file could often get lost, causing a new broker id being 
>  > allocated. This is problematic because new broker id has no partition 
>  > assigned to it so it can’t do anything, while partitions assigned to the 
>  > old one lose one replica
> 
> If all of the disks have failed, then the partitions will lose their 
> replicas no matter what, right?  If any of the disks is still around, 
> then there will be a meta file on the disk which contains the previous 
> broker ID.  So I'm not sure that we need to change anything here.
> 
> best,
> Colin
> 
> 
> On Tue, Feb 5, 2019, at 14:38, Li Kan wrote:
> > Hi, I have KIP-426, which is a small change on automatically determining
> > the broker id when starting up. I am new to Kafka, so there are a bunch of
> > design trade-offs that I might be missing or find hard to decide, so I'd
> > like to get some suggestions on it. I'd expect (and am open) to modify (or
> > even totally rewrite) the KIP based on suggestions. Thanks.
> > 
> > -- 
> > Best,
> > Kan
> >
>


Re: Speeding up integration tests

2019-02-27 Thread Ron Dagostino
Hi Colin.

On Wed, Feb 27, 2019, Colin McCabe wrote:

> On Wed, Feb 27, 2019, at 10:02, Ron Dagostino wrote:
> > Hi everyone.  Maybe providing the option to run it both ways -- start
> your
> > own cluster vs. using one that is pre-started -- might be useful?  Don't
> > know how it would work or if it would be useful, but it is something to
> > think about.
> >
> > Also, while the argument against using a pre-started cluster due to an
> > expected increase in test flakiness is reasonable, it is conjecture and
> may
> > not turn out to be correct; if it isn't too much trouble it might be
> worth
> > it to actually see if the arguments is right since the decreased time
> > benefit is less conjecture/more real.
>
> It's not just a conjecture.  We tried sharing the same cluster for
> multiple test cases in Hadoop.  It often led to difficult to debug tests.
> A test can change the cluster state in a subtle way that causes a following
> test to fail.  The ordering that tests get run in JUnit is also random,
> which makes this even more frustrating for developers.
>
> An easier solution to the problem of long test run times is to run only
> the tests that are affected by a particular change.  For example, if you
> make a change to Connect, you shouldn't need to re-run the tests for
> Clients, since your change doesn't impact any of that code.  We had this
> set up in Hadoop, and it helped free up a lot of Jenkins time.
>
> best,
> Colin
>
> >
> > Ron
> >
> > On Wed, Feb 27, 2019 at 10:39 AM Sönke Liebau
> >  wrote:
> >
> > > Hi,
> > >
> > > while I am also extremely annoyed at times by the amount of coffee I
> > > have to drink before tests finish I think the argument about flaky
> > > tests is valid! The current setup has the benefit that every test case
> > > runs on a pristine cluster, if we changed this we'd need to go through
> > > all tests and ensure that topic names are different, which can
> > > probably be abstracted to include a timestamp in the name or something
> > > like that, but it is an additional failure potential.
> > > Add to this the fact that "JUnit runs tests using a deterministic, but
> > > unpredictable order" and the water gets even muddier. Potentially this
> > > might mean that adding an additional test case changes the order that
> > > existing test cases are executed in which might mean that all of a
> > > sudden something breaks that you didn't even touch.
> > >
> > > Best regards,
> > > Sönke
> > >
> > >
> > > On Wed, Feb 27, 2019 at 2:36 PM Stanislav Kozlovski
> > >  wrote:
> > > >
> > > > Hey Viktor,
> > > >
> > > > I am all up for the idea of speeding up the tests. Running the
> > > > `:core:integrationTest` command takes an absurd amount of time as is
> and
> > > is
> > > > continuously going to go up if we don't do anything about it.
> > > > Having said that, I am very scared that your proposal might
> significantly
> > > > increase the test flakiness of current and future tests - test
> flakiness
> > > is
> > > > a huge problem we're battling. We don't get green PR builds too
> often -
> > > it
> > > > is very common that one or two flaky tests fail in each PR.
> > > > We have also found it hard to get a green build for the 2.2 release (
> > > > https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).
> > > >
> > > > On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
> > > > viktorsomo...@gmail.com> wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > I've been observing lately that unit tests usually take 2.5 hours
> to
> > > run
> > > > > and a very big portion of these are the core tests where a new
> cluster
> > > is
> > > > > spun up for every test. This takes most of the time. I ran a test
> > > > > (TopicCommandWithAdminClient with 38 test inside) through the
> profiler
> > > and
> > > > > it shows for instance that running the whole class itself took 10
> > > minutes
> > > > > and 37 seconds where the useful time was 5 minutes 18 seconds.
> That's a
> > > > > 100% overhead. Without profiler the whole class takes 7 minutes
> and 48
> > > > > seconds, so the useful time would be between 3-4 minutes. This is a
> > > bigger
> > > > > test though, most of them won't take this much.
> > > > > There are 74 classes that implement KafkaServerTestHarness and just
> > > running
> > > > > :core:integrationTest takes almost 2 hours.
> > > > >
> > > > > I think we could greatly speed up these integration tests by just
> > > creating
> > > > > the cluster once per class and perform the tests on separate
> methods. I
> > > > > know that this a little bit contradicts to the principle that tests
> > > should
> > > > > be independent but it seems like recreating clusters for each is a
> very
> > > > > expensive operation. Also if the tests are acting on different
> > > resources
> > > > > (different topics, etc.) then it might not hurt their independence.
> > > There
> > > > > might be cases of course where this is not possible but I think
> there
> > > could
> > > > > be a lot where it is.
> > > > >
> > > 

Re: [DISCUSS] KIP-433: Provide client API version to authorizer

2019-02-27 Thread Colin McCabe
On Tue, Feb 26, 2019, at 16:33, Harsha wrote:
> Hi Colin,
>   
> "> I think Ismael and Gwen here bring up a good point.  The version of the 
> > request is a technical detail that isn't really related to 
> > authorization.  There are a lot of other technical details like this 
> > like the size of the request, the protocol it came in on, etc.  None of 
> > them are passed to the authorizer-- they all have configuration knobs 
> > to control how we handle them.  If we add this technical detail, 
> > logically we'll have to start adding all the others, and the authorizer 
> > API will get really bloated.  It's better to keep it focused on 
> > authorization, I think."
> 
> probably my previous email is not clear but I am agreeing with Gwen's point. 
> I am not in favor of extending authorizer to support this.
> 
> 
> "> Another thing to consider is that if we add a new broker configuration 
> > that lets us set a minimum client version which is allowed, that could 
> > be useful to other users as well.  On the other hand, most users are 
> > not likely to write a custom authorizer to try to take advantage of 
> > version information being passed to the authorizer.  So, I think using a
> > configuration is clearly the better way to go here.  Perhaps it can
> > be a KIP-226 dynamic configuration to make this easier to deploy?"
> 
> Although minimum client version might help to a certain extent there 
> are other cases where we want users to not start using transactions for 
> example. My proposal in the previous thread was to introduce another 
> module/interface, let's say
> "SupportedAPIs" which will take in dynamic configuration to check which 
> APIs are allowed. 
> It can throw UnsupportedException just like we are throwing 
> Authorization Exception.

Hi Harsha,

We can already prevent people from using transactions using ACLs, right?  
That's what the IDEMPOTENT_WRITE ACL was added for.

In general, I think users should be able to think of ACLs in terms of "what can 
I do" rather than "how is it implemented."  For example, maybe some day we will 
replace FetchRequest with GetStuffRequest.  But users who have READ permission 
on a topic shouldn't have to change anything.  So I think the Authorizer 
interface should not be aware of individual RPC types or message versions.

best,
Colin


> 
> 
> Thanks,
> Harsha
> 
> 
> n Tue, Feb 26, 2019, at 10:04 AM, Colin McCabe wrote:
> > Hi Harsha,
> > 
> > I think Ismael and Gwen here bring up a good point.  The version of the 
> > request is a technical detail that isn't really related to 
> > authorization.  There are a lot of other technical details like this 
> > like the size of the request, the protocol it came in on, etc.  None of 
> > them are passed to the authorizer-- they all have configuration knobs 
> > to control how we handle them.  If we add this technical detail, 
> > logically we'll have to start adding all the others, and the authorizer 
> > API will get really bloated.  It's better to keep it focused on 
> > authorization, I think.
> > 
> > Another thing to consider is that if we add a new broker configuration 
> > that lets us set a minimum client version which is allowed, that could 
> > be useful to other users as well.  On the other hand, most users are 
> > not likely to write a custom authorizer to try to take advantage of 
> > version information being passed to the authorizer.  So, I think  using 
> > a configuration is clearly the better way to go here.  Perhaps it can 
> > be a KIP-226 dynamic configuration to make this easier to deploy?
> > 
> > cheers,
> > Colin
> > 
> > 
> > On Mon, Feb 25, 2019, at 15:43, Harsha wrote:
> > > Hi Ying,
> > > I think the question is whether we can add a module in the core which
> > > can take up the dynamic config and block certain APIs.  This
> > > module will be called in each of the APIs like the authorizer does 
> > > today to check if the API is supported for the client. 
> > > Instead of throwing AuthorizationException like the authorizer does 
> > > today it can throw UnsupportedException.
> > > Benefits are,  we are keeping the authorizer interface as is and adding 
> > > the flexibility based on dynamic configs without the need for 
> > > categorizing broker APIs and it will be easy to extend to do additional 
> > > options,  like turning off certain features which might be in interest 
> > > to the service providers.
> > > One drawback,  It will introduce another call to check instead of 
> > > centralizing everything around Authorizer.
> > > 
> > > Thanks,
> > > Harsha
> > > 
> > > On Mon, Feb 25, 2019, at 2:43 PM, Ying Zheng wrote:
> > > > If you guys don't like the extension of authorizer interface, I will 
> > > > just
> > > > propose a single broker dynamic configuration: client.min.api.version, 
> > > > to
> > > > keep things simple.
> > > > 
> > > > What do you think?
> > > > 
> > > > On Mon, Feb 25, 2019 at 2:23 PM Ying Zheng  wrote:
> > > > 
> > > > > @Viktor 

Re: [DISCUSS] KIP-426: Persist Broker Id to Zookeeper

2019-02-27 Thread Colin McCabe
Hi Li,

 > The mechanism simplifies deployment because the same configuration can be 
 > used across all brokers, however, in a large system where disk failure is 
 > a norm, the meta file could often get lost, causing a new broker id being 
 > allocated. This is problematic because new broker id has no partition 
 > assigned to it so it can’t do anything, while partitions assigned to the 
 > old one lose one replica

If all of the disks have failed, then the partitions will lose their replicas 
no matter what, right?  If any of the disks is still around, then there will be 
a meta file on the disk which contains the previous broker ID.  So I'm not sure 
that we need to change anything here.
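
For reference, the meta file mentioned above is the meta.properties in each
log directory; a sketch of recovering the id from it (key names as in
ZooKeeper-mode Kafka; treat the file layout as an implementation detail):

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public class BrokerIdRecovery {
    // Reads broker.id from a log directory's meta.properties, if the file survived.
    static Integer recoverBrokerId(File logDir) throws IOException {
        File meta = new File(logDir, "meta.properties");
        if (!meta.exists()) {
            return null; // this disk lost its metadata
        }
        Properties props = new Properties();
        try (FileReader reader = new FileReader(meta)) {
            props.load(reader);
        }
        String id = props.getProperty("broker.id");
        return id == null ? null : Integer.parseInt(id);
    }
}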

best,
Colin


On Tue, Feb 5, 2019, at 14:38, Li Kan wrote:
> Hi, I have KIP-426, which is a small change on automatically determining
> the broker id when starting up. I am new to Kafka, so there are a bunch of
> design trade-offs that I might be missing or find hard to decide, so I'd
> like to get some suggestions on it. I'd expect (and am open) to modify (or
> even totally rewrite) the KIP based on suggestions. Thanks.
> 
> -- 
> Best,
> Kan
>


Re: Speeding up integration tests

2019-02-27 Thread Colin McCabe
On Wed, Feb 27, 2019, at 10:02, Ron Dagostino wrote:
> Hi everyone.  Maybe providing the option to run it both ways -- start your
> own cluster vs. using one that is pre-started -- might be useful?  Don't
> know how it would work or if it would be useful, but it is something to
> think about.
> 
> Also, while the argument against using a pre-started cluster due to an
> expected increase in test flakiness is reasonable, it is conjecture and may
> not turn out to be correct; if it isn't too much trouble it might be worth
> it to actually see if the arguments is right since the decreased time
> benefit is less conjecture/more real.

It's not just a conjecture.  We tried sharing the same cluster for multiple 
test cases in Hadoop.  It often led to difficult to debug tests.  A test can 
change the cluster state in a subtle way that causes a following test to fail.  
The ordering that tests get run in JUnit is also random, which makes this even 
more frustrating for developers.

An easier solution to the problem of long test run times is to run only the 
tests that are affected by a particular change.  For example, if you make a 
change to Connect, you shouldn't need to re-run the tests for Clients, since 
your change doesn't impact any of that code.  We had this set up in Hadoop, and 
it helped free up a lot of Jenkins time.

best,
Colin

> 
> Ron
> 
> On Wed, Feb 27, 2019 at 10:39 AM Sönke Liebau
>  wrote:
> 
> > Hi,
> >
> > while I am also extremely annoyed at times by the amount of coffee I
> > have to drink before tests finish I think the argument about flaky
> > tests is valid! The current setup has the benefit that every test case
> > runs on a pristine cluster, if we changed this we'd need to go through
> > all tests and ensure that topic names are different, which can
> > probably be abstracted to include a timestamp in the name or something
> > like that, but it is an additional failure potential.
> > Add to this the fact that "JUnit runs tests using a deterministic, but
> > unpredictable order" and the water gets even muddier. Potentially this
> > might mean that adding an additional test case changes the order that
> > existing test cases are executed in which might mean that all of a
> > sudden something breaks that you didn't even touch.
> >
> > Best regards,
> > Sönke
> >
> >
> > On Wed, Feb 27, 2019 at 2:36 PM Stanislav Kozlovski
> >  wrote:
> > >
> > > Hey Viktor,
> > >
> > > I am all up for the idea of speeding up the tests. Running the
> > > `:core:integrationTest` command takes an absurd amount of time as is and
> > is
> > > continuously going to go up if we don't do anything about it.
> > > Having said that, I am very scared that your proposal might significantly
> > > increase the test flakiness of current and future tests - test flakiness
> > is
> > > a huge problem we're battling. We don't get green PR builds too often -
> > it
> > > is very common that one or two flaky tests fail in each PR.
> > > We have also found it hard to get a green build for the 2.2 release (
> > > https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).
> > >
> > > On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
> > > viktorsomo...@gmail.com> wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I've been observing lately that unit tests usually take 2.5 hours to
> > run
> > > > and a very big portion of these are the core tests where a new cluster
> > is
> > > > spun up for every test. This takes most of the time. I ran a test
> > > > (TopicCommandWithAdminClient with 38 test inside) through the profiler
> > and
> > > > it shows for instance that running the whole class itself took 10
> > minutes
> > > > and 37 seconds where the useful time was 5 minutes 18 seconds. That's a
> > > > 100% overhead. Without profiler the whole class takes 7 minutes and 48
> > > > seconds, so the useful time would be between 3-4 minutes. This is a
> > bigger
> > > > test though, most of them won't take this much.
> > > > There are 74 classes that implement KafkaServerTestHarness and just
> > running
> > > > :core:integrationTest takes almost 2 hours.
> > > >
> > > > I think we could greatly speed up these integration tests by just
> > creating
> > > > the cluster once per class and perform the tests on separate methods. I
> > > > know that this a little bit contradicts to the principle that tests
> > should
> > > > be independent but it seems like recreating clusters for each is a very
> > > > expensive operation. Also if the tests are acting on different
> > resources
> > > > (different topics, etc.) then it might not hurt their independence.
> > There
> > > > might be cases of course where this is not possible but I think there
> > could
> > > > be a lot where it is.
> > > >
> > > > In the optimal case we could cut the testing time back by
> > approximately an
> > > > hour. This would save resources and give quicker feedback for PR
> > builds.
> > > >
> > > > What are your thoughts?
> > > > Has anyone thought about this or were there any attempts made?

Re: Speeding up integration tests

2019-02-27 Thread Ron Dagostino
Hi everyone.  Maybe providing the option to run it both ways -- start your
own cluster vs. using one that is pre-started -- might be useful?  Don't
know how it would work or if it would be useful, but it is something to
think about.

Also, while the argument against using a pre-started cluster due to an
expected increase in test flakiness is reasonable, it is conjecture and may
not turn out to be correct; if it isn't too much trouble, it might be worth
actually checking whether the argument is right, since the decreased-time
benefit is less conjecture/more real.

Ron

On Wed, Feb 27, 2019 at 10:39 AM Sönke Liebau
 wrote:

> Hi,
>
> while I am also extremely annoyed at times by the amount of coffee I
> have to drink before tests finish I think the argument about flaky
> tests is valid! The current setup has the benefit that every test case
> runs on a pristine cluster, if we changed this we'd need to go through
> all tests and ensure that topic names are different, which can
> probably be abstracted to include a timestamp in the name or something
> like that, but it is an additional failure potential.
> Add to this the fact that "JUnit runs tests using a deterministic, but
> unpredictable order" and the water gets even muddier. Potentially this
> might mean that adding an additional test case changes the order that
> existing test cases are executed in which might mean that all of a
> sudden something breaks that you didn't even touch.
>
> Best regards,
> Sönke
>
>
> On Wed, Feb 27, 2019 at 2:36 PM Stanislav Kozlovski
>  wrote:
> >
> > Hey Viktor,
> >
> > I am all up for the idea of speeding up the tests. Running the
> > `:core:integrationTest` command takes an absurd amount of time as is and
> is
> > continuously going to go up if we don't do anything about it.
> > Having said that, I am very scared that your proposal might significantly
> > increase the test flakiness of current and future tests - test flakiness
> is
> > a huge problem we're battling. We don't get green PR builds too often -
> it
> > is very common that one or two flaky tests fail in each PR.
> > We have also found it hard to get a green build for the 2.2 release (
> > https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).
> >
> > On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
> > viktorsomo...@gmail.com> wrote:
> >
> > > Hi Folks,
> > >
> > > I've been observing lately that unit tests usually take 2.5 hours to
> > > run, and a very big portion of that time goes to the core tests, where
> > > a new cluster is spun up for every test. This takes most of the time. I
> > > ran a test class (TopicCommandWithAdminClient, with 38 tests inside)
> > > through the profiler, and it shows, for instance, that running the
> > > whole class took 10 minutes and 37 seconds, of which the useful time
> > > was 5 minutes and 18 seconds. That's a 100% overhead. Without the
> > > profiler the whole class takes 7 minutes and 48 seconds, so the useful
> > > time would be between 3 and 4 minutes. This is a bigger test class
> > > though; most of them won't take this much.
> > > There are 74 classes that implement KafkaServerTestHarness, and just
> > > running :core:integrationTest takes almost 2 hours.
> > >
> > > I think we could greatly speed up these integration tests by creating
> > > the cluster once per class and performing the tests in separate
> > > methods. I know that this somewhat contradicts the principle that tests
> > > should be independent, but recreating the cluster for each test is a
> > > very expensive operation. Also, if the tests act on different resources
> > > (different topics, etc.) then it might not hurt their independence.
> > > There might of course be cases where this is not possible, but I think
> > > there could be a lot where it is.
> > >
> > > In the optimal case we could cut the testing time back by approximately
> > > an hour. This would save resources and give quicker feedback for PR
> > > builds.
> > >
> > > What are your thoughts?
> > > Has anyone thought about this or were there any attempts made?
> > >
> > > Best,
> > > Viktor
> > >
> >
> >
> > --
> > Best,
> > Stanislav
>
>
>
> --
> Sönke Liebau
> Partner
> Tel. +49 179 7940878
> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>


Re: contributors list

2019-02-27 Thread Matthias J. Sax
Done.

On 2/27/19 1:21 AM, Jing Chen wrote:
> hi there,
> could anyone add me to the contributors list of the Kafka project, so that
> I can assign tickets to myself on JIRA?
> My Apache JIRA username is jingc.
> 
> Thanks
> Jing
> 





contributors list

2019-02-27 Thread Jing Chen
hi there,
could anyone add me to the contributors list of the Kafka project, so that I
can assign tickets to myself on JIRA?
My Apache JIRA username is jingc.

Thanks
Jing


[jira] [Created] (KAFKA-8012) NullPointerException while truncating at high watermark can crash replica fetcher thread

2019-02-27 Thread Colin Hicks (JIRA)
Colin Hicks created KAFKA-8012:
--

 Summary: NullPointerException while truncating at high watermark 
can crash replica fetcher thread
 Key: KAFKA-8012
 URL: https://issues.apache.org/jira/browse/KAFKA-8012
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.1, 2.2.0
Reporter: Colin Hicks


An NPE can occur when the replica fetcher manager calls 
`removeFetcherForPartitions`, removing the corresponding partitionStates, while 
a replica fetcher thread is concurrently attempting to truncate the same 
partition(s) in `truncateToHighWatermark`.

Stack trace for failure case:

{{java.lang.NullPointerException}}
{{ at 
kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$2(AbstractFetcherThread.scala:213)}}
{{ at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)}}
{{ at 
kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$1(AbstractFetcherThread.scala:211)}}
{{ at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}}
{{ at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)}}
{{ at 
kafka.server.AbstractFetcherThread.truncateToHighWatermark(AbstractFetcherThread.scala:207)}}
{{ at 
kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)}}
{{ at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)}}
{{ at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)}}
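
For illustration, a minimal Java sketch of the race described above -- not
the actual AbstractFetcherThread code, and all names here are illustrative.
The fix direction is a null check on the state fetched during truncation:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FetcherSketch {
    private final Map<String, Long> partitionStates = new ConcurrentHashMap<>();

    // Called by the fetcher manager (thread A): drops the partition state.
    void removeFetcherForPartitions(Iterable<String> partitions) {
        for (String tp : partitions) partitionStates.remove(tp);
    }

    // Called by the fetcher thread (thread B): may observe a concurrent removal.
    void truncateToHighWatermark(Iterable<String> partitions) {
        for (String tp : partitions) {
            Long state = partitionStates.get(tp);
            if (state == null) continue;  // partition removed concurrently -> skip, no NPE
            truncate(tp, state);
        }
    }

    private void truncate(String tp, long highWatermark) { /* ... */ }
}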

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk8 #3421

2019-02-27 Thread Apache Jenkins Server
See 


Changes:

[bbejeck] KAFKA-7918: Inline generic parameters Pt. II: RocksDB Bytes Store and

--
[...truncated 2.31 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord PASSED

[...remaining org.apache.kafka.streams.test.OutputVerifierTest STARTED/PASSED
results truncated; every test shown in this run PASSED...]

Re: [VOTE] KIP-307: Allow to define custom processor names with KStreams DSL

2019-02-27 Thread Bill Bejeck
+1 (binding)
-Bill

On 2019/02/27 02:01:41, "Matthias J. Sax"  wrote: 
> +1 (binding)
> 
> 
> -Matthias
> 
> On 1/25/19 9:49 AM, Guozhang Wang wrote:
> > Fair enough; could you summarize the open questions on the DISCUSS thread
> > then?
> > 
> > On Fri, Jan 25, 2019 at 9:46 AM Matthias J. Sax 
> > wrote:
> > 
> >> Guozhang,
> >>
> >> KIP deadline passed yesterday. The last comments on the DISCUSS thread
> >> are not addressed yet. We need to move this to 2.3.
> >>
> >> -Matthias
> >>
> >> On 1/24/19 4:23 PM, Guozhang Wang wrote:
> >>> Hello folks,
> >>>
> >>> I'd like to call one more time for votes on this KIP, given that the
> >>> deadline is approaching.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>> On Thu, Jan 17, 2019 at 3:11 PM John Roesler  wrote:
> >>>
>  FWIW, I'm also +1 on the consistency changes to Joined.
> 
>  -John
> 
>  On Thu, Jan 17, 2019 at 5:04 PM Guozhang Wang 
> >> wrote:
> 
> > Hello Florian,
> >
> > Thanks for the writeup! I made a pass on the wiki page itself and left
> >> a
> > comment on it regarding the newly added APIs. Could you take a look? If
> > that makes sense could you update the wiki page to make sure we have
> >> all
> > the relevant public API changes.
> >
> >
> > Guozhang
> >
> > On Thu, Jan 17, 2019 at 1:39 PM John Roesler 
> >> wrote:
> >
> >> Yes, thanks Florian!
> >>
> >> +1 (nonbinding) from me.
> >>
> >> -John
> >>
> >> On Thu, Jan 17, 2019 at 3:36 PM Bill Bejeck 
> >> wrote:
> >>
> >>> Thanks for the KIP Florian, it will be a useful addition.
> >>>
> >>> +1 on the KIP for me.
> >>>
> >>> -Bill
> >>>
> >>> On Thu, Jan 17, 2019 at 4:31 PM Florian Hussonnois <
> >> fhussonn...@gmail.com>
> >>> wrote:
> >>>
>  Hi folks,
> 
>  I would like to initiate a vote for the following KIP :
> 
> 
> >>>
> >>
> >
> 
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
> 
>  Note, there is still some minor discussions regarding the
> >> implementation.
> 
>  Thanks
>  --
>  Florian HUSSONNOIS
> 
> >>>
> >>
> >
> >
> > --
> > -- Guozhang
> >
> 
> >>>
> >>>
> >>
> >>
> > 
> 
> 


Re: Speeding up integration tests

2019-02-27 Thread Sönke Liebau
Hi,

while I am also extremely annoyed at times by the amount of coffee I
have to drink before tests finish, I think the argument about flaky
tests is valid! The current setup has the benefit that every test case
runs on a pristine cluster. If we changed this, we'd need to go through
all tests and ensure that topic names are different; that can probably
be abstracted by including a timestamp in the name or something like
that, but it adds another potential point of failure.
Add to this the fact that "JUnit runs tests using a deterministic, but
unpredictable order" and the water gets even muddier. Potentially this
might mean that adding a new test case changes the order in which
existing test cases are executed, which might mean that all of a
sudden something breaks that you didn't even touch.
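
As an illustration of that timestamp-in-the-name abstraction, a minimal
sketch of a hypothetical helper (names made up, not an existing Kafka test
utility):

// Derive a per-test-unique topic name so tests can safely share one cluster.
public final class TestTopics {
    private TestTopics() {}

    // e.g. uniqueName("topic-command-test") -> "topic-command-test-1551262800000-42"
    public static String uniqueName(String prefix) {
        return prefix + "-" + System.currentTimeMillis() + "-"
                + java.util.concurrent.ThreadLocalRandom.current().nextInt(10_000);
    }
}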

Best regards,
Sönke


On Wed, Feb 27, 2019 at 2:36 PM Stanislav Kozlovski
 wrote:
>
> Hey Viktor,
>
> I am all for the idea of speeding up the tests. Running the
> `:core:integrationTest` command takes an absurd amount of time as it is,
> and it will only keep going up if we don't do anything about it.
> Having said that, I am very scared that your proposal might significantly
> increase the flakiness of current and future tests -- test flakiness is
> a huge problem we're battling. We don't get green PR builds too often; it
> is very common for one or two flaky tests to fail in each PR.
> We have also found it hard to get a green build for the 2.2 release (
> https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).
>
> On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > I've been observing lately that unit tests usually take 2.5 hours to
> > run, and a very big portion of that time goes to the core tests, where
> > a new cluster is spun up for every test. This takes most of the time. I
> > ran a test class (TopicCommandWithAdminClient, with 38 tests inside)
> > through the profiler, and it shows, for instance, that running the
> > whole class took 10 minutes and 37 seconds, of which the useful time
> > was 5 minutes and 18 seconds. That's a 100% overhead. Without the
> > profiler the whole class takes 7 minutes and 48 seconds, so the useful
> > time would be between 3 and 4 minutes. This is a bigger test class
> > though; most of them won't take this much.
> > There are 74 classes that implement KafkaServerTestHarness, and just
> > running :core:integrationTest takes almost 2 hours.
> >
> > I think we could greatly speed up these integration tests by creating
> > the cluster once per class and performing the tests in separate
> > methods. I know that this somewhat contradicts the principle that tests
> > should be independent, but recreating the cluster for each test is a
> > very expensive operation. Also, if the tests act on different resources
> > (different topics, etc.) then it might not hurt their independence.
> > There might of course be cases where this is not possible, but I think
> > there could be a lot where it is.
> >
> > In the optimal case we could cut the testing time back by approximately
> > an hour. This would save resources and give quicker feedback for PR
> > builds.
> >
> > What are your thoughts?
> > Has anyone thought about this or were there any attempts made?
> >
> > Best,
> > Viktor
> >
>
>
> --
> Best,
> Stanislav



--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


[jira] [Created] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest.

2019-02-27 Thread Bill Bejeck (JIRA)
Bill Bejeck created KAFKA-8011:
--

 Summary: Flaky Test RegexSourceIntegrationTest.
 Key: KAFKA-8011
 URL: https://issues.apache.org/jira/browse/KAFKA-8011
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Bill Bejeck
Assignee: Bill Bejeck


The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated

and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted  tests use an 
ArrayList to assert the topics assigned to the Streams application. 

The ConsumerRebalanceListener used in the test operates on this list as does 
the TestUtils.waitForCondition() to verify the expected topic assignments.

Using the same list in both places can cause a ConcurrentModificationException 
if the rebalance listener modifies the assignment at the same time 
TestUtils.waitForCondition() is using the list to verify the expected topics. 
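
A minimal sketch of the fix direction described above, using a thread-safe
list so concurrent iteration is safe; the class and method names are
illustrative, not the actual test code:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class AssignedTopicsTracker {
    private final List<String> assignedTopics = new CopyOnWriteArrayList<>();

    // Invoked from the consumer's rebalance callback.
    void onAssignment(List<String> topics) {
        assignedTopics.clear();
        assignedTopics.addAll(topics);
    }

    // Invoked from the test thread, e.g. inside TestUtils.waitForCondition().
    // Iterating a CopyOnWriteArrayList never throws
    // ConcurrentModificationException, even during concurrent updates.
    boolean matches(List<String> expected) {
        return assignedTopics.containsAll(expected)
                && expected.containsAll(assignedTopics);
    }
}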



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Speeding up integration tests

2019-02-27 Thread Stanislav Kozlovski
Hey Viktor,

I am all for the idea of speeding up the tests. Running the
`:core:integrationTest` command takes an absurd amount of time as it is,
and it will only keep going up if we don't do anything about it.
Having said that, I am very scared that your proposal might significantly
increase the flakiness of current and future tests -- test flakiness is a
huge problem we're battling. We don't get green PR builds too often; it is
very common for one or two flaky tests to fail in each PR.
We have also found it hard to get a green build for the 2.2 release (
https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).

On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
viktorsomo...@gmail.com> wrote:

> Hi Folks,
>
> I've been observing lately that unit tests usually take 2.5 hours to
> run, and a very big portion of that time goes to the core tests, where
> a new cluster is spun up for every test. This takes most of the time. I
> ran a test class (TopicCommandWithAdminClient, with 38 tests inside)
> through the profiler, and it shows, for instance, that running the
> whole class took 10 minutes and 37 seconds, of which the useful time
> was 5 minutes and 18 seconds. That's a 100% overhead. Without the
> profiler the whole class takes 7 minutes and 48 seconds, so the useful
> time would be between 3 and 4 minutes. This is a bigger test class
> though; most of them won't take this much.
> There are 74 classes that implement KafkaServerTestHarness, and just
> running :core:integrationTest takes almost 2 hours.
>
> I think we could greatly speed up these integration tests by creating
> the cluster once per class and performing the tests in separate
> methods. I know that this somewhat contradicts the principle that tests
> should be independent, but recreating the cluster for each test is a
> very expensive operation. Also, if the tests act on different resources
> (different topics, etc.) then it might not hurt their independence.
> There might of course be cases where this is not possible, but I think
> there could be a lot where it is.
>
> In the optimal case we could cut the testing time back by approximately
> an hour. This would save resources and give quicker feedback for PR
> builds.
>
> What are your thoughts?
> Has anyone thought about this or were there any attempts made?
>
> Best,
> Viktor
>


-- 
Best,
Stanislav


[jira] [Created] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2019-02-27 Thread Mickael Maison (JIRA)
Mickael Maison created KAFKA-8010:
-

 Summary: kafka-configs.sh does not allow setting config with an 
equal in the value
 Key: KAFKA-8010
 URL: https://issues.apache.org/jira/browse/KAFKA-8010
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Mickael Maison


The sasl.jaas.config value typically includes equals signs. Unfortunately the 
kafka-configs tool does not parse such values correctly and hits an error:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config "sasl.jaas.config=KafkaServer \{\n  
org.apache.kafka.common.security.plain.PlainLoginModule required\n  
username=\"myuser\"\n  password=\"mypassword\";\n};\nClient \{\n  
org.apache.zookeeper.server.auth.DigestLoginModule required\n  
username=\"myuser2\"\n  password=\"mypassword2\;\n};"
requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-430 - Return Authorized Operations in Describe Responses

2019-02-27 Thread Rajini Sivaram
The vote has passed with 5 binding votes (Harsha, Gwen, Manikumar, Colin,
me) and 2 non-binding votes (Satish, Mickael). I will update the KIP page.

Many thanks to everyone for the feedback and votes.

Regards,

Rajini


On Tue, Feb 26, 2019 at 6:41 PM Rajini Sivaram 
wrote:

> Thanks Colin, I have updated the KIP to mention that we don't return
> UNKNOWN, ANY or ALL.
>
> Regards,
>
> Rajini
>
> On Tue, Feb 26, 2019 at 6:10 PM Colin McCabe  wrote:
>
>> Thanks, Rajini!  Just to make it clear, can we spell out that we don't
>> set the UNKNOWN, ANY, or ALL bits ever?
>>
>> +1 once that's resolved.
>>
>> cheers,
>> Colin
>>
>>
>> On Mon, Feb 25, 2019, at 04:11, Rajini Sivaram wrote:
>> > Hi Colin,
>> >
>> > Yes, it makes sense to reduce response size by using bit fields. Updated
>> > the KIP.
>> >
>> > I have also updated the KIP to say that clients will ignore any bits
>> set by
>> > the broker that are unknown to the client, so there will be no UNKNOWN
>> > operations in the set returned by AdminClient. Brokers may however set
>> bits
>> > regardless of client version. Does that match your expectation?
>> >
>> > Thank you,
>> >
>> > Rajini
>> >
>> >
>> > On Sat, Feb 23, 2019 at 1:03 AM Colin McCabe 
>> wrote:
>> >
>> > > Hi Rajini,
>> > >
>> > > Thanks for the explanations.
>> > >
>> > > On Fri, Feb 22, 2019, at 11:59, Rajini Sivaram wrote:
>> > > > Hi Colin,
>> > > >
>> > > > Thanks for the review. Sorry I meant that an array of INT8's, each
>> of
>> > > which
>> > > > is an AclOperation code will be returned. I have clarified that in
>> the
>> > > KIP.
>> > >
>> > > Do you think it's worth considering a bitfield here still?  An array
>> will
>> > > take up at least 4 bytes for the length, plus whatever length the
>> elements
>> > > are.  A 32-bit bitfield would pretty much always take up less space.
>> And
>> > > we can have a new version of the RPC with 64 bits or whatever if we
>> outgrow
>> > > 32 operations.  MetadataResponse for a big cluster could contain
>> quite a
>> > > lot of topics, tens or hundreds of thousands.  So the space savings
>> could
>> > > be considerable.
>> > >
>> > > >
>> > > > All permitted operations will be returned from the set of supported
>> > > > operations on each resource. This is regardless of whether the
>> access was
>> > > > implicitly or explicitly granted. Have clarified that in the KIP.
>> > >
>> > > Thanks.
>> > >
>> > > >
>> > > > Since the values returned are INT8 codes, clients can simply ignore
>> any
>> > > > they don't recognize. Java clients convert these into
>> > > AclOperation.UNKNOWN.
>> > > > That way we don't need to update Metadata/describe request versions
>> when
>> > > > new operations are added to a resource. This is consistent with
>> > > > DescribeAcls behaviour. Have added this to the compatibility
>> section of
>> > > the
>> > > > KIP.
>> > >
>> > > Displaying "unknown" for new AclOperations made sense for
>> DescribeAcls,
>> > > since the ACL is explicitly referencing the new AclOperation.   For
>> > > example, if you upgrade your Kafka cluster to a new version that
>> supports
>> > > DESCRIBE_CONFIGS, your old ACLs still don't reference
>> DESCRIBE_CONFIGS.
>> > >
>> > > In contrast, in the case here, existing topics (or other resources)
>> might
>> > > pick up the new ACLOperation just by upgrading Kafka.  For example,
>> if you
>> > > had ALL permission on a topic and you upgrade to a new version with
>> > > DESCRIBE_CONFIGS, you now have DESCRIBE_CONFIGS permission on that
>> topic.
>> > > This would result in a lot of "unknowns" being displayed here, which
>> might
>> > > not be ideal.
>> > >
>> > > Also, there is an argument from intent-- the intention here is to let
>> you
>> > > know what you can do with a resource that already exists.  Knowing
>> that you
>> > > can do an unknown thing isn't very useful.  In contrast, for
>> DescribeAcls,
>> > > knowing that an ACL references an operation your software is too old
>> to
>> > > understand is useful (you may choose not to modify that ACL, since you
>> > > don't know what it does, for example.)  What do you think?
>> > >
>> > > cheers,
>> > > Colin
>> > >
>> > >
>> > > >
>> > > > Thank you,
>> > > >
>> > > > Rajini
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Feb 22, 2019 at 6:46 PM Colin McCabe 
>> wrote:
>> > > >
>> > > > > Hi Rajini,
>> > > > >
>> > > > > Thanks for the KIP!
>> > > > >
>> > > > > The KIP specifies that "Authorized operations will be returned as
>> [an]
>> > > > > INT8 consistent with [the] AclOperation used in ACL requests and
>> > > > > responses."  But there may be more than one AclOperation that is
>> > > applied to
>> > > > > a given resource.  For example, a principal may have both READ and
>> > > WRITE
>> > > > > permission on a topic.
>> > > > >
>> > > > > One option for representing this would be a bitfield.  A 32-bit
>> > > bitfield
>> > > > > could have the appropriate bits set.  For example, if READ and
>> WRITE
>> > > > > operations were permitted, bits 3 
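
To make the size argument above concrete: an INT8 array with, say, 5
permitted operations costs 4 bytes of length plus 5 bytes of elements per
resource, versus a flat 4 bytes for a 32-bit bitfield, so across 100,000
topics in a MetadataResponse that is roughly 900 KB versus 400 KB. A rough
Java sketch of the encoding follows; the operation codes used (READ=3,
WRITE=4, DESCRIBE_CONFIGS=10) are my reading of the AclOperation INT8 codes
and should be treated as assumptions, not authoritative protocol values:

import java.util.EnumSet;
import java.util.Set;

public class OpBitfield {
    // Assumed AclOperation INT8 codes -- verify against the protocol docs.
    enum Op {
        READ(3), WRITE(4), DESCRIBE_CONFIGS(10);
        final int code;
        Op(int code) { this.code = code; }
    }

    static int encode(Set<Op> ops) {
        int bits = 0;
        for (Op op : ops) bits |= 1 << op.code;  // set bit `code` per permitted op
        return bits;
    }

    static Set<Op> decode(int bits) {
        Set<Op> ops = EnumSet.noneOf(Op.class);
        for (Op op : Op.values())
            if ((bits & (1 << op.code)) != 0) ops.add(op);  // unknown bits are ignored
        return ops;
    }

    public static void main(String[] args) {
        int bits = encode(EnumSet.of(Op.READ, Op.WRITE));  // bits 3 and 4 -> 24
        System.out.println(bits + " -> " + decode(bits));  // 24 -> [READ, WRITE]
    }
}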

Speeding up integration tests

2019-02-27 Thread Viktor Somogyi-Vass
Hi Folks,

I've been observing lately that unit tests usually take 2.5 hours to run,
and a very big portion of that time goes to the core tests, where a new
cluster is spun up for every test. This takes most of the time. I ran a
test class (TopicCommandWithAdminClient, with 38 tests inside) through the
profiler, and it shows, for instance, that running the whole class took 10
minutes and 37 seconds, of which the useful time was 5 minutes and 18
seconds. That's a 100% overhead. Without the profiler the whole class takes
7 minutes and 48 seconds, so the useful time would be between 3 and 4
minutes. This is a bigger test class though; most of them won't take this
much.
There are 74 classes that implement KafkaServerTestHarness, and just running
:core:integrationTest takes almost 2 hours.

I think we could greatly speed up these integration tests by creating the
cluster once per class and performing the tests in separate methods. I know
that this somewhat contradicts the principle that tests should be
independent, but recreating the cluster for each test is a very expensive
operation. Also, if the tests act on different resources (different topics,
etc.) then it might not hurt their independence. There might of course be
cases where this is not possible, but I think there could be a lot where it
is.

In the optimal case we could cut the testing time back by approximately an
hour. This would save resources and give quicker feedback for PR builds.

What are your thoughts?
Has anyone thought about this or were there any attempts made?

Best,
Viktor
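
A minimal JUnit 4 sketch of the once-per-class idea, assuming a hypothetical
EmbeddedKafkaCluster helper rather than the real KafkaServerTestHarness API:

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class TopicCommandSharedClusterTest {
    private static EmbeddedKafkaCluster cluster;  // hypothetical helper class

    @BeforeClass
    public static void startCluster() throws Exception {
        cluster = new EmbeddedKafkaCluster(3);  // three brokers, started once per class
        cluster.start();
    }

    @AfterClass
    public static void stopCluster() throws Exception {
        cluster.stop();
    }

    @Test
    public void testCreateTopic() throws Exception {
        // Each test works against its own uniquely named topic, so the shared
        // cluster does not leak state between tests.
        String topic = "create-" + System.currentTimeMillis();
        cluster.createTopic(topic, 1, (short) 3);
        // ... assertions against the shared cluster ...
    }
}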


Re: [VOTE] 2.2.0 RC0

2019-02-27 Thread Satish Duggana
+1 (non-binding)

- Ran testAll/releaseTarGzAll successfully with NO failures.
- Ran through quickstart of core/streams on builds generated from 2.2.0-rc0
tag
- Ran a few internal apps targeting topics on a 3-node cluster.

Thanks for running the release Matthias!

On Tue, Feb 26, 2019 at 8:17 PM Adam Bellemare 
wrote:

> Downloaded, compiled and passed all tests successfully.
>
> Ran quickstart (https://kafka.apache.org/quickstart) up to step 6 without
> issue.
>
> (+1 non-binding).
>
> Adam
>
>
>
> On Mon, Feb 25, 2019 at 9:19 PM Matthias J. Sax 
> wrote:
>
> > @Stephane
> >
> > Thanks! You are right (I copied the list from an older draft without
> > double checking).
> >
> > On the release Wiki page, it's correctly listed as postponed:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827512
> >
> >
> > @Viktor
> >
> > Thanks. This will not block the release, but I'll make sure to include
> > it in the webpage update.
> >
> >
> >
> > -Matthias
> >
> > On 2/25/19 5:16 AM, Viktor Somogyi-Vass wrote:
> > > Hi Matthias,
> > >
> > > I've noticed a minor line break issue in the upgrade docs. I've
> created a
> > > small PR for that: https://github.com/apache/kafka/pull/6320
> > >
> > > Best,
> > > Viktor
> > >
> > > On Sun, Feb 24, 2019 at 10:16 PM Stephane Maarek <
> > kafka.tutori...@gmail.com>
> > > wrote:
> > >
> > >> Hi Matthias
> > >>
> > >> Thanks for this
> > >> Running through the list of KIPs. I think this is not included in 2.2:
> > >>
> > >> - Allow clients to suppress auto-topic-creation
> > >>
> > >> Regards
> > >> Stephane
> > >>
> > >> On Sun, Feb 24, 2019 at 1:03 AM Matthias J. Sax <
> matth...@confluent.io>
> > >> wrote:
> > >>
> > >>> Hello Kafka users, developers and client-developers,
> > >>>
> > >>> This is the first candidate for the release of Apache Kafka 2.2.0.
> > >>>
> > >>> This is a minor release with the following highlights:
> > >>>
> > >>>  - Added SSL support for custom principal name
> > >>>  - Allow SASL connections to periodically re-authenticate
> > >>>  - Improved consumer group management
> > >>>- default group.id is `null` instead of empty string
> > >>>  - Add --under-min-isr option to describe topics command
> > >>>  - Allow clients to suppress auto-topic-creation
> > >>>  - API improvement
> > >>>- Producer: introduce close(Duration)
> > >>>- AdminClient: introduce close(Duration)
> > >>>- Kafka Streams: new flatTransform() operator in Streams DSL
> > >>>- KafkaStreams (and other classes) now implement AutoCloseable to
> > >>> support try-with-resources
> > >>>- New Serdes and default method implementations
> > >>>  - Kafka Streams exposed internal client.id via ThreadMetadata
> > >>>  - Metric improvements:  All `-min`, `-avg` and `-max` metrics will
> now
> > >>> output `NaN` as default value
> > >>>
> > >>>
> > >>> Release notes for the 2.2.0 release:
> > >>> http://home.apache.org/~mjsax/kafka-2.2.0-rc0/RELEASE_NOTES.html
> > >>>
> > >>> *** Please download, test and vote by Friday, March 1, 9am PST.
> > >>>
> > >>> Kafka's KEYS file containing PGP keys we use to sign the release:
> > >>> http://kafka.apache.org/KEYS
> > >>>
> > >>> * Release artifacts to be voted upon (source and binary):
> > >>> http://home.apache.org/~mjsax/kafka-2.2.0-rc0/
> > >>>
> > >>> * Maven artifacts to be voted upon:
> > >>>
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >>>
> > >>> * Javadoc:
> > >>> http://home.apache.org/~mjsax/kafka-2.2.0-rc0/javadoc/
> > >>>
> > >>> * Tag to be voted upon (off 2.2 branch) is the 2.2.0 tag:
> > >>> https://github.com/apache/kafka/releases/tag/2.2.0-rc0
> > >>>
> > >>> * Documentation:
> > >>> https://kafka.apache.org/22/documentation.html
> > >>>
> > >>> * Protocol:
> > >>> https://kafka.apache.org/22/protocol.html
> > >>>
> > >>> * Successful Jenkins builds for the 2.2 branch:
> > >>> Unit/integration tests:
> > https://builds.apache.org/job/kafka-2.2-jdk8/31/
> > >>>
> > >>> * System tests:
> > >>> https://jenkins.confluent.io/job/system-test-kafka/job/2.2/
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> Thanks,
> > >>>
> > >>> -Matthias
> > >>>
> > >>>
> > >>
> > >
> >
> >
>
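
As a usage note on the close(Duration) overloads listed in the release
highlights above, a minimal sketch against the public 2.2 producer API:

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class CloseWithTimeout {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // New in 2.2: bound how long close() may block while flushing.
        producer.close(Duration.ofSeconds(10));
    }
}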


Jenkins build is back to normal : kafka-2.0-jdk8 #231

2019-02-27 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-8009) Upgrade Jenkins job Gradle version from 4.10.2 to 5.x

2019-02-27 Thread JIRA
Dejan Stojadinović created KAFKA-8009:
-

 Summary: Upgrade Jenkins job Gradle version from 4.10.2 to 5.x
 Key: KAFKA-8009
 URL: https://issues.apache.org/jira/browse/KAFKA-8009
 Project: Kafka
  Issue Type: Improvement
  Components: build
Reporter: Dejan Stojadinović
Assignee: Ismael Juma


*Rationale:* 
 * the Kafka project already uses Gradle 5.x (5.1.1 at the moment)
 * recently released *spotbugs-gradle-plugin* version 1.6.10 drops support for 
Gradle 4: 
[https://github.com/spotbugs/spotbugs-gradle-plugin/blob/1.6.10/CHANGELOG.md#1610---2019-02-18]

 

*Note:* related github pull request that contains spotbugs plugin version bump 
(among other things): 
https://github.com/apache/kafka/pull/6332#issuecomment-467631246



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : kafka-1.0-jdk7 #260

2019-02-27 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-8008) Clients unable to connect and replicas are not able to connect to each other

2019-02-27 Thread Abhi (JIRA)
Abhi created KAFKA-8008:
---

 Summary: Clients unable to connect and replicas are not able to 
connect to each other
 Key: KAFKA-8008
 URL: https://issues.apache.org/jira/browse/KAFKA-8008
 Project: Kafka
  Issue Type: Bug
  Components: controller, core
Affects Versions: 2.1.1, 2.1.0
 Environment: Java 11
Reporter: Abhi


Hi,

I upgraded to Kafka v2.1.1 in the hope that issue 
https://issues.apache.org/jira/browse/KAFKA-7925 would be fixed. However, I am 
still seeing a similar issue in my Kafka cluster.

I am seeing the same exceptions on all the servers. The 
kafka-network-thread-1-ListenerName threads are all consuming full CPU cycles. 
Lots of TCP connections are in CLOSE_WAIT state.

My broker setup uses Kerberos authentication with 
-Dsun.security.jgss.native=true.

I am not sure how to handle this. Would increasing the kafka-network thread 
count help, if that is possible?

Does this seem like a bug? I am happy to help in any way I can, as this issue 
is blocking our production usage and I would like to get it resolved as early 
as possible.


*server.log snippet from one of the servers:*
[2019-02-27 00:00:02,948] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Built full fetch (sessionId=1488865423, epoch=INITIAL) for node 2 
with 3 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Initiating connection to node mwkafka-prod-02.nyc.foo.com:9092 
(id: 2 rack: null) using address mwkafka-prod-02.nyc.foo.com/10.219.247.26 
(org.apache.kafka.clients.NetworkClient)
[2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
SEND_APIVERSIONS_REQUEST 
(org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG Creating SaslClient: 
client=kafka/mwkafka-prod-01.nyc.foo@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-02.nyc.foo.com;mechs=[GSSAPI]
 (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, 
SO_TIMEOUT = 0 to node 2 (org.apache.kafka.common.network.Selector)
[2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
RECEIVE_APIVERSIONS_RESPONSE 
(org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Completed connection to node 2. Ready. 
(org.apache.kafka.clients.NetworkClient)
[2019-02-27 00:00:03,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 5 
with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:03,317] INFO [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=1] Error sending fetch request (sessionId=397037945, epoch=INITIAL) 
to node 5: java.net.SocketTimeoutException: Failed to connect within 3 ms. 
(org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:03,317] WARN [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=1] Error in response for fetch request (type=FetchRequest, 
replicaId=1, maxWait=1, minBytes=1, maxBytes=10485760, 
fetchData={reddyvel-159-0=(fetchOffset=3173198, logStartOffset=3173198, 
maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-331-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-newtp-5-64-0=(fetchOffset=8936, 
logStartOffset=8936, maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
reddyvel-tp9-78-0=(fetchOffset=247943, logStartOffset=247943, maxBytes=1048576, 
currentLeaderEpoch=Optional[19]), reddyvel-tp3-58-0=(fetchOffset=264495, 
logStartOffset=264495, maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
fps.trsy.fe_prvt-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, 
currentLeaderEpoch=Optional[3]), reddyvel-7-0=(fetchOffset=3173199, 
logStartOffset=3173199, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-298-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), fps.guas.peeq.fe_marb_us-0=(fetchOffset=2, 
logStartOffset=2, maxBytes=1048576, currentLeaderEpoch=Optional[6]), 
reddyvel-108-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-988-0=(fetchOffset=3173185, 
logStartOffset=3173185, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-111-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-409-0=(fetchOffset=3173194, 
logStartOffset=3173194, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-104-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]),