Re: Wrong SSL messages when handshake fails

2021-10-07 Thread Rodolfo Kohn
Sure, I created bug
KAFKA-13360

Thanks!

Rodolfo Kohn
Wayaga LLC, Principal Consultant
+1 (208) 206 7324
https://www.linkedin.com/in/rodolfo-kohn-31032/


From: Ismael Juma 
Sent: Thursday, October 7, 2021 4:09 PM
To: dev 
Subject: Re: Wrong SSL messages when handshake fails

Hi,

Thanks for the report. Can you please file a JIRA ticket?

Ismael

On Thu, Oct 7, 2021 at 3:47 PM Rodolfo Kohn  wrote:

> Hello, I’d like to report an error I noticed while testing Kafka with a
> tool I developed to detect network issues in applications.
>
> When a consumer tries to connect to a Kafka broker and there is an error
> in the SSL handshake, like the server sending a certificate that cannot be
> validated for not matching the common name with the server/domain name,
> Kafka sends out erroneous SSL messages before sending an SSL alert. This
> error occurs on the client but can also be seen on the server.
> Because of the nature of the problem, it seems likely to occur in most if
> not all handshake failures.
> I've debugged and analyzed the Kafka networking code
> in org.apache.kafka.common.network and wrote a detailed description of how
> the error occurs.
>
> I'm attaching the pcap file and a pdf with the detailed description of
> where the error is in the code.
>
> I executed a very basic test between kafka-console-consumer and a simple
> installation of one Kafka broker with TLS.
> The test consisted of a Kafka broker with a certificate that didn’t match
> the domain name I used to identify the server. The CA was well set up to
> avoid related problems, such as an unknown CA error. Thus, when the server
> sends the certificate to the client, the handshake fails with error code 46
> (certificate unknown). The goal was that my tool would detect the issue and
> send an event, describing a TLS handshake problem for both processes.
> However, the tool sent what I thought was the wrong event: a TLS exception
> event for an unexpected message instead of a TLS alert event for
> certificate unknown.
>
> I noticed that during the handshake, after the client receives Server Hello,
> Certificate, Server Key Exchange, and Server Hello Done, it sends out the
> same Client Hello it sent at the beginning and then 3 more records with all
> zeroes, in two more messages. It sent a total of 16,709 bytes including the
> 289 bytes of the Client Hello record.
>
>
> I'm working with Kafka 2.8.0 (the Scala 2.13 build)
>
> Thanks!
>
> Rodolfo Kohn
>
> Wayaga LLC, Principal Consultant
>
> +1 (208) 206 7324
>
>
> https://www.linkedin.com/in/rodolfo-kohn-31032/
>
>
>
>
>
>


[jira] [Created] (KAFKA-13360) Wrong SSL messages when handshake fails

2021-10-07 Thread Rodolfo Kohn (Jira)
Rodolfo Kohn created KAFKA-13360:


 Summary: Wrong SSL messages when handshake fails
 Key: KAFKA-13360
 URL: https://issues.apache.org/jira/browse/KAFKA-13360
 Project: Kafka
  Issue Type: Bug
  Components: network
Affects Versions: 2.8.0
 Environment: Two VMs, one running one Kafka broker and the other one 
running kafka-console-consumer.sh.
The consumer is validating the server certificate.
Both VMs run in VirtualBox on the same laptop. 
Using internal LAN.
Latency is in the order of microseconds.
More details in attached PDF.

Reporter: Rodolfo Kohn
 Attachments: Kafka error.pdf, 
dump_192.168.56.101_192.168.56.102_32776_9093_2021_10_06_21_09_19.pcap, 
ssl_kafka_error_logs_match_ssl_logs.txt, 
ssl_kafka_error_logs_match_ssl_logs2.txt

When a consumer tries to connect to a Kafka broker and there is an error in the 
SSL handshake, like the server sending a certificate that cannot be validated 
for not matching the common name with the server/domain name, Kafka sends out 
erroneous SSL messages before sending an SSL alert. This error occurs on the 
client but can also be seen on the server.
Because of the nature of the problem, it seems likely to occur in most if not 
all handshake failures.
I've debugged and analyzed the Kafka networking code in 
org.apache.kafka.common.network and wrote a detailed description of how the 
error occurs.

Attaching the pcap file and a pdf with the detailed description of where the 
error is in the networking code (SslTransportLayer, Channel, Selector).

I executed a very basic test between kafka-console-consumer and a simple 
installation of one Kafka broker with TLS.
The test consisted of a Kafka broker with a certificate that didn’t match the 
domain name I used to identify the server. The CA was well set up to avoid 
related problems, such as an unknown CA error. Thus, when the server sends the 
certificate to the client, the handshake fails with error code 46 (certificate 
unknown). The goal was that my tool would detect the issue and send an event 
describing a TLS handshake problem for both processes. However, the tool sent 
what I thought was the wrong event: a TLS exception event for an unexpected 
message instead of a TLS alert event for certificate unknown.

I noticed that during the handshake, after the client receives Server Hello, 
Certificate, Server Key Exchange, and Server Hello Done, it sends out the same 
Client Hello it sent at the beginning and then 3 more records with all zeroes, 
in two more messages. It sent a total of 16,709 bytes including the 289 bytes 
of the Client Hello record.

 

This also looks like a design error in how protocol failures are handled.
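The record-level symptoms described above — a repeated Client Hello, zero-filled records, and only then an alert — can be spotted directly in a capture. Below is a minimal, illustrative sketch (not Kafka code) of how a network-diagnostics tool like the reporter's might classify raw TLS records, using the record header layout and alert codes from RFC 5246; the helper name and constants are assumptions for illustration.

```python
import struct

# TLS record content types (RFC 5246, section 6.2.1)
ALERT, HANDSHAKE = 21, 22

# A few TLS alert descriptions relevant to this report (RFC 5246, section 7.2)
ALERT_DESCRIPTIONS = {40: "handshake_failure", 46: "certificate_unknown", 48: "unknown_ca"}

def classify_record(record: bytes) -> str:
    """Classify one raw TLS record (5-byte header plus payload)."""
    if len(record) < 5:
        return "truncated"
    content_type, _major, _minor, length = struct.unpack("!BBBH", record[:5])
    if content_type == ALERT and length >= 2 and len(record) >= 7:
        level, desc = record[5], record[6]
        name = ALERT_DESCRIPTIONS.get(desc, f"alert_{desc}")
        return f"{'fatal' if level == 2 else 'warning'} {name}"
    if content_type == HANDSHAKE:
        return "handshake"
    if all(b == 0 for b in record):
        # The bogus all-zero records observed before the alert in the pcap
        return "all-zero (bogus)"
    return f"content_type_{content_type}"
```

For example, the fatal certificate_unknown alert seen in this test is the 7-byte record `15 03 03 00 02 02 2e`, while the stray all-zero records classify as bogus rather than as any legal TLS content type.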

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-778 KRaft Upgrades

2021-10-07 Thread Jun Rao
Hi, David,

Thanks for the KIP. A few comments below.

10. It would be useful to describe how the controller node determines the
RPC version used to communicate to other controller nodes. There seems to
be a bootstrap problem. A controller node can't read the log and
therefore the feature level until a quorum leader is elected. But leader
election requires an RPC.

11. For downgrades, it would be useful to describe how to determine the
downgrade process (generating new snapshot, propagating the snapshot, etc)
has completed. We could block the UpdateFeature request until the process
is completed. However, since the process could take time, the request could
time out. Another way is through DescribeFeature and the server only
reports downgraded versions after the process is completed.
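The second option above implies that a client polls DescribeFeatures until the server reports the downgraded level. A hedged sketch of that polling loop — `describe_finalized_version` is a hypothetical stand-in for an AdminClient describeFeatures round trip, not a real API:

```python
import time

def wait_for_downgrade(describe_finalized_version, feature: str,
                       target_level: int, timeout_s: float = 60.0,
                       poll_interval_s: float = 1.0) -> bool:
    """Poll a DescribeFeatures-style API until the finalized level for
    `feature` reports the downgraded target level, or give up at timeout.

    describe_finalized_version: callable(feature) -> current finalized level.
    Returns True once the downgrade is visible, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if describe_finalized_version(feature) == target_level:
            return True
        time.sleep(poll_interval_s)
    return False
```

This avoids the request-timeout problem of blocking UpdateFeatures, at the cost of the client owning the wait.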

12. Since we are changing UpdateFeaturesRequest, do we need to change the
AdminClient api for updateFeatures too?

13. For the paragraph starting with "In the absence of an operator defined
value for metadata.version", in KIP-584, we described how to finalize
features with New cluster bootstrap. In that case, it's inconvenient for
the users to have to run an admin tool to finalize the version for each
feature. Instead, the system detects that the /features path is missing in
ZK and thus automatically finalizes every feature with the latest supported
version. Could we do something similar in the KRaft mode?

14. After the quorum leader generates a new snapshot, how do we force other
nodes to pick up the new snapshot?

15. I agree with Jose that it will be useful to describe when generating a
new snapshot is needed. To me, it seems the new snapshot is only needed
when incompatible changes are made.
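To make that distinction concrete, here is an illustrative sketch of why a snapshot at the downgrade offset only matters when it is lossy: records representable at the lower metadata.version pass through, and only the rest must be dropped. The `(name, min_version, payload)` tuples are a made-up simplification, not real metadata record types.

```python
def downgrade_snapshot(records, target_level: int):
    """Sketch of a lossy downgrade: partition records by whether the
    minimum metadata.version they require fits the downgraded target.

    records: iterable of (name, min_version_required, payload) tuples.
    Returns (kept, dropped); a non-empty `dropped` is what makes the
    downgrade — and hence the regenerated snapshot — lossy.
    """
    kept, dropped = [], []
    for name, min_version, payload in records:
        bucket = kept if min_version <= target_level else dropped
        bucket.append((name, min_version, payload))
    return kept, dropped
```

If `dropped` is empty, every replica already holds an equivalent state and no incompatible changes exist, matching the view that a new snapshot is only needed for incompatible (lossy) downgrades.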

7. Jose, which control records were you referring to?

Thanks,

Jun


On Tue, Oct 5, 2021 at 8:53 AM David Arthur  wrote:

> Jose, thanks for the thorough review and comments!
>
> I am out of the office until next week, so I probably won't be able to
> update the KIP until then. Here are some replies to your questions:
>
> 1. Generate snapshot on upgrade
> > > Metadata snapshot is generated and sent to the other nodes
> > Why does the Active Controller need to generate a new snapshot and
> > force a snapshot fetch from the replicas (inactive controller and
> > brokers) on an upgrade? Isn't writing the FeatureLevelRecord good
> > enough to communicate the upgrade to the replicas?
>
>
> You're right, we don't necessarily need to _transmit_ a snapshot, since
> each node can generate its own equivalent snapshot
>
> 2. Generate snapshot on downgrade
> > > Metadata snapshot is generated and sent to the other inactive
> > controllers and to brokers (this snapshot may be lossy!)
> > Why do we need to send this downgraded snapshot to the brokers? The
> > replicas have seen the FeatureLevelRecord and noticed the downgrade.
> > Can we have the replicas each independently generate a downgraded
> > snapshot at the offset for the downgraded FeatureLevelRecord? I assume
> > that the active controller will guarantee that all records after the
> > FeatureLevelRecord use the downgraded version. If so, it would be good
> > to mention that explicitly.
>
>
> Similar to above, yes a broker that detects a downgrade via
> FeatureLevelRecord could generate its own downgrade snapshot and reload its
> state from that. This does get a little fuzzy when we consider cases where
> brokers are on different software versions and could be generating a
> downgrade snapshot for version X, but using different versions of the code.
> It might be safer to let the controller generate the snapshot so each
> broker (regardless of software version) gets the same records. However, for
> upgrades (or downgrades) we expect the whole cluster to be running the same
> software version before triggering the metadata.version change, so perhaps
> this isn't a likely scenario. Thoughts?
>
>
> 3. Max metadata version
> > >For the first release that supports metadata.version, we can simply
> > initialize metadata.version with the current (and only) version. For
> future
> > releases, we will need a mechanism to bootstrap a particular version.
> This
> > could be done using the meta.properties file or some similar mechanism.
> The
> > reason we need to allow for a specific initial version is to support the
> > use case of starting a Kafka cluster at version X with an older
> > metadata.version.
>
>
> I assume that the Active Controller will learn the metadata version of
> > the broker through the BrokerRegistrationRequest. How will the Active
> > Controller learn about the max metadata version of the inactive
> > controller nodes? We currently don't send a registration request from
> > the inactive controller to the active controller.
>
>
> This came up during the design, but I neglected to add it to the KIP. We
> will need a mechanism for determining the supported features of each
> controller similar to how brokers use BrokerRegistrationRequest. Perhaps
> controllers could write a 



Wrong SSL messages when handshake fails

2021-10-07 Thread Rodolfo Kohn
Hello, I’d like to report an error I noticed while testing Kafka with a tool I 
developed to detect network issues in applications. 

When a consumer tries to connect to a Kafka broker and there is an error in the 
SSL handshake, like the server sending a certificate that cannot be validated 
for not matching the common name with the server/domain name, Kafka sends out 
erroneous SSL messages before sending an SSL alert. This error occurs on the 
client but can also be seen on the server.
Because of the nature of the problem, it seems likely to occur in most if not 
all handshake failures.
I've debugged and analyzed the Kafka networking code in 
org.apache.kafka.common.network and wrote a detailed description of how the 
error occurs.

I'm attaching the pcap file and a pdf with the detailed description of where 
the error is in the code.

I executed a very basic test between kafka-console-consumer and a simple 
installation of one Kafka broker with TLS. 
The test consisted of a Kafka broker with a certificate that didn’t match the 
domain name I used to identify the server. The CA was well set up to avoid 
related problems, such as an unknown CA error. Thus, when the server sends the 
certificate to the client, the handshake fails with error code 46 (certificate 
unknown). The goal was that my tool would detect the issue and send an event 
describing a TLS handshake problem for both processes. However, the tool sent 
what I thought was the wrong event: a TLS exception event for an unexpected 
message instead of a TLS alert event for certificate unknown.

I noticed that during the handshake, after the client receives Server Hello, 
Certificate, Server Key Exchange, and Server Hello Done, it sends out the same 
Client Hello it sent at the beginning and then 3 more records with all zeroes, 
in two more messages. It sent a total of 16,709 bytes including the 289 bytes 
of the Client Hello record.


I'm working with Kafka 2.8.0 (the Scala 2.13 build)

Thanks!

Rodolfo Kohn

Wayaga LLC, Principal Consultant

+1 (208) 206 7324


https://www.linkedin.com/in/rodolfo-kohn-31032/







Re: [DISCUSS] KIP-768: Extend SASL/OAUTHBEARER with Support for OIDC

2021-10-07 Thread Rajini Sivaram
Hi Kirk,

Thanks for the updates. Looks good.

Just one comment on the naming of configs. For configs that are very
specific to OAUTHBEARER, can we add `sasl.oauthbearer` as the prefix,
similar to `sasl.kerberos.` that we use for Kerberos configs, e.g.
`sasl.login.sub.claim.name`. For configs that could potentially be used by
any SASL mechanism that uses a remote server, we can keep the current
naming without the `oauthbearer`, e.g. `sasl.login.connect.timeout.ms`. I
think we want to use the same convention for broker-side configs too, even
though broker configs may specify oauthbearer in the listener prefix so
that we remain consistent with other configs (also, we allow listener
configs to be specified without listener prefix as well).

Regards,

Rajini


On Thu, Oct 7, 2021 at 6:51 PM Kirk True  wrote:

> Hi Rajini,
>
> I've updated the KIP with your feedback. Let me know if there's anything
> still amiss.
>
> Thanks,
> Kirk
>
> On Wed, Oct 6, 2021, at 5:27 PM, Kirk True wrote:
>
> Hi Rajini,
>
> Thank you very much for your in-depth review! You highlighted a lot of
> dark corners :)
>
> >1. The diagram shows broker startup followed by `broker requests keys
> >from JWKS endpoint`.
> >   - Do we open broker ports only after we successfully get the keys?
> We
> >   need to guarantee this to ensure that clients don't see
> authentication
> >   failures during broker restarts.
> >   - Doesn't sound like we will persist the keys, so what is the
> >   behaviour if the OAuth server is not available? Will broker retry
> >   forever?
>
> In the case where the OAuth provider is unavailable, is it preferable for
> the broker to start up in a diminished capacity or to simply fail to start
> up at all?
>
> It's my understanding that a broker can support more than one form of
> authentication. If so, should we continue start up if the other forms of
> authentication are working?
>
> >2. Client configuration includes a large number of JAAS config options
> >like `loginRetryWaitMs` and `loginRetryMaxWaitMs`. Have we considered
> >making them top-level configs instead? Not saying we should, but it
> will be
> >good to document why we chose to do it this way. The advantage of
> >top-level option is that it can be used for other similar login
> methods
> >in future. And they become visible in logs (unlike `sasl.jaas.config`
> >which is considered sensitive and hence not logged). The current
> >approach keeps all the related configs together in one place, so that
> may
> >be ok too, worth documenting the reasons anyway. It is useful to keep
> >credentials in `sasl.jaas.config`, it is less clear with other configs
> >(e.g. we have various `sasl.kerberos.` configs).
>
> I can look at moving the more general, non-sensitive configuration out
> from under the JAAS configuration. Now that you mention it, I did notice
> that the JAAS configuration was redacted in the logs.
>
> >3. The extension config uses inconsistent naming `
> >Extension_supportFeatureX`. If we are trying to keep this consistent
> >with the existing callback handler, should this be `
> >unsecuredLoginExtension_xxx` or otherwise `extension_xxx`?
>
> You're right, it was a half-baked attempt at consistency with the existing
> unsecured implementation.
>
> I wanted to drop the "unsecuredLogin" prefix as it doesn't apply. Do you
> have a preference for any of the following forms?
>
> * securedLoginExtension_xxx
> * secureLoginExtension_xxx
> * loginExtension_xxx
> * extension_xxx
>
> >4. We talk about re-authentication using KIP-368. Can we also describe
> >re-login on the client-side to acquire new tokens? That should be
> based on
> >expiry of the token and should happen irrespective of whether broker
> has
> >enabled re-authentication. The unsecured version already supports
> this, so
> >no additional work is necessary, worth mentioning nevertheless.
>
> I spent more time than I'd like to admit trying to trigger a client
> side-only refresh. While the client would refresh and grab an updated token
> from the OAuth provider, it never seemed to trigger a call to the broker to
> re-validate.
>
> I'll take another look to see what I'm missing.
>
> >5. KIP says: `A new key ID (kid) could appear in the header of an
> >incoming JWT access token. Code that can retrieve the JWKS from the
> OAuth
> >provider on demand will be implemented.`. What happens to the first
> >connection that requires this? Given we can't block network thread
> while we
> >do this network operation, will we fail authentications until we have
> >refreshed keys in the background thread?
>
> Ugh. Another good catch :)
>
> There are a few cases related to the timing of a new key ID being
> published. I'm going to try to make this sound all formal, but hopefully it
> doesn't just come off confusing :)
>
> Let A = the time that the OAuth provider publishes the updated 

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #512

2021-10-07 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 495044 lines...]
[2021-10-07T18:48:46.405Z] 
[2021-10-07T18:48:46.405Z] PlaintextConsumerTest > 
testMultiConsumerDefaultAssignor() STARTED
[2021-10-07T18:48:46.924Z] 
[2021-10-07T18:48:46.924Z] PlaintextConsumerTest > 
testAutoCommitOnCloseAfterWakeup() PASSED
[2021-10-07T18:48:46.924Z] 
[2021-10-07T18:48:46.924Z] PlaintextConsumerTest > testMaxPollRecords() STARTED
[2021-10-07T18:48:54.809Z] 
[2021-10-07T18:48:54.809Z] PlaintextConsumerTest > testMaxPollRecords() PASSED
[2021-10-07T18:48:54.809Z] 
[2021-10-07T18:48:54.809Z] PlaintextConsumerTest > testAutoOffsetReset() STARTED
[2021-10-07T18:48:59.199Z] 
[2021-10-07T18:48:59.199Z] PlaintextConsumerTest > testAutoOffsetReset() PASSED
[2021-10-07T18:48:59.199Z] 
[2021-10-07T18:48:59.199Z] PlaintextConsumerTest > 
testPerPartitionLagWithMaxPollRecords() STARTED
[2021-10-07T18:49:04.642Z] 
[2021-10-07T18:49:04.642Z] PlaintextConsumerTest > 
testMultiConsumerDefaultAssignor() PASSED
[2021-10-07T18:49:04.642Z] 
[2021-10-07T18:49:04.642Z] PlaintextConsumerTest > testInterceptors() STARTED
[2021-10-07T18:49:05.612Z] 
[2021-10-07T18:49:05.612Z] PlaintextConsumerTest > 
testPerPartitionLagWithMaxPollRecords() PASSED
[2021-10-07T18:49:05.612Z] 
[2021-10-07T18:49:05.612Z] PlaintextConsumerTest > testFetchInvalidOffset() 
STARTED
[2021-10-07T18:49:08.117Z] 
[2021-10-07T18:49:08.117Z] PlaintextConsumerTest > testInterceptors() PASSED
[2021-10-07T18:49:08.117Z] 
[2021-10-07T18:49:08.117Z] PlaintextConsumerTest > 
testConsumingWithEmptyGroupId() STARTED
[2021-10-07T18:49:10.978Z] 
[2021-10-07T18:49:10.978Z] PlaintextConsumerTest > testFetchInvalidOffset() 
PASSED
[2021-10-07T18:49:10.978Z] 
[2021-10-07T18:49:10.978Z] PlaintextConsumerTest > testAutoCommitIntercept() 
STARTED
[2021-10-07T18:49:12.541Z] 
[2021-10-07T18:49:12.541Z] PlaintextConsumerTest > 
testConsumingWithEmptyGroupId() PASSED
[2021-10-07T18:49:12.541Z] 
[2021-10-07T18:49:12.541Z] PlaintextConsumerTest > testPatternUnsubscription() 
STARTED
[2021-10-07T18:49:17.757Z] 
[2021-10-07T18:49:17.757Z] PlaintextConsumerTest > testAutoCommitIntercept() 
PASSED
[2021-10-07T18:49:17.757Z] 
[2021-10-07T18:49:17.757Z] PlaintextConsumerTest > 
testFetchHonoursMaxPartitionFetchBytesIfLargeRecordNotFirst() STARTED
[2021-10-07T18:49:21.232Z] 
[2021-10-07T18:49:21.232Z] PlaintextConsumerTest > testPatternUnsubscription() 
PASSED
[2021-10-07T18:49:21.232Z] 
[2021-10-07T18:49:21.232Z] PlaintextConsumerTest > testGroupConsumption() 
STARTED
[2021-10-07T18:49:22.521Z] 
[2021-10-07T18:49:22.521Z] PlaintextConsumerTest > 
testFetchHonoursMaxPartitionFetchBytesIfLargeRecordNotFirst() PASSED
[2021-10-07T18:49:22.521Z] 
[2021-10-07T18:49:22.521Z] PlaintextConsumerTest > testCommitSpecifiedOffsets() 
STARTED
[2021-10-07T18:49:24.819Z] 
[2021-10-07T18:49:24.819Z] PlaintextConsumerTest > testGroupConsumption() PASSED
[2021-10-07T18:49:24.819Z] 
[2021-10-07T18:49:24.819Z] PlaintextConsumerTest > testPartitionsFor() STARTED
[2021-10-07T18:49:27.327Z] 
[2021-10-07T18:49:27.327Z] PlaintextConsumerTest > testCommitSpecifiedOffsets() 
PASSED
[2021-10-07T18:49:27.327Z] 
[2021-10-07T18:49:27.327Z] PlaintextConsumerTest > 
testPerPartitionLeadMetricsCleanUpWithSubscribe() STARTED
[2021-10-07T18:49:27.849Z] 
[2021-10-07T18:49:27.849Z] PlaintextConsumerTest > testPartitionsFor() PASSED
[2021-10-07T18:49:27.849Z] 
[2021-10-07T18:49:27.850Z] PlaintextConsumerTest > 
testMultiConsumerDefaultAssignorAndVerifyAssignment() STARTED
[2021-10-07T18:49:32.534Z] 
[2021-10-07T18:49:32.534Z] PlaintextConsumerTest > 
testPerPartitionLeadMetricsCleanUpWithSubscribe() PASSED
[2021-10-07T18:49:32.534Z] 
[2021-10-07T18:49:32.534Z] PlaintextConsumerTest > testCommitMetadata() STARTED
[2021-10-07T18:49:33.055Z] 
[2021-10-07T18:49:33.055Z] PlaintextConsumerTest > 
testMultiConsumerDefaultAssignorAndVerifyAssignment() PASSED
[2021-10-07T18:49:33.055Z] 
[2021-10-07T18:49:33.055Z] PlaintextConsumerTest > testAutoCommitOnRebalance() 
STARTED
[2021-10-07T18:49:37.741Z] 
[2021-10-07T18:49:37.741Z] PlaintextConsumerTest > testCommitMetadata() PASSED
[2021-10-07T18:49:37.741Z] 
[2021-10-07T18:49:37.741Z] PlaintextConsumerTest > testRoundRobinAssignment() 
STARTED
[2021-10-07T18:49:39.656Z] 
[2021-10-07T18:49:39.656Z] PlaintextConsumerTest > testAutoCommitOnRebalance() 
PASSED
[2021-10-07T18:49:39.656Z] 
[2021-10-07T18:49:39.656Z] PlaintextConsumerTest > 
testInterceptorsWithWrongKeyValue() STARTED
[2021-10-07T18:49:43.819Z] 
[2021-10-07T18:49:43.819Z] PlaintextConsumerTest > 
testInterceptorsWithWrongKeyValue() PASSED
[2021-10-07T18:49:43.819Z] 
[2021-10-07T18:49:43.819Z] PlaintextConsumerTest > 
testPerPartitionLeadWithMaxPollRecords() STARTED
[2021-10-07T18:49:47.620Z] 
[2021-10-07T18:49:47.620Z] PlaintextConsumerTest > testRoundRobinAssignment() 
PASSED
[2021-10-07T18:49:47.620Z] 
[2021-10-07T18:49:47.620Z] 

Re: [DISCUSS] KIP-768: Extend SASL/OAUTHBEARER with Support for OIDC

2021-10-07 Thread Kirk True
Hi Rajini,

I've updated the KIP with your feedback. Let me know if there's anything still 
amiss.

Thanks,
Kirk

On Wed, Oct 6, 2021, at 5:27 PM, Kirk True wrote:
> Hi Rajini,
> 
> Thank you very much for your in-depth review! You highlighted a lot of dark 
> corners :) 
> 
> >1. The diagram shows broker startup followed by `broker requests keys
> >from JWKS endpoint`.
> >   - Do we open broker ports only after we successfully get the keys? We
> >   need to guarantee this to ensure that clients don't see authentication
> >   failures during broker restarts.
> >   - Doesn't sound like we will persist the keys, so what is the
> >   behaviour if the OAuth server is not available? Will broker retry
> >   forever?
> 
> In the case where the OAuth provider is unavailable, is it preferable for the 
> broker to start up in a diminished capacity or to simply fail to start up at 
> all?
> 
> It's my understanding that a broker can support more than one form of 
> authentication. If so, should we continue start up if the other forms of 
> authentication are working?
> 
> >2. Client configuration includes a large number of JAAS config options
> >like `loginRetryWaitMs` and `loginRetryMaxWaitMs`. Have we considered
> >making them top-level configs instead? Not saying we should, but it will 
> > be
> >good to document why we chose to do it this way. The advantage of
> >top-level option is that it can be used for other similar login methods
> >in future. And they become visible in logs (unlike `sasl.jaas.config`
> >which is considered sensitive and hence not logged). The current
> >approach keeps all the related configs together in one place, so that may
> >be ok too, worth documenting the reasons anyway. It is useful to keep
> >credentials in `sasl.jaas.config`, it is less clear with other configs
> > >(e.g. we have various `sasl.kerberos.` configs).
> 
> I can look at moving the more general, non-sensitive configuration out from 
> under the JAAS configuration. Now that you mention it, I did notice that the 
> JAAS configuration was redacted in the logs. 
> 
> >3. The extension config uses inconsistent naming `
> >Extension_supportFeatureX`. If we are trying to keep this consistent
> >with the existing callback handler, should this be `
> >unsecuredLoginExtension_xxx` or otherwise `extension_xxx`?
> 
> You're right, it was a half-baked attempt at consistency with the existing 
> unsecured implementation.
> 
> I wanted to drop the "unsecuredLogin" prefix as it doesn't apply. Do you have 
> a preference for any of the following forms?
> 
> * securedLoginExtension_xxx
> * secureLoginExtension_xxx
> * loginExtension_xxx
> * extension_xxx
> 
> >4. We talk about re-authentication using KIP-368. Can we also describe
> >re-login on the client-side to acquire new tokens? That should be based 
> > on
> >expiry of the token and should happen irrespective of whether broker has
> >enabled re-authentication. The unsecured version already supports this, 
> > so
> >no additional work is necessary, worth mentioning nevertheless.
> 
> I spent more time than I'd like to admit trying to trigger a client side-only 
> refresh. While the client would refresh and grab an updated token from the 
> OAuth provider, it never seemed to trigger a call to the broker to 
> re-validate.
> 
> I'll take another look to see what I'm missing.
> 
> >5. KIP says: `A new key ID (kid) could appear in the header of an
> >incoming JWT access token. Code that can retrieve the JWKS from the OAuth
> >provider on demand will be implemented.`. What happens to the first
> >connection that requires this? Given we can't block network thread while 
> > we
> >do this network operation, will we fail authentications until we have
> >refreshed keys in the background thread?
> 
> Ugh. Another good catch :)
> 
> There are a few cases related to the timing of a new key ID being published. 
> I'm going to try to make this sound all formal, but hopefully it doesn't just 
> come off confusing :)
> 
> Let A = the time that the OAuth provider publishes the updated JWKS with the 
> new key ID.
> 
> Let B = the time that the broker's internal key cache refresh is run.
> 
> Let C = the time that the OAuth provider issues a JWT with a new key ID.
> 
> Here are the timing cases:
> 
> 1. A < B < C. This is the case where the JWKS publish time is far enough in 
> advance of first JWT issuance that our cache has had a chance to run and is 
> then fully refreshed and ready for the key. This is the optimal case.
> 2. A < C < B. This is the case where the JWKS publish time happens before JWT 
> issuance, but after our last cache refresh. This is the case referred to in 
> the KIP when it says that the broker would block to look up the JWKS.
> 3. C < A < B. This is the case where the JWKS publish time is *after* the JWT 
> issuance. I would hope this "should 

Re: CVE Back Port?

2021-10-07 Thread Ismael Juma
Hi Mickael,

That issue was more severe and we decided to go beyond what we would
normally do. Having said that, you are welcome to drive the releases if you
have the cycles. My general advice stands, a bunch of open-source
dependencies have CVEs regularly so it's best to stick with one of the two
most recent releases. We take compatibility very seriously to make it easy
for people to upgrade within a major version.

Ismael

On Thu, Oct 7, 2021 at 6:54 AM Mickael Maison 
wrote:

> Hi Ismael,
>
> While we only produce releases for the 2 most recent branches, many
> users are still running older releases such as 2.6 and 2.7.
>
> In the past, for security issues we produced releases for older
> versions too. For example, for CVE-2018-1288, we released 0.10.2.2,
> 0.11.0.3, 1.0.1 and 1.1.0.
>
> I think there is value in releasing the 2.6.3 and 2.7.2 bugfix
> releases. In addition of the fix for this CVE, 2.6 has 11 unreleased
> fixes and 2.7 has 26.
>
> If nobody objects, I'm happy to run these 2 releases.
>
> Thanks
>
>
> On Wed, Oct 6, 2021 at 4:26 PM Ismael Juma  wrote:
> >
> > Hi Gary,
> >
> > The change has been backported to all the relevant branches. However,
> > Apache Kafka produces releases from the two most recent branches. The fix
> > is on the server side (broker and connect). I would encourage you to
> change
> > your rules since clients maintain compatibility for all public apis in
> > minor releases.
> >
> > Ismael
> >
> > On Tue, Oct 5, 2021 at 11:56 AM Gary Russell 
> wrote:
> >
> > > Is there any chance that the fix for this CVE [1] can be back ported
> (and
> > > released) on the 2.5, 2.6 and 2.7 branches?
> > >
> > > We have 3 (soon to be 4) supported branches, based on the 2.5.x, 2.6.x,
> > > 2.7.x, (and soon 3.0.0) clients.
> > >
> > > Our versioning rules forbid moving to a new minor release for a
> dependency
> > > (e.g 2.7.x to 2.8.x) in a patch release.
> > >
> > > Yes, the user can override the version to 2.8.1 (works on all of our
> > > supported branches), but the problem is (s)he gets a vulnerable version
> > > transitively and has to know to do so.
> > >
> > > Or, is this CVE on the broker side only (and not on the clients)? (I
> > > have been unable to find the actual fix in the commit log).
> > >
> > > Thanks for your consideration.
> > >
> > > The Spring team.
> > >
> > > [1]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-38153
> > >
> > >
>


Re: [DISCUSS] KIP-714: Client metrics and observability

2021-10-07 Thread Magnus Edenhill
Hi all,

I've updated the KIP following our recent discussions on the mailing list:
 - split the protocol in two, one for getting the metrics subscriptions,
and one for pushing the metrics.
 - simplifications: initially only one supported metrics format, no
client.id in the instance id, etc.
 - made CLIENT_METRICS subscription configuration entries more structured,
   allowing better client-matching selectors (not only on the instance id,
   but also on the other client resource labels, such as
   client_software_name, etc.).
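
To make the split concrete, the two-RPC flow (one request to fetch the
metrics subscription, a second to push the metrics) could be sketched
roughly as follows. This is a hedged sketch only: the class and method
names, fields, and metric name are illustrative placeholders, not the
KIP's actual RPCs or API.

```python
import uuid
from dataclasses import dataclass

@dataclass
class Subscription:
    requested_metrics: list   # which metrics the broker wants
    push_interval_s: int      # how often to push them

class StubBroker:
    """Stand-in broker; the method names here are placeholders."""
    def __init__(self):
        self.pushed = []

    def get_metrics_subscription(self, instance_id):
        # RPC 1: broker tells the client which metrics to send and how often
        return Subscription(["producer.record-send-rate"], 30)

    def push_metrics(self, instance_id, samples):
        # RPC 2: client pushes the sampled values
        self.pushed.append((instance_id, samples))

def run_client(broker, iterations=3):
    # A client instance id is assigned once per client instance and is not
    # persisted: a restart yields a fresh UUID.
    instance_id = str(uuid.uuid4())
    sub = broker.get_metrics_subscription(instance_id)
    for _ in range(iterations):
        samples = {m: 0.0 for m in sub.requested_metrics}  # placeholder values
        broker.push_metrics(instance_id, samples)
    return instance_id
```

The same sketch also illustrates the instance-id semantics discussed in the
thread: each Producer or Consumer instance gets its own UUID, even when they
share a client.id.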

Unless there are further comments I'll call the vote in a day or two.

Regards,
Magnus



Den mån 4 okt. 2021 kl 20:57 skrev Magnus Edenhill :

> Hi Gwen,
>
> I'm finishing up the KIP based on the last couple of discussion points in
> this thread
> and will call the Vote later this week.
>
> Best,
> Magnus
>
> Den lör 2 okt. 2021 kl 02:01 skrev Gwen Shapira  >:
>
>> Hey,
>>
>> I noticed that there was no discussion for the last 10 days, but I
>> couldn't
>> find the vote thread. Is there one that I'm missing?
>>
>> Gwen
>>
>> On Wed, Sep 22, 2021 at 4:58 AM Magnus Edenhill 
>> wrote:
>>
>> > Den tis 21 sep. 2021 kl 06:58 skrev Colin McCabe :
>> >
>> > > On Mon, Sep 20, 2021, at 17:35, Feng Min wrote:
>> > > > Thanks Magnus & Colin for the discussion.
>> > > >
>> > > > Based on KIP-714's stateless design, a client can pretty much use
>> > > > any connection to any broker to send metrics. We are not associating
>> > > > a connection with client metric state. Is my understanding correct?
>> > > > If yes, how about the following two scenarios:
>> > > >
>> > > > 1) One client (client-id) registers two different client instance
>> > > > ids via separate registrations. Is it permitted? If so, how do we
>> > > > distinguish them from case 2 below?
>> > > >
>> > >
>> > > Hi Feng,
>> > >
>> > > My understanding, which Magnus can clarify I guess, is that you could
>> > > have something like two Producer instances running with the same
>> > > client.id (perhaps because they're using the same config file, for
>> > > example). They could even be in the same process. But they would get
>> > > separate UUIDs.
>> > >
>> > > I believe Magnus used the term client to mean "Producer or Consumer".
>> > > So if you have both a Producer and a Consumer in your application, I
>> > > would expect you'd get separate UUIDs for both. Again, Magnus can
>> > > chime in here, I guess.
>> > >
>> >
>> > That's correct.
>> >
>> >
>> > >
>> > > > 2) How about the client restarting? What's the expectation? Should
>> > > > the server expect the client to carry a persisted client instance
>> > > > id, or should the client be treated as a new instance?
>> > >
>> > > The KIP doesn't describe any mechanism for persistence, so I would
>> > > assume that when you restart the client you get a new UUID. I agree
>> > > that it would be good to spell this out.
>> > >
>> > >
>> > Right, it will not be persisted, since a client instance can't be
>> > restarted.
>> >
>> > Will update the KIP to make this clearer.
>> >
>> > /Magnus
>> >
>>
>>
>> --
>> Gwen Shapira
>> Engineering Manager | Confluent
>> 650.450.2760 | @gwenshap
>> Follow us: Twitter | blog
>>
>


Re: CVE Back Port?

2021-10-07 Thread Gary Russell
Hi Mickael,

That would be much appreciated; I doubt that I can change a policy that has 
been in effect for many years.

Our version that uses the 2.5.x clients is already out of OSS support (and goes 
out of commercial support early next year).

So 2.6.x and 2.7.x versions would be fine (for us).

Many thanks for your consideration.

Gary

From: Mickael Maison 
Sent: Thursday, October 7, 2021 9:53 AM
To: dev 
Subject: Re: CVE Back Port?

Hi Ismael,

While we only produce releases for the 2 most recent branches, many
users are still running older releases such as 2.6 and 2.7.

In the past, for security issues we produced releases for older
versions too. For example, for CVE-2018-1288, we released 0.10.2.2,
0.11.0.3, 1.0.1 and 1.1.0.

I think there is value in releasing the 2.6.3 and 2.7.2 bugfix
releases. In addition to the fix for this CVE, 2.6 has 11 unreleased
fixes and 2.7 has 26.

If nobody objects, I'm happy to run these 2 releases.

Thanks


On Wed, Oct 6, 2021 at 4:26 PM Ismael Juma  wrote:
>
> Hi Gary,
>
> The change has been backported to all the relevant branches. However,
> Apache Kafka produces releases from the two most recent branches. The fix
> is on the server side (broker and connect). I would encourage you to change
> your rules since clients maintain compatibility for all public apis in
> minor releases.
>
> Ismael
>
> On Tue, Oct 5, 2021 at 11:56 AM Gary Russell  wrote:
>
> > Is there any chance that the fix for this CVE [1] can be back ported (and
> > released) on the 2.5, 2.6 and 2.7 branches?
> >
> > We have 3 (soon to be 4) supported branches, based on the 2.5.x, 2.6.x,
> > 2.7.x, (and soon 3.0.0) clients.
> >
> > Our versioning rules forbid moving to a new minor release for a dependency
> > (e.g 2.7.x to 2.8.x) in a patch release.
> >
> > Yes, the user can override the version to 2.8.1 (works on all of our
> > supported branches), but the problem is (s)he gets a vulnerable version
> > transitively and has to know to do so.
> >
> > Or, is this CVE on the broker side only (and not on the clients)? (I have
> > been unable to find the actual fix in the commit log).
> >
> > Thanks for your consideration.
> >
> > The Spring team.
> >
> > [1]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-38153
> >
> >


Re: CVE Back Port?

2021-10-07 Thread Mickael Maison
Hi Ismael,

While we only produce releases for the 2 most recent branches, many
users are still running older releases such as 2.6 and 2.7.

In the past, for security issues we produced releases for older
versions too. For example, for CVE-2018-1288, we released 0.10.2.2,
0.11.0.3, 1.0.1 and 1.1.0.

I think there is value in releasing the 2.6.3 and 2.7.2 bugfix
releases. In addition to the fix for this CVE, 2.6 has 11 unreleased
fixes and 2.7 has 26.

If nobody objects, I'm happy to run these 2 releases.

Thanks


On Wed, Oct 6, 2021 at 4:26 PM Ismael Juma  wrote:
>
> Hi Gary,
>
> The change has been backported to all the relevant branches. However,
> Apache Kafka produces releases from the two most recent branches. The fix
> is on the server side (broker and connect). I would encourage you to change
> your rules since clients maintain compatibility for all public apis in
> minor releases.
>
> Ismael
>
> On Tue, Oct 5, 2021 at 11:56 AM Gary Russell  wrote:
>
> > Is there any chance that the fix for this CVE [1] can be back ported (and
> > released) on the 2.5, 2.6 and 2.7 branches?
> >
> > We have 3 (soon to be 4) supported branches, based on the 2.5.x, 2.6.x,
> > 2.7.x, (and soon 3.0.0) clients.
> >
> > Our versioning rules forbid moving to a new minor release for a dependency
> > (e.g 2.7.x to 2.8.x) in a patch release.
> >
> > Yes, the user can override the version to 2.8.1 (works on all of our
> > supported branches), but the problem is (s)he gets a vulnerable version
> > transitively and has to know to do so.
> >
> > Or, is this CVE on the broker side only (and not on the clients)? (I have
> > been unable to find the actual fix in the commit log).
> >
> > Thanks for your consideration.
> >
> > The Spring team.
> >
> > [1]: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-38153
> >
> >


[jira] [Created] (KAFKA-13359) Round Robin Kafka Producer Routes to only half the partitions when even number of partitions

2021-10-07 Thread David G (Jira)
David G created KAFKA-13359:
---

 Summary: Round Robin Kafka Producer Routes to only half the 
partitions when even number of partitions
 Key: KAFKA-13359
 URL: https://issues.apache.org/jira/browse/KAFKA-13359
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 3.0.0, 2.7.0
Reporter: David G


When you have one message per batch in the round robin partitioner, messages 
go to only half the partitions because the partitioner always skips one 
partition. This works out for an odd number of partitions, because skipping 
one means every partition eventually gets a hit, but with an even number of 
partitions half of them are never selected.

 

So if you have partitions 1, 2, 3, 4:

Message 1: Partition 1

Message 2: Partition 3

Message 3: Partition 1 ... and so on. [Here 2 and 4 are never selected.]

If you have partitions 1, 2, 3:

Message 1: Partition 1

Message 2: Partition 3

Message 3: Partition 2 ... and so on.
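
The pattern above can be reproduced with a tiny simulation (a hypothetical
sketch, assuming the partitioner's counter is effectively advanced twice per
single-message batch, which is the skipping behavior the report describes;
partitions are numbered from 0 here rather than from 1):

```python
def simulate_round_robin(num_partitions, num_messages):
    """Simulate a round-robin counter consulted twice per message, so it
    effectively advances by 2 for every single-message batch."""
    counter = 0
    chosen = []
    for _ in range(num_messages):
        chosen.append(counter % num_partitions)
        counter += 2  # the extra increment is the skipped partition
    return chosen
```

With 4 partitions the sequence is 0, 2, 0, 2, ... and partitions 1 and 3 are
never chosen; with 3 partitions the sequence 0, 2, 1, 0, ... eventually
covers every partition.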

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13358) Not able to replicate groups in MirrorMaker 2.0

2021-10-07 Thread Hemanth Savasere (Jira)
Hemanth Savasere created KAFKA-13358:


 Summary: Not able to replicate groups in MirrorMaker 2.0 
 Key: KAFKA-13358
 URL: https://issues.apache.org/jira/browse/KAFKA-13358
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 2.5.0
 Environment: RHEL
Reporter: Hemanth Savasere
 Attachments: connect-mirror-maker.properties

I created a *consumer group PizzaGroup* with *kafka-console-consumer* so that 
I could read the content of the *Pizza* topic in the *source* cluster.

Now I want to replicate the *same group to the destination* cluster.

Even though I have left *groups.blacklist* empty in 
*connect-mirror-maker.properties*, the groups are not getting replicated. The 
properties file is attached for reference.

We are using SSL protocol.
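
For reference, a minimal connect-mirror-maker.properties enabling group
replication might look roughly like the following. The values are
illustrative, not taken from the attached file; also note that in 2.5 the
exclusion setting was named groups.blacklist, and its default value excludes
console-consumer-.*, connect-.* and internal (__.*) groups, which can
silently skip groups created by kafka-console-consumer without an explicit
--group name.

```properties
clusters = source, destination
source.bootstrap.servers = source-host:9092
destination.bootstrap.servers = dest-host:9092

source->destination.enabled = true
source->destination.topics = Pizza

# Replicate consumer groups; clear the default blacklist explicitly
source->destination.groups = .*
source->destination.groups.blacklist =

# Checkpoints are needed for consumer group offset translation
emit.checkpoints.enabled = true
```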





--
This message was sent by Atlassian Jira
(v8.3.4#803005)