[ 
https://issues.apache.org/jira/browse/KAFKA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425756#comment-15425756
 ] 

Todd Palino commented on KAFKA-4050:
------------------------------------

So first off, yes, the thread dump (which [~jjkoshy] posted) shows that the 
offending line of code is "NativePRNG.java:481". I checked, and that's very 
clearly in the non-blocking NativePRNG variant that explicitly uses /dev/urandom.

I had considered changing the default, [~ijuma], and I actually thought about 
adding a note to this ticket about it earlier today. Despite the fact that the 
default clearly has performance issues, I don't think we should change the 
default behavior, which is to let the JRE pick the PRNG implementation. The 
reason is that we can't be sure the implementation we set explicitly will exist 
on any given system, in any given JRE, and if it is missing, the default 
configuration would break. The SHA1PRNG implementation should exist everywhere, 
but I'd rather not take the risk. I think it's better to leave the default as 
is, and call out the issue very clearly in the documentation.
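
As a rough illustration of the availability concern, the sketch below lists the 
SecureRandom algorithms a given JRE registers and checks whether a named one can 
be instantiated. The class and method names (SecureRandomProbe, 
availableAlgorithms, isAvailable) are made up for this example and are not part 
of Kafka; only the java.security calls are real JDK APIs.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not Kafka code: probes which PRNGs this JRE offers.
final class SecureRandomProbe {

    /** Sorted names of every SecureRandom algorithm the installed providers register. */
    static Set<String> availableAlgorithms() {
        return new TreeSet<>(Security.getAlgorithms("SecureRandom"));
    }

    /** Returns true if this JRE can instantiate the named SecureRandom algorithm. */
    static boolean isAvailable(String algorithm) {
        try {
            SecureRandom.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            // This is the failure mode a hard-coded default would risk.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Registered:         " + availableAlgorithms());
        System.out.println("JRE default:        " + new SecureRandom().getAlgorithm());
        System.out.println("SHA1PRNG available: " + isAvailable("SHA1PRNG"));
    }
}
```

Running this on each target platform shows which implementation the JRE picks 
by default and whether an explicit choice would resolve there.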

> Allow configuration of the PRNG used for SSL
> --------------------------------------------
>
>                 Key: KAFKA-4050
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4050
>             Project: Kafka
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 0.10.0.1
>            Reporter: Todd Palino
>            Assignee: Todd Palino
>              Labels: security, ssl
>
> This change will make the pseudo-random number generator (PRNG) 
> implementation used by the SSLContext configurable. The configuration is not 
> required, and the default is to use whatever the default PRNG for the JDK/JRE 
> is. Providing a string, such as "SHA1PRNG", will cause that specific 
> SecureRandom implementation to get passed to the SSLContext.
>
> When enabling inter-broker SSL in our certification cluster, we observed 
> severe performance issues. For reference, this cluster can take up to 600 
> MB/sec of inbound produce traffic over SSL, with RF=2, before it gets close 
> to saturation, and the mirror maker normally produces about 400 MB/sec 
> (unless it is lagging). When we enabled inter-broker SSL, we saw persistent 
> replication problems in the cluster at any inbound rate of more than about 6 
> or 7 MB/sec per-broker. This was narrowed down to all the network threads 
> blocking on a single lock in the SecureRandom code.
>
> It turns out that the default PRNG implementation on Linux is NativePRNG. 
> This uses randomness from /dev/urandom (which, by itself, is a non-blocking 
> read) and mixes it with output from an internal SHA1PRNG. The problem is that 
> the entire application shares a single SecureRandom instance, and NativePRNG has a 
> global lock within the implNextBytes method. Switching to another 
> implementation (SHA1PRNG, which has better performance characteristics and is 
> still considered secure) completely eliminated the bottleneck and allowed the 
> cluster to work properly at saturation.
>
> The SSLContext initialization has an optional argument to provide a 
> SecureRandom instance, which the code currently sets to null. This change 
> creates a new config to specify an implementation, and instantiates that and 
> passes it to SSLContext if provided. This will also let someone select a 
> stronger source of randomness (obviously at a performance cost) if desired.
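
The change described above can be sketched roughly as follows. This is a 
minimal illustration, not the actual patch: the class name 
ConfigurablePrngSslContext and the prngAlgorithm parameter are placeholders for 
whatever the new config ends up being called, and the key and trust managers are 
left as null so the JRE defaults apply. A null or empty value keeps the current 
behavior of passing null to SSLContext.init.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.net.ssl.SSLContext;

// Hypothetical sketch, not the Kafka implementation.
final class ConfigurablePrngSslContext {

    /**
     * Builds an SSLContext whose PRNG is chosen by name.
     * prngAlgorithm stands in for the optional config value; null or empty
     * means "let the JRE pick", which is the existing default behavior.
     */
    static SSLContext create(String prngAlgorithm) throws GeneralSecurityException {
        SecureRandom random = null;
        if (prngAlgorithm != null && !prngAlgorithm.isEmpty()) {
            // Throws NoSuchAlgorithmException if this JRE lacks the algorithm.
            random = SecureRandom.getInstance(prngAlgorithm);
        }
        SSLContext context = SSLContext.getInstance("TLS");
        // Null key/trust managers select the installed defaults; a null
        // SecureRandom makes the context use the JRE's default PRNG.
        context.init(null, null, random);
        return context;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SSLContext withDefault = create(null);
        SSLContext withSha1 = create("SHA1PRNG");
        System.out.println("Default PRNG context protocol: " + withDefault.getProtocol());
        System.out.println("SHA1PRNG context protocol:     " + withSha1.getProtocol());
    }
}
```

Because the lookup only happens when a value is supplied, an installation that 
leaves the setting unset cannot fail on a missing algorithm.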


