[ https://issues.apache.org/jira/browse/KAFKA-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yu Yang updated KAFKA-7364: --------------------------- Description: while testing ssl writing to kafka, we found that kafka often run into high cpu usage due to inefficiency in jdk ssl implementation. In detail, we use a test cluster that have 12 d2.8xlarge instances, jdk-10.0.2, and hosts only one topic that have ~20k producers write to through ssl channel. We observed that the network threads often get 100% cpu usage after enabling ssl writing to kafka. To improve kafka's throughput, we have "num.network.threads=32" for the broker. Even with 32 network threads, we see the broker cpu usage jump right after ssl writing is enabled. The broker's cpu usage would drop immediately when we disabled ssl writing. !Screen Shot 2018-08-30 at 10.57.32 PM.png! When the broker's cpu usage is high, 'perf top' shows that kafka is busy with executing code in libsunec.so. The following is a sample stack track that we get when the broker's cpu usage was high. {code} Thread 77562: (state = IN_NATIVE) - sun.security.ec.ECDSASignature.verifySignedDigest(byte[], byte[], byte[], byte[]) @bci=0 (Compiled frame; information may be imprecise) - sun.security.ec.ECDSASignature.engineVerify(byte[]) @bci=70, line=321 (Compiled frame) - java.security.Signature$Delegate.engineVerify(byte[]) @bci=9, line=1222 (Compiled frame) - java.security.Signature.verify(byte[]) @bci=10, line=655 (Compiled frame) - sun.security.x509.X509CertImpl.verify(java.security.PublicKey, java.lang.String) @bci=136, line=444 (Compiled frame) - sun.security.provider.certpath.BasicChecker.verifySignature(java.security.cert.X509Certificate) @bci=48, line=166 (Compiled frame) - sun.security.provider.certpath.BasicChecker.check(java.security.cert.Certificate, java.util.Collection) @bci=24, line=147 (Compiled frame) - sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(java.security.cert.CertPath, java.util.List, java.util.List) @bci=316, line=125 (Compiled frame) - sun.security.provider.certpath.PKIXCertPathValidator.validate(java.security.cert.TrustAnchor, sun.security.provider.certpath.PKIX$ValidatorParams) @bci=390, line=233 (Compiled frame) - sun.security.provider.certpath.PKIXCertPathValidator.validate(sun.security.provider.certpath.PKIX$ValidatorParams) @bci=217, line=141 (Compiled frame) - sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(java.security.cert.CertPath, java.security.cert.CertPathParameters) @bci=7, line=80 (Compiled frame) - java.security.cert.CertPathValidator.validate(java.security.cert.CertPath, java.security.cert.CertPathParameters) @bci=6, line=292 (Compiled frame) - sun.security.validator.PKIXValidator.doValidate(java.security.cert.X509Certificate[], java.security.cert.PKIXBuilderParameters) @bci=34, line=357 (Compiled frame) - sun.security.validator.PKIXValidator.engineValidate(java.security.cert.X509Certificate[], java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) @bci=232, line=259 (Compiled frame) - sun.security.validator.Validator.validate(java.security.cert.X509Certificate[], java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) @bci=6, line=260 (Compiled frame) - sun.security.ssl.X509TrustManagerImpl.validate(sun.security.validator.Validator, java.security.cert.X509Certificate[], java.security.AlgorithmConstraints, java.lang.String) @bci=10, line=324 (Compiled frame) - sun.security.ssl.X509TrustManagerImpl.checkTrusted(java.security.cert.X509Certificate[], java.lang.String, javax.net.ssl.SSLEngine, boolean) @bci=179, line=279 (Compiled frame) - sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(java.security.cert.X509Certificate[], java.lang.String, javax.net.ssl.SSLEngine) @bci=5, line=130 (Compiled frame) - sun.security.ssl.ServerHandshaker.clientCertificate(sun.security.ssl.HandshakeMessage$CertificateMsg) @bci=190, line=1966 (Compiled frame) - sun.security.ssl.ServerHandshaker.processMessage(byte, int) @bci=160, line=237 (Compiled frame) - sun.security.ssl.Handshaker.processLoop() @bci=96, line=1052 (Compiled frame) - sun.security.ssl.Handshaker$1.run() @bci=4, line=992 (Compiled frame) - sun.security.ssl.Handshaker$1.run() @bci=1, line=989 (Compiled frame) - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame) - sun.security.ssl.Handshaker$DelegatedTask.run() @bci=24, line=1467 (Compiled frame) - org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks() @bci=13, line=393 (Compiled frame) - org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(boolean) @bci=88, line=473 (Compiled frame) - org.apache.kafka.common.network.SslTransportLayer.doHandshake() @bci=570, line=331 (Compiled frame) - org.apache.kafka.common.network.SslTransportLayer.handshake() @bci=48, line=258 (Compiled frame) - org.apache.kafka.common.network.KafkaChannel.prepare() @bci=16, line=125 (Compiled frame) - org.apache.kafka.common.network.Selector.pollSelectionKeys(java.util.Set, boolean, long) @bci=251, line=494 (Compiled frame) - org.apache.kafka.common.network.Selector.poll(long) @bci=312, line=432 (Compiled frame) - kafka.network.Processor.poll() @bci=7, line=678 (Compiled frame) - kafka.network.Processor.run() @bci=31, line=583 (Compiled frame) - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame) {code} was: while testing ssl writing to kafka, we found that kafka often run into high cpu usage due to inefficiency in jdk ssl implementation. In detail, we use a test cluster that have 12 d2.8xlarge instances, jdk-10.0.2, and hosts only one topic that have ~20k producers write to through ssl channel. We observed that the network threads often get 100% cpu usage after enabling ssl writing to kafka. To improve kafka's throughput, we have "num.network.threads=32" for the broker. Even with 32 network threads, we see the broker cpu usage jump right after ssl writing is enabled. !Screen Shot 2018-08-30 at 10.57.32 PM.png! When the broker's cpu usage is high, 'perf top' shows that kafka is busy with executing code in libsunec.so. The following is a sample stack track that we get when the broker's cpu usage was high. {code} Thread 77562: (state = IN_NATIVE) - sun.security.ec.ECDSASignature.verifySignedDigest(byte[], byte[], byte[], byte[]) @bci=0 (Compiled frame; information may be imprecise) - sun.security.ec.ECDSASignature.engineVerify(byte[]) @bci=70, line=321 (Compiled frame) - java.security.Signature$Delegate.engineVerify(byte[]) @bci=9, line=1222 (Compiled frame) - java.security.Signature.verify(byte[]) @bci=10, line=655 (Compiled frame) - sun.security.x509.X509CertImpl.verify(java.security.PublicKey, java.lang.String) @bci=136, line=444 (Compiled frame) - sun.security.provider.certpath.BasicChecker.verifySignature(java.security.cert.X509Certificate) @bci=48, line=166 (Compiled frame) - sun.security.provider.certpath.BasicChecker.check(java.security.cert.Certificate, java.util.Collection) @bci=24, line=147 (Compiled frame) - sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(java.security.cert.CertPath, java.util.List, java.util.List) @bci=316, line=125 (Compiled frame) - sun.security.provider.certpath.PKIXCertPathValidator.validate(java.security.cert.TrustAnchor, sun.security.provider.certpath.PKIX$ValidatorParams) @bci=390, line=233 (Compiled frame) - sun.security.provider.certpath.PKIXCertPathValidator.validate(sun.security.provider.certpath.PKIX$ValidatorParams) @bci=217, line=141 (Compiled frame) - sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(java.security.cert.CertPath, java.security.cert.CertPathParameters) @bci=7, line=80 (Compiled frame) - java.security.cert.CertPathValidator.validate(java.security.cert.CertPath, java.security.cert.CertPathParameters) @bci=6, line=292 (Compiled frame) - sun.security.validator.PKIXValidator.doValidate(java.security.cert.X509Certificate[], java.security.cert.PKIXBuilderParameters) @bci=34, line=357 (Compiled frame) - sun.security.validator.PKIXValidator.engineValidate(java.security.cert.X509Certificate[], java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) @bci=232, line=259 (Compiled frame) - sun.security.validator.Validator.validate(java.security.cert.X509Certificate[], java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) @bci=6, line=260 (Compiled frame) - sun.security.ssl.X509TrustManagerImpl.validate(sun.security.validator.Validator, java.security.cert.X509Certificate[], java.security.AlgorithmConstraints, java.lang.String) @bci=10, line=324 (Compiled frame) - sun.security.ssl.X509TrustManagerImpl.checkTrusted(java.security.cert.X509Certificate[], java.lang.String, javax.net.ssl.SSLEngine, boolean) @bci=179, line=279 (Compiled frame) - sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(java.security.cert.X509Certificate[], java.lang.String, javax.net.ssl.SSLEngine) @bci=5, line=130 (Compiled frame) - sun.security.ssl.ServerHandshaker.clientCertificate(sun.security.ssl.HandshakeMessage$CertificateMsg) @bci=190, line=1966 (Compiled frame) - sun.security.ssl.ServerHandshaker.processMessage(byte, int) @bci=160, line=237 (Compiled frame) - sun.security.ssl.Handshaker.processLoop() @bci=96, line=1052 (Compiled frame) - sun.security.ssl.Handshaker$1.run() @bci=4, line=992 (Compiled frame) - sun.security.ssl.Handshaker$1.run() @bci=1, line=989 (Compiled frame) - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame) - sun.security.ssl.Handshaker$DelegatedTask.run() @bci=24, line=1467 (Compiled frame) - org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks() @bci=13, line=393 (Compiled frame) - org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(boolean) @bci=88, line=473 (Compiled frame) - org.apache.kafka.common.network.SslTransportLayer.doHandshake() @bci=570, line=331 (Compiled frame) - org.apache.kafka.common.network.SslTransportLayer.handshake() @bci=48, line=258 (Compiled frame) - org.apache.kafka.common.network.KafkaChannel.prepare() @bci=16, line=125 (Compiled frame) - org.apache.kafka.common.network.Selector.pollSelectionKeys(java.util.Set, boolean, long) @bci=251, line=494 (Compiled frame) - org.apache.kafka.common.network.Selector.poll(long) @bci=312, line=432 (Compiled frame) - kafka.network.Processor.poll() @bci=7, line=678 (Compiled frame) - kafka.network.Processor.run() @bci=31, line=583 (Compiled frame) - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame) {code} > kafka periodically run into high cpu usage with ssl writing > ----------------------------------------------------------- > > Key: KAFKA-7364 > URL: https://issues.apache.org/jira/browse/KAFKA-7364 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 2.0.0 > Reporter: Yu Yang > Priority: Major > Attachments: Screen Shot 2018-08-30 at 10.57.32 PM.png > > > while testing ssl writing to kafka, we found that kafka often run into high > cpu usage due to inefficiency in jdk ssl implementation. > In detail, we use a test cluster that have 12 d2.8xlarge instances, > jdk-10.0.2, and hosts only one topic that have ~20k producers write to > through ssl channel. We observed that the network threads often get 100% cpu > usage after enabling ssl writing to kafka. To improve kafka's throughput, > we have "num.network.threads=32" for the broker. Even with 32 network > threads, we see the broker cpu usage jump right after ssl writing is enabled. > The broker's cpu usage would drop immediately when we disabled ssl writing. > !Screen Shot 2018-08-30 at 10.57.32 PM.png! > When the broker's cpu usage is high, 'perf top' shows that kafka is busy with > executing code in libsunec.so. The following is a sample stack track that > we get when the broker's cpu usage was high. > {code} > Thread 77562: (state = IN_NATIVE) > - sun.security.ec.ECDSASignature.verifySignedDigest(byte[], byte[], byte[], > byte[]) @bci=0 (Compiled frame; information may be imprecise) > - sun.security.ec.ECDSASignature.engineVerify(byte[]) @bci=70, line=321 > (Compiled frame) > - java.security.Signature$Delegate.engineVerify(byte[]) @bci=9, line=1222 > (Compiled frame) > - java.security.Signature.verify(byte[]) @bci=10, line=655 (Compiled frame) > - sun.security.x509.X509CertImpl.verify(java.security.PublicKey, > java.lang.String) @bci=136, line=444 (Compiled frame) > - > sun.security.provider.certpath.BasicChecker.verifySignature(java.security.cert.X509Certificate) > @bci=48, line=166 (Compiled frame) > - > sun.security.provider.certpath.BasicChecker.check(java.security.cert.Certificate, > java.util.Collection) @bci=24, line=147 (Compiled frame) > - > sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(java.security.cert.CertPath, > java.util.List, java.util.List) @bci=316, line=125 (Compiled frame) > - > sun.security.provider.certpath.PKIXCertPathValidator.validate(java.security.cert.TrustAnchor, > sun.security.provider.certpath.PKIX$ValidatorParams) @bci=390, line=233 > (Compiled frame) > - > sun.security.provider.certpath.PKIXCertPathValidator.validate(sun.security.provider.certpath.PKIX$ValidatorParams) > @bci=217, line=141 (Compiled frame) > - > sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(java.security.cert.CertPath, > java.security.cert.CertPathParameters) @bci=7, line=80 (Compiled frame) > - java.security.cert.CertPathValidator.validate(java.security.cert.CertPath, > java.security.cert.CertPathParameters) @bci=6, line=292 (Compiled frame) > - > sun.security.validator.PKIXValidator.doValidate(java.security.cert.X509Certificate[], > java.security.cert.PKIXBuilderParameters) @bci=34, line=357 (Compiled frame) > - > sun.security.validator.PKIXValidator.engineValidate(java.security.cert.X509Certificate[], > java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) > @bci=232, line=259 (Compiled frame) > - > sun.security.validator.Validator.validate(java.security.cert.X509Certificate[], > java.util.Collection, java.security.AlgorithmConstraints, java.lang.Object) > @bci=6, line=260 (Compiled frame) > - > sun.security.ssl.X509TrustManagerImpl.validate(sun.security.validator.Validator, > java.security.cert.X509Certificate[], java.security.AlgorithmConstraints, > java.lang.String) @bci=10, line=324 (Compiled frame) > - > sun.security.ssl.X509TrustManagerImpl.checkTrusted(java.security.cert.X509Certificate[], > java.lang.String, javax.net.ssl.SSLEngine, boolean) @bci=179, line=279 > (Compiled frame) > - > sun.security.ssl.X509TrustManagerImpl.checkClientTrusted(java.security.cert.X509Certificate[], > java.lang.String, javax.net.ssl.SSLEngine) @bci=5, line=130 (Compiled frame) > - > sun.security.ssl.ServerHandshaker.clientCertificate(sun.security.ssl.HandshakeMessage$CertificateMsg) > @bci=190, line=1966 (Compiled frame) > - sun.security.ssl.ServerHandshaker.processMessage(byte, int) @bci=160, > line=237 (Compiled frame) > - sun.security.ssl.Handshaker.processLoop() @bci=96, line=1052 (Compiled > frame) > - sun.security.ssl.Handshaker$1.run() @bci=4, line=992 (Compiled frame) > - sun.security.ssl.Handshaker$1.run() @bci=1, line=989 (Compiled frame) > - > java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, > java.security.AccessControlContext) @bci=0 (Compiled frame) > - sun.security.ssl.Handshaker$DelegatedTask.run() @bci=24, line=1467 > (Compiled frame) > - org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks() > @bci=13, line=393 (Compiled frame) > - org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(boolean) > @bci=88, line=473 (Compiled frame) > - org.apache.kafka.common.network.SslTransportLayer.doHandshake() @bci=570, > line=331 (Compiled frame) > - org.apache.kafka.common.network.SslTransportLayer.handshake() @bci=48, > line=258 (Compiled frame) > - org.apache.kafka.common.network.KafkaChannel.prepare() @bci=16, line=125 > (Compiled frame) > - org.apache.kafka.common.network.Selector.pollSelectionKeys(java.util.Set, > boolean, long) @bci=251, line=494 (Compiled frame) > - org.apache.kafka.common.network.Selector.poll(long) @bci=312, line=432 > (Compiled frame) > - kafka.network.Processor.poll() @bci=7, line=678 (Compiled frame) > - kafka.network.Processor.run() @bci=31, line=583 (Compiled frame) > - java.lang.Thread.run() @bci=11, line=748 (Interpreted frame) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)