[jira] [Created] (KAFKA-12605) kafka consumer churns through buffer memory iterating over records
radai rosenblatt created KAFKA-12605: Summary: kafka consumer churns through buffer memory iterating over records Key: KAFKA-12605 URL: https://issues.apache.org/jira/browse/KAFKA-12605 Project: Kafka Issue Type: Improvement Affects Versions: 2.7.0 Reporter: radai rosenblatt we recently conducted an analysis of memory allocations by the kafka consumer and found a significant number of buffers that graduate out of the young gen, causing GC load. these are the buffers used to gunzip record batches in the consumer when polling. since the same iterator (and its underlying streams and buffers) is likely to live through several poll() cycles, these buffers graduate out of young gen and cause issues. see attached memory allocation flame graph. the code responsible is in CompressionType.GZIP (taken from current trunk):
{code:java}
@Override
public InputStream wrapForInput(ByteBuffer buffer, byte messageVersion, BufferSupplier decompressionBufferSupplier) {
    try {
        // Set output buffer (uncompressed) to 16 KB (none by default) and input buffer (compressed) to
        // 8 KB (0.5 KB by default) to ensure reasonable performance in cases where the caller reads a small
        // number of bytes (potentially a single byte)
        return new BufferedInputStream(new GZIPInputStream(new ByteBufferInputStream(buffer), 8 * 1024), 16 * 1024);
    } catch (Exception e) {
        throw new KafkaException(e);
    }
}
{code}
this allocates 2 buffers - 8K and 16K - even though a BufferSupplier is available to attempt re-use. i believe it is possible to actually get both those buffers from the supplier, and return them when iteration over the record batch is done. doing so will require subclassing BufferedInputStream and GZIPInputStream (or its parent class) to allow supplying external buffers onto them. also, some lifecycle hook would be needed to return said buffers to the pool when iteration is done. -- This message was sent by Atlassian Jira (v8.3.4#803005)
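As a hedged sketch of the subclassing idea (the class name below is hypothetical, not Kafka code): `BufferedInputStream` keeps its internal array in a protected `buf` field, so a subclass can swap in a pooled array instead of keeping the freshly allocated one. `InflaterInputStream` (the parent of `GZIPInputStream`) exposes a protected `buf` field too, so the same trick could cover the 8K buffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch (not Kafka code): replace BufferedInputStream's
// internal buffer with one obtained from a pool (e.g. a BufferSupplier).
final class SuppliedBufferedInputStream extends BufferedInputStream {
    SuppliedBufferedInputStream(InputStream in, byte[] pooledBuffer) {
        super(in, 1);            // allocate the smallest possible throwaway buffer
        this.buf = pooledBuffer; // then swap in the pooled one before any read
    }
}
```

A matching lifecycle hook (for example, on `close()`) would hand `pooledBuffer` back to the pool; that part is omitted here.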
[jira] [Created] (KAFKA-9998) KafkaProducer.close(timeout) still may block indefinitely
radai rosenblatt created KAFKA-9998: --- Summary: KafkaProducer.close(timeout) still may block indefinitely Key: KAFKA-9998 URL: https://issues.apache.org/jira/browse/KAFKA-9998 Project: Kafka Issue Type: Bug Affects Versions: 2.4.1 Reporter: radai rosenblatt looking at KafkaProducer.close(timeout), we have this:
{code:java}
private void close(Duration timeout, boolean swallowException) {
    long timeoutMs = timeout.toMillis();
    if (timeoutMs < 0)
        throw new IllegalArgumentException("The timeout cannot be negative.");
    log.info("Closing the Kafka producer with timeoutMillis = {} ms.", timeoutMs);

    // this will keep track of the first encountered exception
    AtomicReference<Throwable> firstException = new AtomicReference<>();
    boolean invokedFromCallback = Thread.currentThread() == this.ioThread;
    if (timeoutMs > 0) {
        if (invokedFromCallback) {
            log.warn("Overriding close timeout {} ms to 0 ms in order to prevent useless blocking due to self-join. " +
                "This means you have incorrectly invoked close with a non-zero timeout from the producer call-back.",
                timeoutMs);
        } else {
            // Try to close gracefully.
            if (this.sender != null)
                this.sender.initiateClose();
            if (this.ioThread != null) {
                try {
                    this.ioThread.join(timeoutMs); // <- GRACEFUL JOIN
                } catch (InterruptedException t) {
                    firstException.compareAndSet(null, new InterruptException(t));
                    log.error("Interrupted while joining ioThread", t);
                }
            }
        }
    }

    if (this.sender != null && this.ioThread != null && this.ioThread.isAlive()) {
        log.info("Proceeding to force close the producer since pending requests could not be completed " +
            "within timeout {} ms.", timeoutMs);
        this.sender.forceClose();
        // Only join the sender thread when not calling from callback.
        if (!invokedFromCallback) {
            try {
                this.ioThread.join(); // <- UNBOUNDED JOIN
            } catch (InterruptedException e) {
                firstException.compareAndSet(null, new InterruptException(e));
            }
        }
    }
    ...
}
{code}
specifically, in our case the ioThread was running a (very) long-running user-provided callback which was preventing the producer from closing within the given timeout. I think the 2nd join() call should either be _VERY_ short (since we're already past the timeout at that stage) or should not happen at all. -- This message was sent by Atlassian Jira (v8.3.4#803005)
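The suggested fix - bounding the second join by whatever remains of the caller's budget, with a very short floor once the deadline has passed - can be sketched generically. `BoundedJoin` and `joinRemaining` are illustrative names, not the actual producer code:

```java
// Sketch: join a thread only for the time left until an absolute deadline,
// clamped to 1 ms so a join past the deadline can never block indefinitely.
final class BoundedJoin {
    // Returns true if the thread terminated within the remaining budget.
    static boolean joinRemaining(Thread t, long deadlineNanos) throws InterruptedException {
        long remainingMs = Math.max(1, (deadlineNanos - System.nanoTime()) / 1_000_000);
        t.join(remainingMs); // bounded, unlike a bare t.join()
        return !t.isAlive();
    }
}
```

If `joinRemaining` returns false the caller can log that the io thread did not stop, instead of hanging on an unbounded `join()`.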
[jira] [Created] (KAFKA-9855) dont waste memory allocating Struct and values objects for Schemas with no fields
radai rosenblatt created KAFKA-9855: --- Summary: dont waste memory allocating Struct and values objects for Schemas with no fields Key: KAFKA-9855 URL: https://issues.apache.org/jira/browse/KAFKA-9855 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 2.4.1, 2.4.0 Reporter: radai rosenblatt Assignee: radai rosenblatt at the time of this writing there are 6 schemas in kafka APIs with no fields - 3 versions each of LIST_GROUPS and API_VERSIONS. under some workloads this may result in the creation of a lot of Struct objects, each holding an empty Object[0] values array, when deserializing those requests from the wire. in one particular heap dump we found a significant amount of heap space wasted on such objects. -- This message was sent by Atlassian Jira (v8.3.4#803005)
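One way to avoid the waste is to intern a single instance per field-less schema, so deserializing such a request allocates nothing. The sketch below uses hypothetical types keyed by schema name, not Kafka's actual Schema/Struct classes:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: cache one immutable Struct per field-less schema instead of
// allocating a new Struct (and empty values array) per deserialized request.
final class EmptyStructCache {
    static final class Struct {
        final String schemaName;
        Struct(String schemaName) { this.schemaName = schemaName; }
    }

    private static final ConcurrentHashMap<String, Struct> CACHE = new ConcurrentHashMap<>();

    // Safe to share because a field-less Struct carries no mutable state.
    static Struct emptyStructFor(String schemaName) {
        return CACHE.computeIfAbsent(schemaName, Struct::new);
    }
}
```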
[jira] [Created] (KAFKA-8366) partitions of topics being deleted show up in the offline partitions metric
radai rosenblatt created KAFKA-8366: --- Summary: partitions of topics being deleted show up in the offline partitions metric Key: KAFKA-8366 URL: https://issues.apache.org/jira/browse/KAFKA-8366 Project: Kafka Issue Type: Improvement Reporter: radai rosenblatt i believe this is a bug. offline partitions is a metric that indicates an error condition - lack of kafka availability. as an artifact of how deletion is implemented, the partitions of a topic undergoing deletion will show up as offline, which just creates false-positive alerts. if needed, maybe there should be a separate "partitions to be deleted" sensor. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7475) print the actual cluster bootstrap address on authentication failures
[ https://issues.apache.org/jira/browse/KAFKA-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt resolved KAFKA-7475. - Resolution: Fixed Reviewer: Rajini Sivaram Fix Version/s: 2.1.0 > print the actual cluster bootstrap address on authentication failures > - > > Key: KAFKA-7475 > URL: https://issues.apache.org/jira/browse/KAFKA-7475 > Project: Kafka > Issue Type: Improvement > Reporter: radai rosenblatt > Assignee: radai rosenblatt >Priority: Major > Fix For: 2.1.0 > > > currently when a kafka client fails to connect to a cluster, users see > something like this: > {code} > Connection to node -1 terminated during authentication. This may indicate > that authentication failed due to invalid credentials. > {code} > that log line is mostly useless in identifying which (of potentially many) > kafka client is having issues and what kafka cluster is it having issues with. > would be nice to record the remote host/port -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7475) print the actual cluster bootstrap address on authentication failures
radai rosenblatt created KAFKA-7475: --- Summary: print the actual cluster bootstrap address on authentication failures Key: KAFKA-7475 URL: https://issues.apache.org/jira/browse/KAFKA-7475 Project: Kafka Issue Type: Improvement Reporter: radai rosenblatt currently when a kafka client fails to connect to a cluster, users see something like this: {code} Connection to node -1 terminated during authentication. This may indicate that authentication failed due to invalid credentials. {code} that log line is mostly useless in identifying which (of potentially many) kafka client is having issues, and which kafka cluster it is having issues with. it would be nice to record the remote host/port. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7473) allow configuring kafka client configs to not warn for unknown config properties
radai rosenblatt created KAFKA-7473: --- Summary: allow configuring kafka client configs to not warn for unknown config properties Key: KAFKA-7473 URL: https://issues.apache.org/jira/browse/KAFKA-7473 Project: Kafka Issue Type: Improvement Reporter: radai rosenblatt as the config handed to a client may contain config keys intended for modular code in the client (serializers, deserializers, interceptors) or for subclasses of the client class, having "unknown" (to the vanilla client) configs logged as a warning is an annoyance. it would be nice to have a constructor parameter that controls this behavior (just like there's already a `boolean doLog` flag) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6648) Fetcher.getTopicMetadata() only returns "healthy" partitions, not all
radai rosenblatt created KAFKA-6648: --- Summary: Fetcher.getTopicMetadata() only returns "healthy" partitions, not all Key: KAFKA-6648 URL: https://issues.apache.org/jira/browse/KAFKA-6648 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 1.0.1 Reporter: radai rosenblatt Assignee: radai rosenblatt
{code}
if (!shouldRetry) {
    HashMap<String, List<PartitionInfo>> topicsPartitionInfos = new HashMap<>();
    for (String topic : cluster.topics())
        topicsPartitionInfos.put(topic, cluster.availablePartitionsForTopic(topic));
    return topicsPartitionInfos;
}
{code}
this leads to inconsistent behavior upstream, for example in KafkaConsumer.partitionsFor(): if there's valid metadata all partitions are returned, whereas if metadata doesn't exist (or has expired) only a subset of partitions (the healthy ones) is returned. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6622) GroupMetadataManager.loadGroupsAndOffsets decompresses record batch needlessly
radai rosenblatt created KAFKA-6622: --- Summary: GroupMetadataManager.loadGroupsAndOffsets decompresses record batch needlessly Key: KAFKA-6622 URL: https://issues.apache.org/jira/browse/KAFKA-6622 Project: Kafka Issue Type: Bug Reporter: radai rosenblatt Assignee: radai rosenblatt Attachments: kafka batch iteration funtime.png when reading records from a consumer offsets batch, the entire batch is decompressed multiple times (once per record) as part of calling `batch.baseOffset`. this is a very expensive operation being called in a loop for no reason: !kafka batch iteration funtime.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
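The shape of the fix is the classic loop-hoisting pattern: compute the expensive per-batch value once, then reuse it inside the per-record loop. The names below are illustrative (a counter stands in for the decompression), not Kafka's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: hoist an expensive per-batch computation out of the per-record loop.
final class HoistExample {
    static final AtomicInteger DECOMPRESSIONS = new AtomicInteger();

    static long expensiveBaseOffset() {
        DECOMPRESSIONS.incrementAndGet(); // stands in for decompressing the whole batch
        return 42L;
    }

    static List<Long> offsets(int recordCount) {
        long base = expensiveBaseOffset(); // computed once per batch, not per record
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < recordCount; i++)
            out.add(base + i);             // NOT expensiveBaseOffset() + i
        return out;
    }
}
```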
[jira] [Resolved] (KAFKA-6345) NetworkClient.inFlightRequestCount() is not thread safe, causing ConcurrentModificationExceptions when sensors are read
[ https://issues.apache.org/jira/browse/KAFKA-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt resolved KAFKA-6345. - Resolution: Fixed > NetworkClient.inFlightRequestCount() is not thread safe, causing > ConcurrentModificationExceptions when sensors are read > --- > > Key: KAFKA-6345 > URL: https://issues.apache.org/jira/browse/KAFKA-6345 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 1.0.0 > Reporter: radai rosenblatt >Assignee: Sean McCauliff >Priority: Major > Fix For: 1.1.0 > > > example stack trace (code is ~0.10.2.*) > {code} > java.util.ConcurrentModificationException: > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429) > at java.util.HashMap$ValueIterator.next(HashMap.java:1458) > at > org.apache.kafka.clients.InFlightRequests.inFlightRequestCount(InFlightRequests.java:109) > at > org.apache.kafka.clients.NetworkClient.inFlightRequestCount(NetworkClient.java:382) > at > org.apache.kafka.clients.producer.internals.Sender$SenderMetrics$1.measure(Sender.java:480) > at > org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:61) > at > org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:52) > at > org.apache.kafka.common.metrics.JmxReporter$KafkaMbean.getAttribute(JmxReporter.java:183) > at > org.apache.kafka.common.metrics.JmxReporter$KafkaMbean.getAttributes(JmxReporter.java:193) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes(DefaultMBeanServerInterceptor.java:709) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes(JmxMBeanServer.java:705) > {code} > looking at latest trunk, the code is still vulnerable: > # NetworkClient.inFlightRequestCount() eventually iterates over > InFlightRequests.requests.values(), which is backed by a (non-thread-safe) > HashMap > # this will be called from the "requests-in-flight" sensor's measure() method > (Sender.java line ~765 in 
SenderMetrics ctr), which would be driven by some > thread reading JMX values > # HashMap in question would also be updated by some client io thread calling > NetworkClient.doSend() - which calls into InFlightRequests.add()) > i guess the only upside is that this exception will always happen on the > thread reading the JMX values and never on the actual client io thread ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6345) NetworkClient.inFlightRequestCount() is not thread safe, causing ConcurrentModificationExceptions when sensors are read
radai rosenblatt created KAFKA-6345: --- Summary: NetworkClient.inFlightRequestCount() is not thread safe, causing ConcurrentModificationExceptions when sensors are read Key: KAFKA-6345 URL: https://issues.apache.org/jira/browse/KAFKA-6345 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 1.0.0 Reporter: radai rosenblatt example stack trace (code is ~0.10.2.*)
{code}
java.util.ConcurrentModificationException: java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
	at java.util.HashMap$ValueIterator.next(HashMap.java:1458)
	at org.apache.kafka.clients.InFlightRequests.inFlightRequestCount(InFlightRequests.java:109)
	at org.apache.kafka.clients.NetworkClient.inFlightRequestCount(NetworkClient.java:382)
	at org.apache.kafka.clients.producer.internals.Sender$SenderMetrics$1.measure(Sender.java:480)
	at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:61)
	at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:52)
	at org.apache.kafka.common.metrics.JmxReporter$KafkaMbean.getAttribute(JmxReporter.java:183)
	at org.apache.kafka.common.metrics.JmxReporter$KafkaMbean.getAttributes(JmxReporter.java:193)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes(DefaultMBeanServerInterceptor.java:709)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes(JmxMBeanServer.java:705)
{code}
looking at latest trunk, the code is still vulnerable:
# NetworkClient.inFlightRequestCount() eventually iterates over InFlightRequests.requests.values(), which is backed by a (non-thread-safe) HashMap
# this will be called from the "requests-in-flight" sensor's measure() method (Sender.java line ~765 in the SenderMetrics ctor), which would be driven by some thread reading JMX values
# the HashMap in question would also be updated by some client io thread calling NetworkClient.doSend() - which calls into InFlightRequests.add()
i guess the only upside is that this exception will always happen on the thread reading the JMX values and never on the actual client io thread ... -- This message was sent by Atlassian JIRA (v6.4.14#64029)
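One possible fix (a simplified sketch, not the real InFlightRequests class) is to maintain the count in an `AtomicInteger` updated alongside the map, so the metrics thread reads a plain counter and never iterates the non-thread-safe HashMap:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the map stays single-threaded (io thread only), while the counter
// is safe to read from any thread (e.g. a JMX sensor).
final class InFlightCounter {
    private final Map<String, Deque<Object>> requests = new HashMap<>();
    private final AtomicInteger inFlight = new AtomicInteger();

    void add(String node, Object request) {   // io thread only
        requests.computeIfAbsent(node, n -> new ArrayDeque<>()).add(request);
        inFlight.incrementAndGet();
    }

    Object complete(String node) {            // io thread only
        Object r = requests.get(node).poll();
        inFlight.decrementAndGet();
        return r;
    }

    int count() {                             // safe from any thread
        return inFlight.get();
    }
}
```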
[jira] [Created] (KAFKA-6216) kafka logs for misconfigured ssl clients are unhelpful
radai rosenblatt created KAFKA-6216: --- Summary: kafka logs for misconfigured ssl clients are unhelpful Key: KAFKA-6216 URL: https://issues.apache.org/jira/browse/KAFKA-6216 Project: Kafka Issue Type: Bug Affects Versions: 1.0.0 Reporter: radai rosenblatt if you misconfigure the keystores on an ssl client, you will currently get a log full of these:
```
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470)
	at org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:195)
	at org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:163)
	at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:731)
	at org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:54)
	at org.apache.kafka.common.network.Selector.doClose(Selector.java:540)
	at org.apache.kafka.common.network.Selector.close(Selector.java:531)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:378)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:131)
	at java.lang.Thread.run(Thread.java:745)
```
these are caught and printed as part of the client Selector trying to close the channel after having caught an IOException (let's call that the root issue). the root issue itself is only logged at debug level, which is off 99% of the time, leaving users with very little clue as to what's gone wrong. after turning on debug logging, the root issue clearly indicated what the problem was in our case:
```
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1409)
	at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
	at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
	at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
	at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
	at org.apache.kafka.common.network.SslTransportLayer.handshakeWrap(SslTransportLayer.java:382)
	at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:243)
	at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:69)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:350)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:131)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1478)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:212)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:957)
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:897)
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:894)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1347)
	at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:336)
	at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:417)
	at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:270)
	... 7 more
Caused by: sun.security.validator.ValidatorException: No trusted certificate found
	at sun.security.validator.SimpleValidator.buildTrustedChain(SimpleValidator.java:384)
	at sun.security.validator.SimpleValidator.engineValidate(SimpleValidator.java:133)
	at sun.security.validator.Validator.validate(Validator.java:260)
	at
```
[GitHub] kafka pull request #4223: when closing a socket in response to an IOExceptio...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/4223 when closing a socket in response to an IOException, also print the root issue if closing fails `Selector.pollSelectionKeys()` attempts to close the channel in response to an Exception (let's call this exception the root issue). if the root issue itself is an IOException, it is printed to the log at debug level (which is usually off for production users):
```java
catch (Exception e) {
    String desc = channel.socketDescription();
    if (e instanceof IOException)
        log.debug("Connection with {} disconnected", desc, e); // <- does not appear in real-life logs
    else if (e instanceof AuthenticationException) // will be logged later as error by clients
        log.debug("Connection with {} disconnected due to authentication exception", desc, e);
    else
        log.warn("Unexpected error from {}; closing connection", desc, e);
    close(channel, true);
}
```
in some cases, close itself throws an exception. that exception is printed to the log as an error (`Selector.doClose()`):
```java
try {
    channel.close();
} catch (IOException e) {
    log.error("Exception closing connection to node {}:", channel.id(), e);
}
```
this tends to actually show up in user logs, looking something like this (note - line numbers are from kafka 10.2.*):
```
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:470)
	at org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:195)
	at org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:163)
	at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:731)
	at org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:54)
	at org.apache.kafka.common.network.Selector.doClose(Selector.java:540)
	at org.apache.kafka.common.network.Selector.close(Selector.java:531)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:378)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:131)
	at java.lang.Thread.run(Thread.java:745)
```
this issue spams users' logs and is not really helpful in diagnosing the real underlying cause, which (after turning debug logs on) turned out to be (for this particular case):
```
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1409)
	at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
	at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
	at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
	at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
	at org.apache.kafka.common.network.SslTransportLayer.handshakeWrap(SslTransportLayer.java:382)
	at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:243)
	at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:69)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:350)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:303)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233)
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:131)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1478)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:212)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:957)
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:897)
	at sun.security.ssl.
```
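The PR's idea - making the visible close-failure log line carry the root issue - can be sketched with `Throwable.addSuppressed`, which attaches the original exception so any log line (or rethrow) of the close error also prints the real cause. `CloseWithRootCause` and `closeQuietly` are illustrative names, not the exact patch:

```java
// Sketch: when close() itself fails, attach the root issue to the close
// error so the one exception that does reach the logs explains both.
final class CloseWithRootCause {
    // Returns the close failure (with the root issue suppressed into it),
    // or null if closing succeeded.
    static Exception closeQuietly(AutoCloseable channel, Exception rootIssue) {
        try {
            channel.close();
            return null;
        } catch (Exception closeError) {
            closeError.addSuppressed(rootIssue); // surfaces the real cause
            return closeError;
        }
    }
}
```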
[jira] [Created] (KAFKA-5190) builds with low parallelism exhaust system open files and crash
radai rosenblatt created KAFKA-5190: --- Summary: builds with low parallelism exhaust system open files and crash Key: KAFKA-5190 URL: https://issues.apache.org/jira/browse/KAFKA-5190 Project: Kafka Issue Type: Bug Reporter: radai rosenblatt in an attempt to produce more stable builds, i tried reducing the parallelism:
```
export GRADLE_OPTS=-Dorg.gradle.daemon=false
./gradlew clean build -PmaxParallelForks=1
```
which made things much worse - i now have builds that fail due to hitting the system's maximum number of open files (4096 in my case). this happens with or without the gradle daemon. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-5189) trunk unstable - DescribeConsumerGroupTest.testDescribeGroupWithNewConsumerWithShortInitializationTimeout fails 90% of times
radai rosenblatt created KAFKA-5189: --- Summary: trunk unstable - DescribeConsumerGroupTest.testDescribeGroupWithNewConsumerWithShortInitializationTimeout fails 90% of times Key: KAFKA-5189 URL: https://issues.apache.org/jira/browse/KAFKA-5189 Project: Kafka Issue Type: Bug Reporter: radai rosenblatt ran the build 10 times on my machine, fails 90% of the time -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] kafka pull request #2025: KAFKA-4293 - improve ByteBufferMessageSet.deepIter...
Github user radai-rosenblatt closed the pull request at: https://github.com/apache/kafka/pull/2025 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] kafka pull request #2814: change language level from java 7 to 8
Github user radai-rosenblatt closed the pull request at: https://github.com/apache/kafka/pull/2814 ---
[GitHub] kafka pull request #2814: change language level from java 7 to 8
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/2814 change language level from java 7 to 8 now that KIP-118 has passed, and there are no major releases coming before 0.11 Signed-off-by: radai-rosenblatt You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka java8-ftw Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/2814.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2814 commit 0709f9613e9f8f729cf2cc5cf0927ce8ded70def Author: radai-rosenblatt Date: 2017-04-05T14:17:05Z change language level from java 7 to 8 Signed-off-by: radai-rosenblatt ---
[GitHub] kafka pull request #2637: throw NoOffsetForPartitionException from poll once...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/2637 throw NoOffsetForPartitionException from poll once for all TopicPartitions affected Signed-off-by: radai-rosenblatt You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka KAFKA-4839 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/2637.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2637 commit ee4291644e83fcaf8d3df4de295581562b4c80e4 Author: radai-rosenblatt Date: 2017-03-04T01:47:22Z throw NoOffsetForPartitionException from poll once for all TopicPartitions affected Signed-off-by: radai-rosenblatt ---
[jira] [Created] (KAFKA-4839) throw NoOffsetForPartitionException once for all assigned partitions from poll
radai rosenblatt created KAFKA-4839: --- Summary: throw NoOffsetForPartitionException once for all assigned partitions from poll Key: KAFKA-4839 URL: https://issues.apache.org/jira/browse/KAFKA-4839 Project: Kafka Issue Type: Improvement Components: clients Affects Versions: 0.10.2.0 Reporter: radai rosenblatt Assignee: radai rosenblatt KafkaConsumer.poll() will currently throw NoOffsetForPartitionException if the reset strategy is "none" and there are no defined offsets. the problem is that the exception is only thrown for the 1st such partition encountered. since a single consumer can be the owner of thousands of partitions, this results in a lot of exceptions that need to be caught and handled. it's possible to throw the exception once for all such TopicPartitions without any user-visible API changes. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
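The idea can be sketched generically: scan all assigned partitions first, collect every one that lacks an offset, and throw a single exception naming the full set. The types below are illustrative stand-ins (partitions keyed by name, `null` meaning "no committed offset"), not the consumer's internal API:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: one exception covering every offending partition, instead of one
// exception per poll() for each partition encountered first.
final class OffsetCheck {
    static class NoOffsetForPartitionsException extends RuntimeException {
        final Set<String> partitions;
        NoOffsetForPartitionsException(Set<String> partitions) {
            super("No offset and no reset policy for partitions: " + partitions);
            this.partitions = partitions;
        }
    }

    static void ensureOffsets(Map<String, Long> committedOffsets) {
        Set<String> missing = new HashSet<>();
        for (Map.Entry<String, Long> e : committedOffsets.entrySet())
            if (e.getValue() == null)
                missing.add(e.getKey());     // collect, don't throw yet
        if (!missing.isEmpty())
            throw new NoOffsetForPartitionsException(missing); // once, for all
    }
}
```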
[GitHub] kafka pull request #2330: KAFKA-4602 - KIP-72 - Allow putting a bound on mem...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/2330 KAFKA-4602 - KIP-72 - Allow putting a bound on memory consumed by Incoming requests this is the initial implementation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka broker-memory-pool-with-muting Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/2330.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2330 commit 5697bd5a766b61bc9fe5e20c9a605249dd9b284d Author: radai-rosenblatt Date: 2016-09-27T16:51:30Z KAFKA-4602 - introduce MemoryPool interface, use it to control total outstanding memory dedicated to broker requests Signed-off-by: radai-rosenblatt ---
[jira] [Created] (KAFKA-4602) KIP-72 Allow putting a bound on memory consumed by Incoming requests
radai rosenblatt created KAFKA-4602: --- Summary: KIP-72 Allow putting a bound on memory consumed by Incoming requests Key: KAFKA-4602 URL: https://issues.apache.org/jira/browse/KAFKA-4602 Project: Kafka Issue Type: New Feature Components: core Reporter: radai rosenblatt Assignee: radai rosenblatt this issue tracks the implementation of KIP-72, as outlined here - https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-4550) current trunk unstable
[ https://issues.apache.org/jira/browse/KAFKA-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt updated KAFKA-4550: Issue Type: Sub-task (was: Bug) Parent: KAFKA-2054 > current trunk unstable > -- > > Key: KAFKA-4550 > URL: https://issues.apache.org/jira/browse/KAFKA-4550 > Project: Kafka > Issue Type: Sub-task >Affects Versions: 0.10.2.0 > Reporter: radai rosenblatt > Attachments: run1.log, run2.log, run3.log, run4.log, run5.log > > > on latest trunk (commit hash 908b6d1148df963d21a70aaa73a7a87571b965a9) > when running the exact same build 5 times, I get: > 3 failures (on 3 separate runs): >kafka.api.SslProducerSendTest > testFlush FAILED java.lang.AssertionError: > No request is complete >org.apache.kafka.streams.integration.QueryableStateIntegrationTest > > queryOnRebalance[1] FAILED java.lang.AssertionError: Condition not met within > timeout 6. Did not receive 1 number of records >kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout > FAILED java.lang.AssertionError: Message set should have 1 message > 1 success > 1 stall (build hung)
[jira] [Updated] (KAFKA-4550) current trunk unstable
[ https://issues.apache.org/jira/browse/KAFKA-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt updated KAFKA-4550: Description: on latest trunk (commit hash 908b6d1148df963d21a70aaa73a7a87571b965a9) when running the exact same build 5 times, I get: 3 failures (on 3 separate runs): kafka.api.SslProducerSendTest > testFlush FAILED java.lang.AssertionError: No request is complete org.apache.kafka.streams.integration.QueryableStateIntegrationTest > queryOnRebalance[1] FAILED java.lang.AssertionError: Condition not met within timeout 6. Did not receive 1 number of records kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout FAILED java.lang.AssertionError: Message set should have 1 message 1 success 1 stall (build hung) was: on latest trunk (commit hash 908b6d1148df963d21a70aaa73a7a87571b965a9) when running the exact same build 5 times, I get: 3 failures (on 3 separate runs): kafka.api.SslProducerSendTest > testFlush FAILED java.lang.AssertionError: No request is complete org.apache.kafka.streams.integration.QueryableStateIntegrationTest > queryOnRebalance[1] FAILEDjava.lang.AssertionError: Condition not met within timeout 6. 
Did not receive 1 number of records kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout FAILED java.lang.AssertionError: Message set should have 1 message 1 success 1 stall > current trunk unstable > -- > > Key: KAFKA-4550 > URL: https://issues.apache.org/jira/browse/KAFKA-4550 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.0 >Reporter: radai rosenblatt > Attachments: run1.log, run2.log, run3.log, run4.log, run5.log > > > on latest trunk (commit hash 908b6d1148df963d21a70aaa73a7a87571b965a9) > when running the exact same build 5 times, I get: > 3 failures (on 3 separate runs): >kafka.api.SslProducerSendTest > testFlush FAILED java.lang.AssertionError: > No request is complete >org.apache.kafka.streams.integration.QueryableStateIntegrationTest > > queryOnRebalance[1] FAILED java.lang.AssertionError: Condition not met within > timeout 6. Did not receive 1 number of records >kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout > FAILED java.lang.AssertionError: Message set should have 1 message > 1 success > 1 stall (build hung)
[jira] [Created] (KAFKA-4550) current trunk unstable
radai rosenblatt created KAFKA-4550: --- Summary: current trunk unstable Key: KAFKA-4550 URL: https://issues.apache.org/jira/browse/KAFKA-4550 Project: Kafka Issue Type: Bug Affects Versions: 0.10.2.0 Reporter: radai rosenblatt Attachments: run1.log, run2.log, run3.log, run4.log, run5.log on latest trunk (commit hash 908b6d1148df963d21a70aaa73a7a87571b965a9) when running the exact same build 5 times, I get: 3 failures (on 3 separate runs): kafka.api.SslProducerSendTest > testFlush FAILED java.lang.AssertionError: No request is complete org.apache.kafka.streams.integration.QueryableStateIntegrationTest > queryOnRebalance[1] FAILEDjava.lang.AssertionError: Condition not met within timeout 6. Did not receive 1 number of records kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout FAILED java.lang.AssertionError: Message set should have 1 message 1 success 1 stall
[jira] [Updated] (KAFKA-4550) current trunk unstable
[ https://issues.apache.org/jira/browse/KAFKA-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt updated KAFKA-4550: Attachment: run5.log run4.log run3.log run2.log run1.log > current trunk unstable > -- > > Key: KAFKA-4550 > URL: https://issues.apache.org/jira/browse/KAFKA-4550 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.0 > Reporter: radai rosenblatt > Attachments: run1.log, run2.log, run3.log, run4.log, run5.log > > > on latest trunk (commit hash 908b6d1148df963d21a70aaa73a7a87571b965a9) > when running the exact same build 5 times, I get: > 3 failures (on 3 separate runs): >kafka.api.SslProducerSendTest > testFlush FAILED java.lang.AssertionError: > No request is complete >org.apache.kafka.streams.integration.QueryableStateIntegrationTest > > queryOnRebalance[1] FAILEDjava.lang.AssertionError: Condition not met within > timeout 6. Did not receive 1 number of records >kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout > FAILED java.lang.AssertionError: Message set should have 1 message > 1 success > 1 stall
[jira] [Updated] (KAFKA-4250) make ProducerRecord and ConsumerRecord extensible
[ https://issues.apache.org/jira/browse/KAFKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt updated KAFKA-4250: Fix Version/s: (was: 0.10.1.1) > make ProducerRecord and ConsumerRecord extensible > - > > Key: KAFKA-4250 > URL: https://issues.apache.org/jira/browse/KAFKA-4250 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.1 >Reporter: radai rosenblatt > Assignee: radai rosenblatt > Fix For: 0.10.2.0 > > > KafkaProducer and KafkaConsumer implement interfaces that are designed to be > extensible (or at least allow it). > ProducerRecord and ConsumerRecord, however, are final, making extending > producer/consumer very difficult.
[jira] [Updated] (KAFKA-4250) make ProducerRecord and ConsumerRecord extensible
[ https://issues.apache.org/jira/browse/KAFKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt updated KAFKA-4250: Fix Version/s: (was: 0.10.2.0) 0.10.1.1 > make ProducerRecord and ConsumerRecord extensible > - > > Key: KAFKA-4250 > URL: https://issues.apache.org/jira/browse/KAFKA-4250 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.1 >Reporter: radai rosenblatt > Assignee: radai rosenblatt > Fix For: 0.10.1.1 > > > KafkaProducer and KafkaConsumer implement interfaces that are designed to be > extensible (or at least allow it). > ProducerRecord and ConsumerRecord, however, are final, making extending > producer/consumer very difficult.
[jira] [Resolved] (KAFKA-4011) allow sizing RequestQueue in bytes
[ https://issues.apache.org/jira/browse/KAFKA-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt resolved KAFKA-4011. - Resolution: Won't Fix Fix Version/s: (was: 0.10.2.0) this was the original KIP-72 implementation. it's no longer relevant. > allow sizing RequestQueue in bytes > -- > > Key: KAFKA-4011 > URL: https://issues.apache.org/jira/browse/KAFKA-4011 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.0 > Reporter: radai rosenblatt > > currently RequestChannel's requestQueue is sized in number of requests: > {code:title=RequestChannel.scala|borderStyle=solid} > private val requestQueue = new > ArrayBlockingQueue[RequestChannel.Request](queueSize) > {code} > under the assumption that the end goal is a bound on server memory > consumption, this requires the admin to know the avg request size. > I would like to propose sizing the requestQueue not by number of requests, > but by their accumulated size (Request.buffer.capacity). this would probably > make configuring and sizing an instance easier. > there would need to be a new configuration setting for this > (queued.max.bytes?) - which could be either in addition to or instead of the > current queued.max.requests setting
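The proposal in the ticket, bounding the queue by accumulated bytes rather than entry count, can be sketched roughly like this. It is a toy stand-in (not the ByteBoundedBlockingQueue that was actually discussed), and it rejects oversize offers instead of blocking, to keep the example short:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy sketch of a queue bounded by the accumulated size of its entries
// rather than by their number. Offers that would exceed the byte bound
// are rejected; a real implementation would block instead.
class ByteBoundedQueue {
    private final long maxBytes;
    private long usedBytes = 0;
    private final Deque<byte[]> entries = new ArrayDeque<>();

    ByteBoundedQueue(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized boolean offer(byte[] request) {
        // always admit into an empty queue so a single oversized request cannot wedge it
        if (!entries.isEmpty() && usedBytes + request.length > maxBytes) {
            return false;
        }
        entries.addLast(request);
        usedBytes += request.length;
        return true;
    }

    synchronized byte[] poll() {
        byte[] r = entries.pollFirst();
        if (r != null) {
            usedBytes -= r.length;
        }
        return r;
    }
}
```

This is the shape the hypothetical queued.max.bytes setting would govern: admins reason about total memory, not about average request size times queued.max.requests.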
[GitHub] kafka pull request #1714: KAFKA-4011 - fix issues and beef up tests around B...
Github user radai-rosenblatt closed the pull request at: https://github.com/apache/kafka/pull/1714
[GitHub] kafka pull request #2025: KAFKA-4293 - improve ByteBufferMessageSet.deepIter...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/2025 KAFKA-4293 - improve ByteBufferMessageSet.deepIterator() performance by relying on underlying stream's available() implementation also: provided better available() for ByteBufferInputStream provided better available() for KafkaLZ4BlockInputStream added KafkaGZIPInputStream with a better available() fixed KafkaLZ4BlockOutputStream.close() to properly flush Signed-off-by: radai-rosenblatt You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka suchwow Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/2025.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2025
[jira] [Created] (KAFKA-4293) ByteBufferMessageSet.deepIterator burns CPU catching EOFExceptions
radai rosenblatt created KAFKA-4293: --- Summary: ByteBufferMessageSet.deepIterator burns CPU catching EOFExceptions Key: KAFKA-4293 URL: https://issues.apache.org/jira/browse/KAFKA-4293 Project: Kafka Issue Type: Bug Components: core Affects Versions: 0.10.0.1 Reporter: radai rosenblatt around line 110: {noformat} try { while (true) innerMessageAndOffsets.add(readMessageFromStream(compressed)) } catch { case eofe: EOFException => // we don't do anything at all here, because the finally // will close the compressed input stream, and we simply // want to return the innerMessageAndOffsets {noformat} the only indication the code has that the end of the iteration was reached is by catching EOFException (which will be thrown inside readMessageFromStream()). profiling runs performed at LinkedIn show 10% of the total broker CPU time taken up by Throwable.fillInStackTrace() because of this behaviour. unfortunately InputStream.available() cannot be relied upon (concrete example - GZipInputStream will not correctly return 0) so the fix would probably be a wire format change to also encode the number of messages.
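Where a stream's available() can be trusted (the ticket notes GZIPInputStream's cannot, which is why wrappers with a better implementation are needed), iteration can terminate without ever throwing. A small illustration over an in-memory stream, whose available() is exact:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustration of exception-free iteration: ByteArrayInputStream.available()
// reports exactly the bytes remaining, so the read loop stops cleanly instead
// of catching an EOFException (and paying for fillInStackTrace()).
class RecordReader {
    static List<Integer> readAllInts(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<Integer> out = new ArrayList<>();
        try {
            while (in.available() >= Integer.BYTES) {   // terminate via available(), not EOFException
                out.add(in.readInt());
            }
        } catch (IOException e) {
            throw new RuntimeException(e);   // cannot happen for an in-memory stream
        }
        return out;
    }
}
```

For streams with an unreliable available() the loop above would be wrong, which is exactly why the ticket suggests encoding the record count in the wire format instead.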
[jira] [Created] (KAFKA-4289) CPU wasted on reflection calls initializing short-lived loggers
radai rosenblatt created KAFKA-4289: --- Summary: CPU wasted on reflection calls initializing short-lived loggers Key: KAFKA-4289 URL: https://issues.apache.org/jira/browse/KAFKA-4289 Project: Kafka Issue Type: Bug Affects Versions: 0.10.0.1 Reporter: radai rosenblatt an internal profiling run at LinkedIn found ~5% of the CPU time consumed by `sun.reflect.Reflection.getCallerClass()`. digging into the stack trace shows it's from initializing short-lived logger objects in `FileMessageSet` and `RequestChannel.Request`
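The remedy is to create the logger once per class rather than once per instance. A minimal Java sketch of the pattern (java.util.logging here is a stand-in for the logging facade Kafka actually uses; in Scala the equivalent is moving the logger to a companion object):

```java
import java.util.logging.Logger;

// Sketch of the fix: hold the logger in a static field so the reflective
// caller lookup inside getLogger() runs once per class, not once per
// short-lived instance on the hot path.
class RequestLike {
    private static final Logger LOG = Logger.getLogger(RequestLike.class.getName());

    void handle() {
        LOG.fine("handling request");   // no logger construction per request
    }
}
```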
[GitHub] kafka pull request #2006: KAFKA-4289 - moved short-lived loggers to companio...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/2006 KAFKA-4289 - moved short-lived loggers to companion objects Signed-off-by: radai-rosenblatt You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka omgsrsly Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/2006.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2006 commit 7b475e6eaffab891b8737c00df71d951cc52c81b Author: radai-rosenblatt Date: 2016-10-11T00:21:07Z KAFKA-4289 - moved short-lived loggers to companion objects Signed-off-by: radai-rosenblatt
[jira] [Commented] (KAFKA-4250) make ProducerRecord and ConsumerRecord extensible
[ https://issues.apache.org/jira/browse/KAFKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546226#comment-15546226 ] radai rosenblatt commented on KAFKA-4250: - PR up - https://github.com/apache/kafka/pull/1961 > make ProducerRecord and ConsumerRecord extensible > - > > Key: KAFKA-4250 > URL: https://issues.apache.org/jira/browse/KAFKA-4250 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.0.1 >Reporter: radai rosenblatt > > KafkaProducer and KafkaConsumer implement interfaces that are designed to be > extensible (or at least allow it). > ProducerRecord and ConsumerRecord, however, are final, making extending > producer/consumer very difficult.
[jira] [Created] (KAFKA-4250) make ProducerRecord and ConsumerRecord extensible
radai rosenblatt created KAFKA-4250: --- Summary: make ProducerRecord and ConsumerRecord extensible Key: KAFKA-4250 URL: https://issues.apache.org/jira/browse/KAFKA-4250 Project: Kafka Issue Type: Improvement Components: core Affects Versions: 0.10.0.1 Reporter: radai rosenblatt KafkaProducer and KafkaConsumer implement interfaces that are designed to be extensible (or at least allow it). ProducerRecord and ConsumerRecord, however, are final, making extending producer/consumer very difficult.
[GitHub] kafka pull request #1961: make ProducerRecord and ConsumerRecord extensible
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/1961 make ProducerRecord and ConsumerRecord extensible Signed-off-by: radai-rosenblatt You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka extensibility Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1961.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1961 commit 21233b0c656f76e6d52a37d57b1b4cb10b94a270 Author: radai-rosenblatt Date: 2016-10-04T17:34:29Z make ProducerRecord and ConsumerRecord extensible Signed-off-by: radai-rosenblatt
[GitHub] kafka pull request #1930: KAFKA-4228 - make producer close on sender thread ...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/1930 KAFKA-4228 - make producer close on sender thread death, make consumer shutdown on failure to rebalance, and make MM die on any of the above. the JIRA issue (https://issues.apache.org/jira/browse/KAFKA-4228) details a cascade of failures that resulted in an entire mirror maker cluster stalling due to an OOM death on one mm instance. this patch makes producers and consumers close themselves on the errors encountered, and mm to shut down if anything happens to producers or consumers. Signed-off-by: radai-rosenblatt You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka honorable-death Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1930.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1930 commit efca6e8dfa65acb5fbb1bc1801fda151b08f4f81 Author: radai-rosenblatt Date: 2016-09-29T00:16:16Z KAFKA-4228 - make producer close on sender thread death, make consumer shutdown on failure to rebalance, and make MM die on any of the above. Signed-off-by: radai-rosenblatt
[jira] [Updated] (KAFKA-4228) Sender thread death leaves KafkaProducer in a bad state
[ https://issues.apache.org/jira/browse/KAFKA-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt updated KAFKA-4228: Description: a KafkaProducer's Sender thread may die: {noformat} 2016/09/28 00:28:01.065 ERROR [KafkaThread] [kafka-producer-network-thread | mm_ei-lca1_uniform] [kafka-mirror-maker] [] Uncaught exception in kafka-producer-network-thread | mm_ei-lca1_uni java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[?:1.8.0_40] at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[?:1.8.0_40] at org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.produceRequest(Sender.java:355) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.createProduceRequests(Sender.java:337) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) ~[kafka-clients-0.9.0.666.jar:?] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40] {noformat} which leaves the producer in a bad state. in this state, a call to flush(), for example, will hang indefinitely as the sender thread is not around to flush batches but they've not been aborted. even worse, this can happen in MirrorMaker just before a rebalance, at which point MM will just block indefinitely during a rebalance (in beforeReleasingPartitions()). a rebalance participant hung in such a way will cause rebalance to fail for the rest of the participants, at which point ZKRebalancerListener.watcherExecutorThread() dies to an exception (cannot rebalance after X attempts) but the consumer that ran the thread will remain live. 
the end result is a bunch of zombie mirror makers and orphan topic partitions. a dead sender thread should result in closing the producer. a consumer failing to rebalance should shut down. any issue with the producer or consumer should cause mirror-maker death. was: a KafkaProducer's Sender thread may die: {noformat} 2016/09/28 00:28:01.065 ERROR [KafkaThread] [kafka-producer-network-thread | mm_ei-lca1_uniform] [kafka-mirror-maker] [] Uncaught exception in kafka-producer-network-thread | mm_ei-lca1_uni java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[?:1.8.0_40] at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[?:1.8.0_40] at org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.produceRequest(Sender.java:355) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.createProduceRequests(Sender.java:337) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) ~[kafka-clients-0.9.0.666.jar:?] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40] {noformat} which leaves the producer in a bad state. in this state, a call to flush(), for example, will hang indefinitely as the sender thread is not around to flush batches but they've not been aborted. even worse, this can happen in MirrorMaker just before a rebalance, at which point MM will just block indefinitely during a rebalance and the end result is unowned topic partitions. a dead sender thread should result in closing the producer, and a closed producer should result in MirrorMaker death. 
> Sender thread death leaves KafkaProducer in a bad state > --- > > Key: KAFKA-4228 > URL: https://issues.apache.org/jira/browse/KAFKA-4228 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.10.0.1 >Reporter: radai rosenblatt > > a KafkaProducer's Sender thread may die: > {noformat} > 2016/09/28 00:28:01.065 ERROR [KafkaThread] [kafka-producer-network-thread | > mm_ei-lca1_uniform] [kafka-mirror-maker] [] Uncaught exception in > kafka-producer-network-thread | mm_ei-lca1_uni > java.lang.OutOfMemoryError: Java heap space >at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[?:1.8.0_40] >at jav
[jira] [Created] (KAFKA-4228) Sender thread death leaves KafkaProducer in a bad state
radai rosenblatt created KAFKA-4228: --- Summary: Sender thread death leaves KafkaProducer in a bad state Key: KAFKA-4228 URL: https://issues.apache.org/jira/browse/KAFKA-4228 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 0.10.0.1 Reporter: radai rosenblatt a KafkaProducer's Sender thread may die: {noformat} 2016/09/28 00:28:01.065 ERROR [KafkaThread] [kafka-producer-network-thread | mm_ei-lca1_uniform] [kafka-mirror-maker] [] Uncaught exception in kafka-producer-network-thread | mm_ei-lca1_uni java.lang.OutOfMemoryError: Java heap space at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[?:1.8.0_40] at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[?:1.8.0_40] at org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.produceRequest(Sender.java:355) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.createProduceRequests(Sender.java:337) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) ~[kafka-clients-0.9.0.666.jar:?] at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) ~[kafka-clients-0.9.0.666.jar:?] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40] {noformat} which leaves the producer in a bad state. in this state, a call to flush(), for example, will hang indefinitely as the sender thread is not around to flush batches but they've not been aborted. even worse, this can happen in MirrorMaker just before a rebalance, at which point MM will just block indefinitely during a rebalance and the end result is unowned topic partitions. a dead sender thread should result in closing the producer, and a closed producer should result in MirrorMaker death.
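The "a dead sender thread should result in closing the producer" idea amounts to installing an uncaught-exception handler on the background thread. A minimal sketch, where a flag stands in for the real cleanup (producer.close() and mirror-maker shutdown):

```java
// Minimal sketch: an uncaught-exception handler on the background thread
// triggers cleanup when the thread dies unexpectedly. The flag below is a
// stand-in for closing the producer and shutting down mirror maker.
class FailFastSender {
    static volatile boolean cleanedUp = false;

    static Thread spawn(Runnable body) {
        Thread t = new Thread(body, "sender-sketch");
        t.setUncaughtExceptionHandler((thread, error) -> {
            cleanedUp = true;   // real code would call producer.close() here
        });
        return t;
    }
}
```

Without such a handler the thread dies silently and the producer keeps accepting records it can never send, which is exactly the hang described above.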
[GitHub] kafka pull request #1813: make mirror maker threads daemons and make sure an...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/1813 make mirror maker threads daemons and make sure any uncaught exceptions are logged You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka mm-fixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1813.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1813 commit 211928bdd5d90e16d375fd965cca592795a44d30 Author: radai-rosenblatt Date: 2016-09-02T00:33:49Z make mirror maker threads daemons and make sure any uncaught exceptions are logged
[GitHub] kafka pull request #1714: KAFKA-4011 - fix issues and beef up tests around B...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/1714 KAFKA-4011 - fix issues and beef up tests around ByteBoundedBlockingQueue as discussed under KIP-72 You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka ByteBoundedBlockingQueue-work Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1714.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1714 commit d5a63e5fe818587a582fce77bd622422d404afc2 Author: radai-rosenblatt Date: 2016-08-08T21:59:24Z KAFKA-4011 - fix issues and beef up tests around ByteBoundedBlockingQueue, as discussed under KIP-72
[GitHub] kafka pull request #1710: KAFKA-4025 - make sure file.encoding system proper...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/1710 KAFKA-4025 - make sure file.encoding system property is set to UTF-8 when calling rat You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka fix-build-on-windows Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1710.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1710 commit 35563f203e993bc044bc408f50cad35dea2203bb Author: radai-rosenblatt Date: 2016-08-07T02:26:15Z KAFKA-4025 - make sure file.encoding system property is set to UTF-8 when invoking the rat ant task
[GitHub] kafka pull request #1708: KAFKA-4025 - make sure file.encoding system proper...
Github user radai-rosenblatt closed the pull request at: https://github.com/apache/kafka/pull/1708
[GitHub] kafka pull request #1708: KAFKA-4025 - make sure file.encoding system proper...
GitHub user radai-rosenblatt opened a pull request: https://github.com/apache/kafka/pull/1708 KAFKA-4025 - make sure file.encoding system property is set to UTF-8 when invoking the rat task reset back to previous value after. You can merge this pull request into a Git repository by running: $ git pull https://github.com/radai-rosenblatt/kafka fix-build-on-windows Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/1708.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1708 commit cc249d06d98fd59ac42826aa2a894571a0967c34 Author: radai-rosenblatt Date: 2016-08-07T02:21:58Z KAFKA-4025 - make sure file.encoding system property is set to UTF-8 when invoking the rat task. reset back to previous value after
[jira] [Updated] (KAFKA-4025) build fails on windows due to rat target output encoding
[ https://issues.apache.org/jira/browse/KAFKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] radai rosenblatt updated KAFKA-4025: Attachment: windows build debug output.txt debug output for running the rat task on windows > build fails on windows due to rat target output encoding > > > Key: KAFKA-4025 > URL: https://issues.apache.org/jira/browse/KAFKA-4025 > Project: Kafka > Issue Type: Bug > Environment: windows 7, wither regular command prompt or git bash > Reporter: radai rosenblatt >Priority: Minor > Attachments: windows build debug output.txt > > > kafka runs a rat report during the build, using [the rat ant report > task|http://creadur.apache.org/rat/apache-rat-tasks/report.html], which has > no output encoding parameter. > this means that the resulting xml report is produced using the system-default > encoding, which is OS-dependent: > the rat ant task code instantiates the output writer like so > ([org.apache.rat.anttasks.Report.java|http://svn.apache.org/repos/asf/creadur/rat/tags/apache-rat-project-0.11/apache-rat-tasks/src/main/java/org/apache/rat/anttasks/Report.java] > line 196): > {noformat} > out = new PrintWriter(new FileWriter(reportFile));{noformat} > which eventually leads to {{Charset.defaultCharset()}} that relies on the > file.encoding system property. this causes an issue if the default encoding > isn't UTF-8 (which it isn't on windows) as the code called by > printUnknownFiles() in rat.gradle defaults to UTF-8 when reading the report > xml, causing the build to fail with: > {noformat} > com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: > Invalid byte 1 of 1-byte UTF-8 sequence.{noformat} > (see complete output of {{gradlew --debug --stacktrace rat}} in attached file)
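The root cause quoted above is `new FileWriter(reportFile)` silently using the platform-default charset. The portable pattern is to construct the writer with an explicit encoding; a small demonstration (an in-memory stream stands in for the report file):

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Demonstrates the portable alternative to `new FileWriter(file)`:
// pass an explicit charset so output bytes are identical on every
// platform, regardless of the file.encoding system property.
class Utf8Report {
    static byte[] write(String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            out.print(content);   // always encoded as UTF-8
        }
        return bytes.toByteArray();
    }
}
```

The workaround actually pursued in the PRs above — forcing the file.encoding system property to UTF-8 around the rat invocation — achieves the same effect from the outside, since rat's own code cannot be changed from the Kafka build.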
[jira] [Updated] (KAFKA-4025) build fails on windows due to rat target output encoding
[ https://issues.apache.org/jira/browse/KAFKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

radai rosenblatt updated KAFKA-4025:
------------------------------------
    Environment: windows 7, either regular command prompt or git bash  (was: windows 7, wither regular command prompt or git bash)

> build fails on windows due to rat target output encoding
> --------------------------------------------------------
>
>                 Key: KAFKA-4025
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4025
>             Project: Kafka
>          Issue Type: Bug
>         Environment: windows 7, either regular command prompt or git bash
>            Reporter: radai rosenblatt
>            Priority: Minor
>         Attachments: windows build debug output.txt
>
> kafka runs a rat report during the build, using [the rat ant report task|http://creadur.apache.org/rat/apache-rat-tasks/report.html], which has no output encoding parameter.
> this means that the resulting xml report is produced using the system-default encoding, which is OS-dependent.
> the rat ant task code instantiates the output writer like so ([org.apache.rat.anttasks.Report.java|http://svn.apache.org/repos/asf/creadur/rat/tags/apache-rat-project-0.11/apache-rat-tasks/src/main/java/org/apache/rat/anttasks/Report.java] line 196):
> {noformat}
> out = new PrintWriter(new FileWriter(reportFile));
> {noformat}
> which eventually leads to {{Charset.defaultCharset()}}, which relies on the file.encoding system property. this causes an issue if the default encoding isn't UTF-8 (which it isn't on windows), as the code called by printUnknownFiles() in rat.gradle defaults to UTF-8 when reading the report xml, causing the build to fail with:
> {noformat}
> com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
> {noformat}
> (see the complete output of {{gradlew --debug --stacktrace rat}} in the attached file)
[jira] [Created] (KAFKA-4025) build fails on windows due to rat target output encoding
radai rosenblatt created KAFKA-4025:
---------------------------------------

             Summary: build fails on windows due to rat target output encoding
                 Key: KAFKA-4025
                 URL: https://issues.apache.org/jira/browse/KAFKA-4025
             Project: Kafka
          Issue Type: Bug
         Environment: windows 7, either regular command prompt or git bash
            Reporter: radai rosenblatt
            Priority: Minor

kafka runs a rat report during the build, using [the rat ant report task|http://creadur.apache.org/rat/apache-rat-tasks/report.html], which has no output encoding parameter.

this means that the resulting xml report is produced using the system-default encoding, which is OS-dependent.

the rat ant task code instantiates the output writer like so ([org.apache.rat.anttasks.Report.java|http://svn.apache.org/repos/asf/creadur/rat/tags/apache-rat-project-0.11/apache-rat-tasks/src/main/java/org/apache/rat/anttasks/Report.java] line 196):
{noformat}
out = new PrintWriter(new FileWriter(reportFile));
{noformat}
which eventually leads to {{Charset.defaultCharset()}}, which relies on the file.encoding system property. this causes an issue if the default encoding isn't UTF-8 (which it isn't on windows), as the code called by printUnknownFiles() in rat.gradle defaults to UTF-8 when reading the report xml, causing the build to fail with:
{noformat}
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
{noformat}
(see the complete output of {{gradlew --debug --stacktrace rat}} in the attached file)
[jira] [Created] (KAFKA-4011) allow sizing RequestQueue in bytes
radai rosenblatt created KAFKA-4011:
---------------------------------------

             Summary: allow sizing RequestQueue in bytes
                 Key: KAFKA-4011
                 URL: https://issues.apache.org/jira/browse/KAFKA-4011
             Project: Kafka
          Issue Type: Improvement
          Components: core
    Affects Versions: 0.10.0.0
            Reporter: radai rosenblatt
             Fix For: 0.10.1.0

currently RequestChannel's requestQueue is sized in number of requests:
{code:title=RequestChannel.scala|borderStyle=solid}
private val requestQueue = new ArrayBlockingQueue[RequestChannel.Request](queueSize)
{code}
under the assumption that the end goal is a bound on server memory consumption, this requires the admin to know the average request size. I would like to propose sizing the requestQueue not by number of requests but by their accumulated size (Request.buffer.capacity). this would probably make configuring and sizing an instance easier.

there would need to be a new configuration setting for this (queued.max.bytes?), which could be either in addition to or instead of the current queued.max.requests setting.
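As a rough illustration of the proposal (a hypothetical sketch in Java rather than the broker's Scala, and not Kafka's eventual implementation): a queue bounded by the accumulated size of its elements instead of their count. A `Semaphore` holds the byte budget; `put()` blocks while admitting the element would exceed the bound, and `take()` returns the budget.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.function.ToIntFunction;

// Hypothetical sketch (not Kafka code): bound a queue by accumulated
// element size rather than element count. sizeOf would be something
// like Request.buffer.capacity in the scenario described above.
final class ByteBoundedQueue<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Semaphore byteBudget;
    private final ToIntFunction<T> sizeOf;

    ByteBoundedQueue(int maxBytes, ToIntFunction<T> sizeOf) {
        this.byteBudget = new Semaphore(maxBytes);
        this.sizeOf = sizeOf;
    }

    void put(T element) throws InterruptedException {
        // blocks until enough byte budget is free, bounding total memory
        byteBudget.acquire(sizeOf.applyAsInt(element));
        queue.put(element);
    }

    T take() throws InterruptedException {
        T element = queue.take();
        // release the element's bytes back to the budget
        byteBudget.release(sizeOf.applyAsInt(element));
        return element;
    }
}
```

A single element larger than `maxBytes` would block forever in this sketch, so a real implementation would need to special-case oversized requests; and `queued.max.bytes` is the reporter's suggested name, not an existing setting at the time of the message.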