[jira] [Created] (KAFKA-6762) log-cleaner thread terminates due to java.lang.IllegalStateException

2018-04-07 Thread Ricardo Bartolome (JIRA)
Ricardo Bartolome created KAFKA-6762:


 Summary: log-cleaner thread terminates due to 
java.lang.IllegalStateException
 Key: KAFKA-6762
 URL: https://issues.apache.org/jira/browse/KAFKA-6762
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 1.0.0
 Environment: os: GNU/Linux 
arch: x86_64 
Kernel: 4.9.77 
jvm: OpenJDK 1.8.0
Reporter: Ricardo Bartolome


We are experiencing some problems with the kafka log-cleaner thread on Kafka 1.0.0. 
We plan to update this cluster to 1.1.0 next week in order to pick up the fix for 
KAFKA-6683, but until then we can only confirm that the issue happens in 1.0.0.

The log-cleaner thread crashes after a while with the following error:
{code:java}
[2018-03-28 11:14:40,199] INFO Cleaner 0: Beginning cleaning of log 
__consumer_offsets-31. (kafka.log.LogCleaner)
[2018-03-28 11:14:40,199] INFO Cleaner 0: Building offset map for 
__consumer_offsets-31... (kafka.log.LogCleaner)
[2018-03-28 11:14:40,218] INFO Cleaner 0: Building offset map for log 
__consumer_offsets-31 for 16 segments in offset range [1612869, 14282934). 
(kafka.log.LogCleaner)
[2018-03-28 11:14:58,566] INFO Cleaner 0: Offset map for log 
__consumer_offsets-31 complete. (kafka.log.LogCleaner)
[2018-03-28 11:14:58,566] INFO Cleaner 0: Cleaning log __consumer_offsets-31 
(cleaning prior to Tue Mar 27 09:25:09 GMT 2018, discarding tombstones prior to 
Sat Feb 24 11:04:21 GMT 2018
)... (kafka.log.LogCleaner)
[2018-03-28 11:14:58,567] INFO Cleaner 0: Cleaning segment 0 in log 
__consumer_offsets-31 (largest timestamp Fri Feb 23 11:40:54 GMT 2018) into 0, 
discarding deletes. (kafka.log.LogCleaner)
[2018-03-28 11:14:58,570] INFO Cleaner 0: Growing cleaner I/O buffers from 
262144bytes to 524288 bytes. (kafka.log.LogCleaner)
[2018-03-28 11:14:58,576] INFO Cleaner 0: Growing cleaner I/O buffers from 
524288bytes to 1000012 bytes. (kafka.log.LogCleaner)
[2018-03-28 11:14:58,593] ERROR [kafka-log-cleaner-thread-0]: Error due to 
(kafka.log.LogCleaner)
java.lang.IllegalStateException: This log contains a message larger than 
maximum allowable size of 1000012.
at kafka.log.Cleaner.growBuffers(LogCleaner.scala:622)
at kafka.log.Cleaner.cleanInto(LogCleaner.scala:574)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:459)
at kafka.log.Cleaner.$anonfun$doClean$6(LogCleaner.scala:396)
at kafka.log.Cleaner.$anonfun$doClean$6$adapted(LogCleaner.scala:395)
at scala.collection.immutable.List.foreach(List.scala:389)
at kafka.log.Cleaner.doClean(LogCleaner.scala:395)
at kafka.log.Cleaner.clean(LogCleaner.scala:372)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:263)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:243)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
[2018-03-28 11:14:58,601] INFO [kafka-log-cleaner-thread-0]: Stopped 
(kafka.log.LogCleaner)
[2018-04-04 14:25:12,773] INFO The cleaning for partition 
__broker-11-health-check-0 is aborted and paused (kafka.log.LogCleaner)
[2018-04-04 14:25:12,773] INFO Compaction for partition 
__broker-11-health-check-0 is resumed (kafka.log.LogCleaner)
[2018-04-04 14:25:12,774] INFO The cleaning for partition 
__broker-11-health-check-0 is aborted (kafka.log.LogCleaner)
[2018-04-04 14:25:22,850] INFO Shutting down the log cleaner. 
(kafka.log.LogCleaner)
[2018-04-04 14:25:22,850] INFO [kafka-log-cleaner-thread-0]: Shutting down 
(kafka.log.LogCleaner)
[2018-04-04 14:25:22,850] INFO [kafka-log-cleaner-thread-0]: Shutdown completed 
(kafka.log.LogCleaner)
{code}
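For context, here is a simplified sketch (illustrative only, not the actual kafka.log.Cleaner code) of the buffer-growth behavior implied by the stack trace: the cleaner keeps doubling its I/O buffers while cleaning and gives up once the buffer has already reached the configured maximum message size:
{code:java}
public class CleanerBufferGrowthSketch {

    // Sketch of the growth logic suggested by growBuffers() in the stack trace above;
    // names and structure are illustrative, not the real implementation.
    static int growBuffers(int currentCapacity, int maxMessageSize) {
        if (currentCapacity >= maxMessageSize) {
            // The point at which the cleaner thread dies in our logs.
            throw new IllegalStateException(
                "This log contains a message larger than maximum allowable size of "
                    + maxMessageSize + ".");
        }
        // Double the buffer, but never grow beyond the configured maximum message size.
        return Math.min(currentCapacity * 2, maxMessageSize);
    }

    public static void main(String[] args) {
        int capacity = 262144;        // initial cleaner I/O buffer size seen in our logs
        int maxMessageSize = 1000012; // default message.max.bytes (~1MB)
        capacity = growBuffers(capacity, maxMessageSize); // 262144 -> 524288
        capacity = growBuffers(capacity, maxMessageSize); // 524288 -> 1000012
        growBuffers(capacity, maxMessageSize);            // throws IllegalStateException
    }
}
{code}
This would match the sequence in the log above: the buffers grow 262144 -> 524288 -> 1000012, the next record still does not fit, and the thread stops.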
What we know so far is:
 * We have not yet been able to reproduce it in a consistent manner.
 * It only happens in the PRO cluster and not in the PRE cluster for the same 
customer (whose message payloads are very similar).
 * Checking our Kafka logs, it has only happened on the internal 
*__consumer_offsets-** topics.
 * When we restart the broker process, the log-cleaner starts working again, but 
it can take anywhere from 3 minutes to several hours before it dies again.
 * We work around it by temporarily increasing the message.max.bytes and 
replica.fetch.max.bytes values to 10485760 (10MB) from the default 1000012 
(~1MB); see the configuration sketch after this list.
 ** Before setting message.max.bytes = 10MB, we tried to match message.max.bytes 
with the value of replica.fetch.max.bytes (1048576), but the log-cleaner died 
with the same error, only with a different limit.
 ** The 10MB setting allowed the log-cleaner to keep running and compact enough 
data for used disk space to go from ~600GB to ~100GB.
 ** Without this limit change, the log-cleaner dies after a while, the used disk 
space stays at ~450GB, and it starts growing again due to cluster activity.
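For reference, the temporary overrides we applied in server.properties look roughly like this (the property names are the standard broker settings mentioned above; the values are the ones we used):
{code}
# Temporary workaround: raise both limits to 10MB so the cleaner I/O buffers
# can grow past the ~1MB default before hitting the IllegalStateException.
message.max.bytes=10485760
replica.fetch.max.bytes=10485760
{code}
We restart the broker after changing these, which also brings the dead log-cleaner thread back, as noted above.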

Our server.properties content is as follows, as printed in server.log at broker 
startup.
{code:java}
broker.id=11
delete.topic.enable=true
advertised.listeners=PLAINTEXT://broker-ip:9092
num.network.threads=3
num.io.
{code}

[jira] [Created] (KAFKA-6358) Per topic producer/fetch_consumer/fetch_follower metrics

2017-12-13 Thread Ricardo Bartolome (JIRA)
Ricardo Bartolome created KAFKA-6358:


 Summary: Per topic producer/fetch_consumer/fetch_follower metrics
 Key: KAFKA-6358
 URL: https://issues.apache.org/jira/browse/KAFKA-6358
 Project: Kafka
  Issue Type: Wish
  Components: metrics
Affects Versions: 1.0.0
Reporter: Ricardo Bartolome
Priority: Minor


We are using the following JMX beans to monitor Kafka 1.0.0:

{code}
kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce
Mean
50thPercentile
...
99thPercentile

kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer
Count

kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower
Count
{code}

There are more, but these provide an idea of what we are using in order to track 
produce/fetch operations on a per-broker basis. Nevertheless, in order to 
identify abusive consumers/clients in our Kafka cluster, we would appreciate 
having these metrics on a per-topic basis.

As an example of a per-topic metric we have:
{code}
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=.*
{code}

There we have a per-topic bean with a "Count" attribute that we can query, so 
we can tell which topics are ingesting more data and which ones less. We can't 
do that with the metrics explained above.
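For illustration, this is roughly how we read the per-topic "Count" attribute over JMX today (a minimal sketch; the broker host, port and JMX endpoint below are placeholders for our setup), and what we would like to be able to do for the RequestMetrics beans as well, broken down by topic:
{code:java}
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class PerTopicBytesOut {
    public static void main(String[] args) throws Exception {
        // Placeholder JMX endpoint; adjust host/port to the broker's JMX settings.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://broker-ip:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // One bean per topic matches this pattern, e.g. ...,topic=my-topic
            Set<ObjectName> beans = conn.queryNames(new ObjectName(
                "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*"), null);
            for (ObjectName bean : beans) {
                Object count = conn.getAttribute(bean, "Count");
                System.out.println(bean.getKeyProperty("topic") + " -> " + count);
            }
        }
    }
}
{code}
An equivalent per-topic breakdown for RequestsPerSec/TotalTimeMs (Produce, FetchConsumer, FetchFollower) would let us attribute produce/fetch request load to individual topics the same way.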

Would you consider such a change in an upcoming Kafka version as a feature request?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)