[jira] [Commented] (KAFKA-4128) Kafka broker losses messages when zookeeper session times out

2016-10-27 Thread Mazhar Shaikh (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610937#comment-15610937 ]

Mazhar Shaikh commented on KAFKA-4128:
--

[2016-10-26 18:36:55,992] INFO KafkaConfig values:
request.timeout.ms = 3
log.roll.hours = 168
inter.broker.protocol.version = 0.9.0.X
log.preallocate = false
security.inter.broker.protocol = PLAINTEXT
controller.socket.timeout.ms = 3
broker.id.generation.enable = true
ssl.keymanager.algorithm = SunX509
ssl.key.password = null
log.cleaner.enable = false
ssl.provider = null
num.recovery.threads.per.data.dir = 2
background.threads = 10
unclean.leader.election.enable = true
sasl.kerberos.kinit.cmd = /usr/bin/kinit
replica.lag.time.max.ms = 16000
ssl.endpoint.identification.algorithm = null
auto.create.topics.enable = false
zookeeper.sync.time.ms = 2000
ssl.client.auth = none
ssl.keystore.password = null
log.cleaner.io.buffer.load.factor = 0.9
offsets.topic.compression.codec = 0
log.retention.hours = 168
log.dirs = /data/kafka/broker-b1
ssl.protocol = TLS
log.index.size.max.bytes = 10485760
sasl.kerberos.min.time.before.relogin = 6
log.retention.minutes = null
connections.max.idle.ms = 60
ssl.trustmanager.algorithm = PKIX
offsets.retention.minutes = 1440
max.connections.per.ip = 2147483647
replica.fetch.wait.max.ms = 500
metrics.num.samples = 2
port = 9092
offsets.retention.check.interval.ms = 60
log.cleaner.dedupe.buffer.size = 134217728
log.segment.bytes = 1073741824
group.min.session.timeout.ms = 6000
producer.purgatory.purge.interval.requests = 1000
min.insync.replicas = 1
ssl.truststore.password = null
log.flush.scheduler.interval.ms = 2000
socket.receive.buffer.bytes = 16777216
leader.imbalance.per.broker.percentage = 10
num.io.threads = 32
zookeeper.connect = b1.broker.com:2181,b2.broker.com:2181,zoo3.broker.com:2182
queued.max.requests = 500
offsets.topic.replication.factor = 3
replica.socket.timeout.ms = 3
offsets.topic.segment.bytes = 104857600
replica.high.watermark.checkpoint.interval.ms = 5000
broker.id = 0
ssl.keystore.location = null
listeners = PLAINTEXT://0.0.0.0:9092
log.flush.interval.messages = 2
principal.builder.class = class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
log.retention.ms = null
offsets.commit.required.acks = -1
sasl.kerberos.principal.to.local.rules = [DEFAULT]
group.max.session.timeout.ms = 3
num.replica.fetchers = 16
advertised.listeners = null
replica.socket.receive.buffer.bytes = 16777216
delete.topic.enable = true
log.index.interval.bytes = 4096
metric.reporters = []
compression.type = producer
log.cleanup.policy = delete
controlled.shutdown.max.retries = 1
log.cleaner.threads = 1
quota.window.size.seconds = 1
zookeeper.connection.timeout.ms = 6000
offsets.load.buffer.size = 5242880
zookeeper.session.timeout.ms = 3
ssl.cipher.suites = null
authorizer.class.name =
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.service.name = null
controlled.shutdown.enable = true
offsets.topic.num.partitions = 50
quota.window.num = 11
message.max.bytes = 112
log.cleaner.backoff.ms = 15000
log.roll.jitter.hours = 0
log.retention.check.interval.ms = 3
replica.fetch.max.bytes = 1048576
log.cleaner.delete.retention.ms = 8640
fetch.purgatory.purge.interval.requests = 1000
log.cleaner.min.cleanable.ratio = 0.5
offsets.commit.timeout.ms = 5000
zookeeper.set.acl = false
log.retention.bytes = 4294967296
offset.metadata.max.bytes = 4096
leader.imbalance.check.interval.seconds = 300
quota.consumer.default = 9223372036854775807
log.roll.jitter.ms = null
reserved.broker.max.id = 1000
replica.fetch.backoff.ms = 1000
advertised.host.name = b1.broker.com
quota.producer.default = 9223372036854775807
log.cleaner.io.buffer.size = 524288
controlled.shutdown.retry.backoff.ms = 2000
log.dir = /tmp/kafka-logs
log.flush.offset.checkpoint.interval.ms = 6
log.segment.delete.delay.ms = 6
num.partitions = 96
num.network.threads = 16
socket.request.max.bytes = 104857600
sasl.kerberos.ticket.renew.window.factor = 0.8

[jira] [Commented] (KAFKA-4128) Kafka broker losses messages when zookeeper session times out

2016-10-12 Thread Mazhar Shaikh (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568706#comment-15568706 ]

Mazhar Shaikh commented on KAFKA-4128:
--

Hi Gwen Shapira,

My concerns about this bug are as follows:

1. Whenever a follower connects to a leader and the follower has more messages 
(a higher offset) than the leader, the follower truncates (drops) these messages 
back to its last high watermark.

   => Is there any configuration that avoids dropping these messages and 
instead replicates them to the leader?
 
2. What could cause the ZooKeeper session timeout, given that there are no 
garbage collection issues?


Brokers = 6
Replicas = 2
Total partitions: 96
Partitions per broker: 16 (leader) + 16 (follower)
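
To make point 1 concrete, here is a minimal Python sketch of the truncation 
described above. It is a toy model, not Kafka's code: `Replica`, `log`, 
`high_watermark` and `become_follower` are hypothetical names chosen for 
illustration, and it assumes only what is stated above, namely that a replica 
rejoining as a follower drops everything past its last high watermark.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """Toy replica (hypothetical model): a message list plus a high watermark."""
    log: List[str] = field(default_factory=list)
    high_watermark: int = 0  # count of messages known to be fully replicated

    def become_follower(self) -> List[str]:
        """Truncate the local log to the high watermark; return what was dropped."""
        dropped = self.log[self.high_watermark:]
        self.log = self.log[:self.high_watermark]
        return dropped


# The old leader accepted m3 and m4, but they were never replicated,
# so its high watermark still covers only m0..m2.
old_leader = Replica(log=["m0", "m1", "m2", "m3", "m4"], high_watermark=3)
dropped = old_leader.become_follower()

print("dropped:", dropped)            # dropped: ['m3', 'm4']
print("remaining:", old_leader.log)   # remaining: ['m0', 'm1', 'm2']
```

In this model the messages past the high watermark are simply discarded, which 
is the loss described in point 1.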





> Kafka broker losses messages when zookeeper session times out
> -
>
> Key: KAFKA-4128
> URL: https://issues.apache.org/jira/browse/KAFKA-4128
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2.1, 0.9.0.1
> Reporter: Mazhar Shaikh
> Priority: Critical
>
> When pumping 30k msgs/second, after about 6-8 hours of running, the logs 
> below are printed and messages are lost.
> [More than 5k messages are lost on every partition]
> Below are a few of the logs:
> [2016-09-06 05:00:42,595] INFO Client session timed out, have not heard from 
> server in 20903ms for sessionid 0x256fabec47c0003, closing socket connection 
> and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:42,696] INFO zookeeper state changed (Disconnected) 
> (org.I0Itec.zkclient.ZkClient)
> [2016-09-06 05:00:42,753] INFO Partition [topic,62] on broker 4: Shrinking 
> ISR for partition [topic,62] from 4,2 to 4 (kafka.cluster.Partition)
> [2016-09-06 05:00:43,585] INFO Opening socket connection to server 
> b0/169.254.2.1:2182. Will not attempt to authenticate using SASL (unknown 
> error) (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:43,586] INFO Socket connection established to 
> b0/169.254.2.1:2182, initiating session (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:43,587] INFO Unable to read additional data from server 
> sessionid 0x256fabec47c0003, likely server has closed socket, closing socket 
> connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:44,644] INFO Opening socket connection to server 
> b1/169.254.2.116:2181. Will not attempt to authenticate using SASL (unknown 
> error) (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:44,651] INFO Socket connection established to 
> b1/169.254.2.116:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:44,658] INFO zookeeper state changed (Expired) 
> (org.I0Itec.zkclient.ZkClient)
> [2016-09-06 05:00:44,659] INFO Initiating client connection, 
> connectString=b2.broker.com:2181,b1.broker.com:2181,zoo3.broker.com:2182 
> sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@37b8e86a 
> (org.apache.zookeeper.ZooKeeper)
> [2016-09-06 05:00:44,659] INFO Unable to reconnect to ZooKeeper service, 
> session 0x256fabec47c0003 has expired, closing socket connection 
> (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:44,661] INFO EventThread shut down 
> (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:44,662] INFO Opening socket connection to server 
> b2/169.254.2.216:2181. Will not attempt to authenticate using SASL (unknown 
> error) (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:44,662] INFO Socket connection established to 
> b2/169.254.2.216:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> [2016-09-06 05:00:44,665] ERROR Error handling event ZkEvent[New session 
> event sent to 
> kafka.controller.KafkaController$SessionExpirationListener@33b7dedc] 
> (org.I0Itec.zkclient.ZkEventThread)
> java.lang.IllegalStateException: Kafka scheduler has not been started
> at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
> at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
> at 
> kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
> at 
> kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1108)
> at 
> kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1107)
> at 
> kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1107)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at 
> kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1107)
> at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>
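
The session-timeout lines in the quoted log can be checked mechanically. The 
following Python sketch is only an illustration: it pulls the reported silence 
interval and the `sessionTimeout` value out of two of the log lines (with the 
email line wrapping undone) and compares them. It assumes the expired session 
used the same 15000 ms timeout that is shown for the new connection, which the 
log does not state explicitly.

```python
import re

# Two lines copied from the quoted log, with the email line wrapping undone.
LOG = (
    "[2016-09-06 05:00:42,595] INFO Client session timed out, have not heard "
    "from server in 20903ms for sessionid 0x256fabec47c0003, closing socket "
    "connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)\n"
    "[2016-09-06 05:00:44,659] INFO Initiating client connection, "
    "connectString=b2.broker.com:2181,b1.broker.com:2181,zoo3.broker.com:2182 "
    "sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@37b8e86a "
    "(org.apache.zookeeper.ZooKeeper)\n"
)

# How long the client went without hearing from the ZooKeeper server.
silence_ms = int(re.search(r"have not heard from server in (\d+)ms", LOG).group(1))
# Session timeout requested by the client (assumed to match the old session).
session_timeout_ms = int(re.search(r"sessionTimeout=(\d+)", LOG).group(1))

exceeded = silence_ms > session_timeout_ms
print(silence_ms, session_timeout_ms, exceeded)  # 20903 15000 True
```

Under that assumption, the client was silent for longer than the session 
timeout, which is consistent with the `Expired` state change that follows in 
the log.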