[jira] [Commented] (KAFKA-6260) AbstractCoordinator does not clearly handle NULL exception

2017-11-28 Thread Seweryn Habdank-Wojewodzki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270342#comment-16270342
 ] 

Seweryn Habdank-Wojewodzki commented on KAFKA-6260:
---

BTW the other bug: KAFKA-5882 happens also in the same environment.

> AbstractCoordinator does not clearly handle NULL exception
> --
>
> Key: KAFKA-6260
> URL: https://issues.apache.org/jira/browse/KAFKA-6260
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
> Environment: RedHat Linux
>Reporter: Seweryn Habdank-Wojewodzki
>
> The error reporting is not clear, but it seems that the Kafka heartbeat shuts 
> down the application due to a NULL exception caused by "fake" disconnections.
> One more comment: we are processing messages in the stream, but sometimes we 
> have to block processing for minutes, as the consumers cannot handle too much 
> load. Is it possible that while the stream is waiting, the heartbeat is 
> blocked as well?
> Can you check that?
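> For context: since KIP-62 (0.10.1+), the consumer sends heartbeats from a 
> background thread, while {{max.poll.interval.ms}} separately bounds the time 
> between {{poll()}} calls. A minimal sketch of the relevant settings (values 
> are the client defaults, shown for illustration):
> {code:java}
> Properties props = new Properties();
> props.put("bootstrap.servers", "cljp01.eb.lan.at:9093");
> props.put("group.id", "kafka-endpoint");
> // Heartbeats are sent from a background thread at this cadence ...
> props.put("heartbeat.interval.ms", "3000");
> // ... and the member is evicted if none arrives within the session timeout.
> props.put("session.timeout.ms", "10000");
> // Separately, blocking between poll() calls for longer than this triggers a
> // rebalance even though heartbeats keep flowing.
> props.put("max.poll.interval.ms", "300000");
> {code}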
> {code}
> 2017-11-23 23:54:47 DEBUG AbstractCoordinator:177 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Received successful Heartbeat response
> 2017-11-23 23:54:50 DEBUG AbstractCoordinator:183 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Sending Heartbeat request to coordinator 
> cljp01.eb.lan.at:9093 (id: 2147483646 rack: null)
> 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Sending HEARTBEAT 
> {group_id=kafka-endpoint,generation_id=3834,member_id=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer-94f18be5-e49a-4817-9e5a-fe82a64e0b08}
>  with correlation id 24 to node 2147483646
> 2017-11-23 23:54:50 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Completed receive from node 2147483646 for HEARTBEAT 
> with correlation id 24, received {throttle_time_ms=0,error_code=0}
> 2017-11-23 23:54:50 DEBUG AbstractCoordinator:177 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Received successful Heartbeat response
> 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Disconnecting from node 1 due to request timeout.
> 2017-11-23 23:54:52 TRACE NetworkClient:135 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Cancelled request 
> {replica_id=-1,max_wait_time=6000,min_bytes=1,max_bytes=52428800,isolation_level=0,topics=[{topic=clj_internal_topic,partitions=[{partition=6,fetch_offset=211558395,log_start_offset=-1,max_bytes=1048576},{partition=8,fetch_offset=210178209,log_start_offset=-1,max_bytes=1048576},{partition=0,fetch_offset=209353523,log_start_offset=-1,max_bytes=1048576},{partition=2,fetch_offset=209291462,log_start_offset=-1,max_bytes=1048576},{partition=4,fetch_offset=210728595,log_start_offset=-1,max_bytes=1048576}]}]}
>  with correlation id 21 due to node 1 being disconnected
> 2017-11-23 23:54:52 DEBUG ConsumerNetworkClient:195 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Cancelled FETCH request RequestHeader(apiKey=FETCH, 
> apiVersion=6, 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  correlationId=21) with correlation id 21 due to node 1 being disconnected
> 2017-11-23 23:54:52 DEBUG Fetcher:195 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Fetch request 
> {clj_internal_topic-6=(offset=211558395, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-8=(offset=210178209, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-0=(offset=209353523, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-2=(offset=209291462, logStartOffset=-1, 
> maxBytes=1048576), clj_internal_topic-4=(offset=210728595, logStartOffset=-1, 
> maxBytes=1048576)} to cljp01.eb.lan.at:9093 (id: 1 rack: DC-1) failed 
> org.apache.kafka.common.errors.DisconnectException: null
> 2017-11-23 23:54:52 TRACE NetworkClient:123 - [Consumer 
> clientId=kafka-endpoint-be51569b-8795-4709-8ec8-28c9cd099a31-StreamThread-1-consumer,
>  groupId=kafka-endpoint] Found least loaded node cljp01.eb.lan.at:9093 (id: 1 
> rack: DC-1)
> 2017-11-23 23:54:52 DEBUG NetworkClient:183 - [Consumer 
> 

[jira] [Commented] (KAFKA-5882) NullPointerException in StreamTask

2017-11-28 Thread Seweryn Habdank-Wojewodzki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270341#comment-16270341
 ] 

Seweryn Habdank-Wojewodzki commented on KAFKA-5882:
---

BTW the other bug: KAFKA-6260 happens also in the same environment.

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>
> It seems the bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced another issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in case of a FATAL error, anything except an NPE is 
> thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5882) NullPointerException in StreamTask

2017-11-28 Thread Seweryn Habdank-Wojewodzki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270338#comment-16270338
 ] 

Seweryn Habdank-Wojewodzki commented on KAFKA-5882:
---

[~mjsax] What do you mean by "DEBUG logs of Streams"? Which namespaces or 
classes shall I log at debug level? Do you need:
{code}
<logger name="org.apache.kafka.streams">
  <level value="debug"/>
</logger>
{code}
Or something else?

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>
> It seems the bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced another issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in case of a FATAL error, anything except an NPE is 
> thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6257) KafkaConsumer hangs when bootstrap servers do not exist

2017-11-28 Thread Sönke Liebau (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau resolved KAFKA-6257.
-
Resolution: Duplicate

Closing this as a duplicate since no contradicting information was added.

> KafkaConsumer hangs when bootstrap servers do not exist
> -
>
> Key: KAFKA-6257
> URL: https://issues.apache.org/jira/browse/KAFKA-6257
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Brian Clark
>Priority: Minor
>
> Could anyone help me with this?
> We have an issue if a non-existent host:port is entered for the 
> bootstrap.servers property on KafkaConsumer. The created KafkaConsumer hangs 
> forever.
> The debug message:
> java.net.ConnectException: Connection timed out: no further information
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:50)
> at 
> org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:95)
> at 
> org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:359)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:326)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:432)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:232)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:199)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:223)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:200)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:286)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
> at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
> at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
> at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
> [2017-08-28 09:20:56,400] DEBUG Node -1 disconnected. 
> (org.apache.kafka.clients.NetworkClient)
> [2017-08-28 09:20:56,400] WARN Connection to node -1 could not be 
> established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)
> [2017-08-28 09:20:56,400] DEBUG Give up sending metadata request since no 
> node is available (org.apache.kafka.clients.NetworkClient)
> [2017-08-28 09:20:56,450] DEBUG Initialize connection to node -1 for sending 
> metadata request (org.apache.kafka.clients.NetworkClient)
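> A minimal sketch that reproduces the hang described above (host intentionally 
> unreachable; topic and deserializers illustrative):
> {code:java}
> Properties props = new Properties();
> props.put("bootstrap.servers", "no-such-host:9092"); // does not resolve/connect
> props.put("group.id", "test");
> props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
> props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
> KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
> consumer.subscribe(Collections.singletonList("test"));
> // Never returns in the affected versions: poll() loops inside
> // ensureCoordinatorReady(), waiting for metadata that never arrives.
> consumer.poll(1000);
> {code}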



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6048) Support negative record timestamps

2017-11-28 Thread james chien (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270062#comment-16270062
 ] 

james chien commented on KAFKA-6048:


[~mjsax] ok!!

> Support negative record timestamps
> --
>
> Key: KAFKA-6048
> URL: https://issues.apache.org/jira/browse/KAFKA-6048
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, core, streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Assignee: Konstantin Chukhlomin
>  Labels: needs-kip
>
> Kafka does not support negative record timestamps, and this prevents the 
> storage of historical data in Kafka. In general, Unix system timestamps 
> support negative values: 
> From https://en.wikipedia.org/wiki/Unix_time
> {quote}
> The Unix time number is zero at the Unix epoch, and increases by exactly 
> 86,400 per day since the epoch. Thus 2004-09-16T00:00:00Z, 12,677 days after 
> the epoch, is represented by the Unix time number 12,677 × 86,400 = 
> 1095292800. This can be extended backwards from the epoch too, using negative 
> numbers; thus 1957-10-04T00:00:00Z, 4,472 days before the epoch, is 
> represented by the Unix time number −4,472 × 86,400 = −386380800.
> {quote}
> Allowing for negative timestamps would require multiple changes:
>  - While brokers in general do support negative timestamps, brokers use {{-1}} 
> as the default value if a producer uses an old message format (this would not 
> be compatible with supporting negative timestamps "end-to-end", as {{-1}} 
> could no longer be used as "unknown"): we could introduce a message flag 
> indicating a missing timestamp (and let the consumer throw an exception if 
> {{ConsumerRecord#timestamp()}} is called). Another possible solution might be 
> to require topics that are used by old producers to be configured with 
> {{LogAppendTime}} semantics and to reject writes to topics with 
> {{CreateTime}} semantics for older message formats.
>  - {{KafkaProducer}} does not allow sending records with a negative timestamp, 
> so this would need to be fixed (see the sketch below).
>  - The Streams API drops records with negative timestamps (or fails by 
> default) -- also, some internal store implementations for windowed stores 
> assume that there are no negative timestamps when doing range queries.
> There might be other gaps we need to address. This is just a summary of the 
> issues coming to my mind at the moment.
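> A minimal sketch of the current producer-side rejection mentioned in the 
> second bullet (1.0 client; topic name illustrative):
> {code:java}
> // ProducerRecord validates the timestamp in its constructor, so this line
> // throws IllegalArgumentException because the timestamp is negative
> // (-386380800000L is 1957-10-04T00:00:00Z in milliseconds):
> new ProducerRecord<String, String>(
>     "historical-events", null, -386380800000L, "key", "value");
> {code}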



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4827) Kafka connect: error with special characters in connector name

2017-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270061#comment-16270061
 ] 

ASF GitHub Bot commented on KAFKA-4827:
---

GitHub user wicknicks opened a pull request:

https://github.com/apache/kafka/pull/4273

KAFKA-4827: Porting fix for KAFKA-4827 to v0.10 and v0.11



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wicknicks/kafka KAFKA-4827-0.10.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4273


commit c6e8dc1edb074a2b677de43031d8b59fec4a5e1e
Author: Arjun Satish 
Date:   2017-11-10T17:59:42Z

Correctly encode special chars while creating URI objects

Signed-off-by: Arjun Satish 

commit 02ddc84850acdc5dc55dc146e48744be2346a929
Author: Arjun Satish 
Date:   2017-11-10T23:48:13Z

Replace URLEncoder with URIBuilder

Also, update the tests to pass some additional characters in the connector
name along with adding a decode step using the URI class.

Signed-off-by: Arjun Satish 

commit a1cd574cf1987e59cb89fc33da12aa1ced8bf9f5
Author: Arjun Satish 
Date:   2017-11-28T22:33:16Z

Porting fix for KAFKA-4827 to v0.10 and v0.11

Signed-off-by: Arjun Satish 
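For illustration, the approach these commits describe, sketched with Apache 
HttpClient's URIBuilder (host, port, and segment values are illustrative):

{code:java}
import java.net.URI;
import org.apache.http.client.utils.URIBuilder;

// Concatenating a raw connector name into the path yields an illegal URI when
// the name contains characters such as '\r'; URIBuilder percent-encodes each
// path segment instead.
URI uri = new URIBuilder()
        .setScheme("http")
        .setHost("localhost")
        .setPort(8083)
        .setPathSegments("connectors", "file-connector\r")
        .build();
{code}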




> Kafka connect: error with special characters in connector name
> --
>
> Key: KAFKA-4827
> URL: https://issues.apache.org/jira/browse/KAFKA-4827
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.0
>Reporter: Aymeric Bouvet
>Assignee: Arjun Satish
>Priority: Minor
> Fix For: 1.1.0, 1.0.1
>
>
> When creating a connector, if the connector name (and possibly other 
> properties) ends with a carriage return, kafka-connect will create the config 
> but report an error
> {code}
> cat << EOF > file-connector.json
> {
>   "name": "file-connector\r",
>   "config": {
> "topic": "kafka-connect-logs\r",
> "tasks.max": "1",
> "file": "/var/log/ansible-confluent/connect.log",
> "connector.class": 
> "org.apache.kafka.connect.file.FileStreamSourceConnector"
>   }
> }
> EOF
> curl -X POST -H "Content-Type: application/json" -H "Accept: 
> application/json" -d @file-connector.json localhost:8083/connectors 
> {code}
> returns an error 500 and logs the following
> {code}
> [2017-03-01 18:25:23,895] WARN  (org.eclipse.jetty.servlet.ServletHandler)
> javax.servlet.ServletException: java.lang.IllegalArgumentException: Illegal 
> character in path at index 27: /connectors/file-connector4
> at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
> at 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
> at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
> at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at 
> org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:159)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.eclipse.jetty.server.Server.handle(Server.java:499)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
> at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at 
> 

[jira] [Updated] (KAFKA-6271) FileInputStream.skip function can return 0 when the file is corrupted, causing an infinite loop

2017-11-28 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-6271:
---
Component/s: KafkaConnect

> FileInputStream.skip function can return 0 when the file is corrupted, 
> causing an infinite loop
> ---
>
> Key: KAFKA-6271
> URL: https://issues.apache.org/jira/browse/KAFKA-6271
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Dustin
>
> When the file is corrupted, FileInputStream.skip() can return 0, causing the 
> while loop in FileStreamSourceTask.poll() to become infinite.
> {code:java}
>public List<SourceRecord> poll() throws InterruptedException {
>...
>stream = new FileInputStream(filename);
>long skipLeft = (Long) lastRecordedOffset;
>while (skipLeft > 0) {
>  try {
>long skipped = stream.skip(skipLeft);
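>// NOTE: skip() may return 0 here (e.g. for a corrupted/truncated file),
>// in which case skipLeft never decreases and this loop spins forever.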
>skipLeft -= skipped;
>  } catch (IOException e) {
> log.error("Error while trying to seek to previous offset in file: 
> ", e);
> throw new ConnectException(e);
>  }
>}
> }
> {code}
> Similar bugs include 
> [Cassandra-7330|https://issues.apache.org/jira/browse/Cassandra-7330], 
> [Hadoop-8614|https://issues.apache.org/jira/browse/Hadoop-8614], 
> [Mapreduce-6990|https://issues.apache.org/jira/browse/MAPREDUCE-6990], 
> [Yarn-163|https://issues.apache.org/jira/browse/yarn-163], 
> [Yarn-2905|https://issues.apache.org/jira/browse/yarn-2905] etc.
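> One possible guard, as a sketch (not necessarily the committed fix): treat a 
> zero-byte skip as a hard error instead of retrying.
> {code:java}
> while (skipLeft > 0) {
>     long skipped = stream.skip(skipLeft);
>     if (skipped == 0) {
>         // No progress is possible (e.g. the file is shorter than the
>         // recorded offset); fail fast instead of looping forever.
>         throw new ConnectException("Could not skip to offset "
>                 + lastRecordedOffset + " in file " + filename);
>     }
>     skipLeft -= skipped;
> }
> {code}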



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6252) A metric named 'XX' already exists, can't register another one.

2017-11-28 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-6252:
-
Priority: Critical  (was: Major)

> A metric named 'XX' already exists, can't register another one.
> ---
>
> Key: KAFKA-6252
> URL: https://issues.apache.org/jira/browse/KAFKA-6252
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.0.0
> Environment: Linux
>Reporter: Alexis Sellier
>Priority: Critical
>
> When a connector crashes, it cannot be restarted, and an exception like this 
> is thrown: 
> {code:java}
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=offset-commit-max-time-ms, group=connector-task-metrics, 
> description=The maximum time in milliseconds taken by this task to commit 
> offsets., tags={connector=hdfs-sink-connector-recover, task=0}]' already 
> exists, can't register another one.
>   at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:532)
>   at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:256)
>   at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:241)
>   at 
> org.apache.kafka.connect.runtime.WorkerTask$TaskMetricsGroup.<init>(WorkerTask.java:328)
>   at 
> org.apache.kafka.connect.runtime.WorkerTask.<init>(WorkerTask.java:69)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.<init>(WorkerSinkTask.java:98)
>   at 
> org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:449)
>   at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:404)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:852)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:108)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:866)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:862)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> I guess it's because the function taskMetricsGroup.close is not called in all 
> cases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6252) A metric named 'XX' already exists, can't register another one.

2017-11-28 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269781#comment-16269781
 ] 

Randall Hauch commented on KAFKA-6252:
--

This also happens when a connector does not properly implement {{stop()}}. 

Connect will call {{stop()}} on a source task to signal that it should "_stop 
trying to poll for new data 
and interrupt any outstanding poll() requests_" (see the 
[JavaDoc|http://kafka.apache.org/10/javadoc/org/apache/kafka/connect/source/SourceTask.html#stop--]
 for this method).

Unfortunately, not all connectors properly adhere to this expectation. Since 
the metrics for the source task are cleaned up only when the worker source 
task's thread completes, a task whose {{poll()}} method blocks forever will 
currently prevent its thread from completing. So, we need to change how the 
metrics are cleaned up to ensure this always happens.
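A rough sketch of that direction (method names hypothetical, not the actual 
patch): unregister the task's metrics from the thread that stops the task, so 
cleanup no longer depends on a worker thread that may be parked in {{poll()}}.

{code:java}
void stopAndCleanUp(WorkerTask task) {
    task.stop();              // signal the task; a blocked poll() may never notice
    task.removeTaskMetrics(); // hypothetical helper that unregisters the task's
                              // sensors immediately, on the stopping thread
}
{code}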

> A metric named 'XX' already exists, can't register another one.
> ---
>
> Key: KAFKA-6252
> URL: https://issues.apache.org/jira/browse/KAFKA-6252
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.0.0
> Environment: Linux
>Reporter: Alexis Sellier
>
> When a connector crashes, it cannot be restarted, and an exception like this 
> is thrown: 
> {code:java}
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=offset-commit-max-time-ms, group=connector-task-metrics, 
> description=The maximum time in milliseconds taken by this task to commit 
> offsets., tags={connector=hdfs-sink-connector-recover, task=0}]' already 
> exists, can't register another one.
>   at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:532)
>   at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:256)
>   at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:241)
>   at 
> org.apache.kafka.connect.runtime.WorkerTask$TaskMetricsGroup.<init>(WorkerTask.java:328)
>   at 
> org.apache.kafka.connect.runtime.WorkerTask.<init>(WorkerTask.java:69)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.<init>(WorkerSinkTask.java:98)
>   at 
> org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:449)
>   at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:404)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:852)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:108)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:866)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder$13.call(DistributedHerder.java:862)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> I guess it's because the function taskMetricsGroup.close is not called in all 
> cases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-6275) Extend consumer offset reset tool to support deletion

2017-11-28 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian updated KAFKA-6275:
---
Labels: kip  (was: needs-kip)

> Extend consumer offset reset tool to support deletion
> -
>
> Key: KAFKA-6275
> URL: https://issues.apache.org/jira/browse/KAFKA-6275
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>  Labels: kip
>
> It's useful to have a way to delete the offsets of a consumer group 
> explicitly. The reset tool already supports a number of different ways to 
> alter stored offsets, so perhaps we could add a {{--clear}} option. Note that 
> this would require a change to the OffsetCommit protocol, which does not 
> currently support deletion. Perhaps if you commit an offset with a retention 
> time of 0, we can treat it as a deletion.
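> A hypothetical invocation of the proposed option ({{--clear}} does not exist 
> yet; group and topic names are illustrative):
> {code}
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
>   --group my-group --topic my-topic --clear
> {code}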



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6279) Connect metrics do not get cleaned up for a source connector that doesn't stop properly

2017-11-28 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-6279.
--
Resolution: Duplicate

> Connect metrics do not get cleaned up for a source connector that doesn't 
> stop properly
> ---
>
> Key: KAFKA-6279
> URL: https://issues.apache.org/jira/browse/KAFKA-6279
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.0.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
>
> Connect will call {{stop()}} on a source task to signal that it should "_stop 
> trying to poll for new data 
> and interrupt any outstanding poll() requests_" (see the 
> [JavaDoc|http://kafka.apache.org/10/javadoc/org/apache/kafka/connect/source/SourceTask.html#stop--]
>  for this method).
> Unfortunately, not all connectors properly adhere to this expectation. Since 
> the metrics for the source task are cleaned up only when the worker source 
> task's thread completes, a task whose {{poll()}} method blocks forever will 
> currently prevent its thread from completing. So, we need to change how the 
> metrics are cleaned up to ensure this always happens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6279) Connect metrics do not get cleaned up for a source connector that doesn't stop properly

2017-11-28 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269776#comment-16269776
 ] 

Randall Hauch commented on KAFKA-6279:
--

This was recently reported as KAFKA-6252, so marking this as a duplicate.

> Connect metrics do not get cleaned up for a source connector that doesn't 
> stop properly
> ---
>
> Key: KAFKA-6279
> URL: https://issues.apache.org/jira/browse/KAFKA-6279
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.0.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
>
> Connect will call {{stop()}} on a source task to signal that it should "_stop 
> trying to poll for new data 
> and interrupt any outstanding poll() requests_" (see the 
> [JavaDoc|http://kafka.apache.org/10/javadoc/org/apache/kafka/connect/source/SourceTask.html#stop--]
>  for this method).
> Unfortunately, not all connectors properly adhere to this expectation. Since 
> the metrics for the source task are cleaned up only when the worker source 
> task's thread completes, a task whose {{poll()}} method blocks forever will 
> currently prevent its thread from completing. So, we need to change how the 
> metrics are cleaned up to ensure this always happens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6278) Allow multiple concurrent transactions on a single producer

2017-11-28 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269597#comment-16269597
 ] 

Jason Gustafson commented on KAFKA-6278:


Thanks for the report. This was an unfortunate tradeoff in the design of the 
transactional producer, but it simplified the API and the implementation 
considerably. One of the main benefits to sharing the producer instances is 
potentially better batching. However, the message batch format does not support 
messages from separate transactions, so we apparently lose some of that benefit 
even if the API supported it. There is still some potential benefit in sharing 
memory and network sockets. We have toyed with the idea of having a 
{{ProducerPool}} which could be used to get {{KafkaProducer}} instances which 
share the same underlying memory pool and network resources. That way, we 
wouldn't need to make any changes to the existing API.
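A rough sketch of that idea (hypothetical interface, not an existing API):

{code:java}
// Callers check out a dedicated producer -- each with its own transactional.id
// and at most one open transaction -- while buffer memory and network
// resources are shared underneath.
public interface ProducerPool<K, V> {
    KafkaProducer<K, V> acquire();
    void release(KafkaProducer<K, V> producer);
}
{code}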

> Allow multiple concurrent transactions on a single producer
> ---
>
> Key: KAFKA-6278
> URL: https://issues.apache.org/jira/browse/KAFKA-6278
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Tim Cuthbertson
>
> It's recommended to share a producer between threads, because it's likely 
> faster / cheaper.
> However, with the transactional API there's a big caveat. If you're using 
> transactions, every message sent to a given producer instance will be 
> considered part of the "active transaction" regardless of what thread it came 
> from. Furthermore, if two threads want to use transactions on a shared 
> producer instance, it (probably) won't work.
> Possible fix: add an API which exposes the transaction ID to the user, 
> instead of making it internal state of the producer. e.g.:
> {noformat}
> Transaction tx = producer.beginTransaction()
> producer.send(tx, message)
> producer.commitTransaction(tx)
> {noformat}
> That way, it's explicit which transaction a message will be part of, rather 
> than the current state which is "the open transaction, which may have been 
> opened by an unrelated thread".
> See also initial discussion on slack: 
> https://confluentcommunity.slack.com/archives/C488525JT/p151173973412



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KAFKA-6118) Transient failure in kafka.api.SaslScramSslEndToEndAuthorizationTest.testTwoConsumersWithDifferentSaslCredentials

2017-11-28 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269578#comment-16269578
 ] 

Vahid Hashemian edited comment on KAFKA-6118 at 11/28/17 10:19 PM:
---

I also hit this today with one of my PRs (JDK 9 and Scala 2.12): 
[link|https://pastebin.com/yBiDVu9F]


was (Author: vahid):
I also hit this today with one of my PRs: [link|https://pastebin.com/yBiDVu9F]

> Transient failure in 
> kafka.api.SaslScramSslEndToEndAuthorizationTest.testTwoConsumersWithDifferentSaslCredentials
> -
>
> Key: KAFKA-6118
> URL: https://issues.apache.org/jira/browse/KAFKA-6118
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core, unit tests
>Affects Versions: 1.0.0
>Reporter: Guozhang Wang
>
> Saw this failure on trunk jenkins job:
> https://builds.apache.org/job/kafka-pr-jdk9-scala2.12/2274/testReport/junit/kafka.api/SaslScramSslEndToEndAuthorizationTest/testTwoConsumersWithDifferentSaslCredentials/
> {code}
> Stacktrace
> org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to 
> access group: group
> Standard Output
> [2017-10-25 15:09:49,986] ERROR ZKShutdownHandler is not registered, so 
> ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
> changes (org.apache.zookeeper.server.ZooKeeperServer:472)
> Adding ACLs for resource `Cluster:kafka-cluster`: 
>   User:scram-admin has Allow permission for operations: ClusterAction 
> from hosts: * 
> Current ACLs for resource `Cluster:kafka-cluster`: 
>   User:scram-admin has Allow permission for operations: ClusterAction 
> from hosts: * 
> Completed Updating config for entity: user-principal 'scram-admin'.
> [2017-10-25 15:09:50,654] ERROR [ReplicaFetcher replicaId=0, leaderId=2, 
> fetcherId=0] Error for partition __consumer_offsets-0 from broker 2 
> (kafka.server.ReplicaFetcherThread:107)
> org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to 
> access topics: [Topic authorization failed.]
> [2017-10-25 15:09:50,654] ERROR [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=0] Error for partition __consumer_offsets-0 from broker 2 
> (kafka.server.ReplicaFetcherThread:107)
> org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to 
> access topics: [Topic authorization failed.]
> Adding ACLs for resource `Topic:*`: 
>   User:scram-admin has Allow permission for operations: Read from hosts: 
> * 
> Current ACLs for resource `Topic:*`: 
>   User:scram-admin has Allow permission for operations: Read from hosts: 
> * 
> Completed Updating config for entity: user-principal 'scram-user'.
> Completed Updating config for entity: user-principal 'scram-user2'.
> Adding ACLs for resource `Topic:e2etopic`: 
>   User:scram-user has Allow permission for operations: Write from hosts: *
>   User:scram-user has Allow permission for operations: Describe from 
> hosts: * 
> Adding ACLs for resource `Cluster:kafka-cluster`: 
>   User:scram-user has Allow permission for operations: Create from hosts: 
> * 
> Current ACLs for resource `Topic:e2etopic`: 
>   User:scram-user has Allow permission for operations: Write from hosts: *
>   User:scram-user has Allow permission for operations: Describe from 
> hosts: * 
> Adding ACLs for resource `Topic:e2etopic`: 
>   User:scram-user has Allow permission for operations: Read from hosts: *
>   User:scram-user has Allow permission for operations: Describe from 
> hosts: * 
> Adding ACLs for resource `Group:group`: 
>   User:scram-user has Allow permission for operations: Read from hosts: * 
> Current ACLs for resource `Topic:e2etopic`: 
>   User:scram-user has Allow permission for operations: Write from hosts: *
>   User:scram-user has Allow permission for operations: Describe from 
> hosts: *
>   User:scram-user has Allow permission for operations: Read from hosts: * 
> Current ACLs for resource `Group:group`: 
>   User:scram-user has Allow permission for operations: Read from hosts: * 
> [2017-10-25 15:09:52,788] ERROR Error while creating ephemeral at /controller 
> with return code: OK 
> (kafka.controller.KafkaControllerZkUtils$CheckedEphemeral:101)
> [2017-10-25 15:09:54,078] ERROR ZKShutdownHandler is not registered, so 
> ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
> changes (org.apache.zookeeper.server.ZooKeeperServer:472)
> [2017-10-25 15:09:54,112] ERROR ZKShutdownHandler is not registered, so 
> ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
> changes (org.apache.zookeeper.server.ZooKeeperServer:472)
> Adding ACLs for resource `Cluster:kafka-cluster`: 
>   User:scram-admin has Allow permission for 

[jira] [Commented] (KAFKA-6255) Add ProduceBench to Trogdor

2017-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269569#comment-16269569
 ] 

ASF GitHub Bot commented on KAFKA-6255:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4245


> Add ProduceBench to Trogdor
> ---
>
> Key: KAFKA-6255
> URL: https://issues.apache.org/jira/browse/KAFKA-6255
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
> Fix For: 1.1.0
>
>
> Add ProduceBench, a benchmark of producer latency, to Trogdor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6255) Add ProduceBench to Trogdor

2017-11-28 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-6255.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 4245
[https://github.com/apache/kafka/pull/4245]

> Add ProduceBench to Trogdor
> ---
>
> Key: KAFKA-6255
> URL: https://issues.apache.org/jira/browse/KAFKA-6255
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
> Fix For: 1.1.0
>
>
> Add ProduceBench, a benchmark of producer latency, to Trogdor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6280) Allow for additional archive types to be loaded from plugin.path in Connect

2017-11-28 Thread Konstantine Karantasis (JIRA)
Konstantine Karantasis created KAFKA-6280:
-

 Summary: Allow for additional archive types to be loaded from 
plugin.path in Connect
 Key: KAFKA-6280
 URL: https://issues.apache.org/jira/browse/KAFKA-6280
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 1.0.0
Reporter: Konstantine Karantasis
Assignee: Konstantine Karantasis
Priority: Minor
 Fix For: 1.1.0


In addition to uber {{.jar}} archives, it would be nice if one could also load 
{{.zip}} archives of appropriately packaged Connect plugins. 
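For reference, plugins are discovered via the worker's existing {{plugin.path}} 
setting; a minimal sketch (path illustrative):

{code}
# connect-distributed.properties
plugin.path=/usr/local/share/kafka/plugins
{code}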



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5526) KIP-175: ConsumerGroupCommand no longer shows output for consumer groups which have not committed offsets

2017-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269410#comment-16269410
 ] 

ASF GitHub Bot commented on KAFKA-5526:
---

GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/4271

KAFKA-5526: Additional `--describe` views for ConsumerGroupCommand (KIP-175)

The `--describe` option of ConsumerGroupCommand is expanded to support:
* `--describe` or `--describe --offsets`: listing of current group offsets
* `--describe --members` or `--describe --members --verbose`: listing of 
group members
* `--describe --state`: group status

Example: With a single-partition topic `test1` and a two-partition topic 
`test2`, consumers `consumer1` and `consumer11` subscribed to `test1`, consumers 
`consumer2`, `consumer22`, and `consumer222` subscribed to `test2`, and all 
consumers belonging to group `test-group`, this is an example of the output of 
the new options above for `test-group`:

```
--describe, or --describe --offsets:

TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                      HOST        CLIENT-ID
test2  0          0               0               0    consumer2-bad9496d-0889-47ab-98ff-af17d9460382   /127.0.0.1  consumer2
test2  1          0               0               0    consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411  /127.0.0.1  consumer22
test1  0          0               0               0    consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf   /127.0.0.1  consumer1
```

```
--describe --members

CONSUMER-ID                                       HOST        CLIENT-ID    #PARTITIONS
consumer2-bad9496d-0889-47ab-98ff-af17d9460382    /127.0.0.1  consumer2    1
consumer222-ed2108cd-d368-41f1-8514-5b72aa835bcc  /127.0.0.1  consumer222  0
consumer11-dc8295d7-8f3f-4438-9b11-7270bab46760   /127.0.0.1  consumer11   0
consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411   /127.0.0.1  consumer22   1
consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf    /127.0.0.1  consumer1    1
```

```
--describe --members --verbose

CONSUMER-ID                                       HOST        CLIENT-ID    #PARTITIONS  ASSIGNMENT
consumer2-bad9496d-0889-47ab-98ff-af17d9460382    /127.0.0.1  consumer2    1            test2(0)
consumer222-ed2108cd-d368-41f1-8514-5b72aa835bcc  /127.0.0.1  consumer222  0            -
consumer11-dc8295d7-8f3f-4438-9b11-7270bab46760   /127.0.0.1  consumer11   0            -
consumer22-c45e6ee2-0c7d-44a3-94a8-9627f63fb411   /127.0.0.1  consumer22   1            test2(1)
consumer1-d51b0345-3194-4305-80db-81a68fa6c5bf    /127.0.0.1  consumer1    1            test1(0)
```

```
--describe --state

ASSIGNMENT-STRATEGY   STATE    #MEMBERS
range                 Stable   5
```

Note that this PR also addresses the issue reported in 
[KAFKA-6158](https://issues.apache.org/jira/browse/KAFKA-6158) by dynamically 
setting the width of columns `TOPIC`, `CONSUMER-ID`, `HOST` and `CLIENT-ID`. 
This avoids truncation of column values when they exceed the fixed widths of 
these columns.

The code has been restructured to better support testing of individual 
values and also the console output. Unit tests have been updated and extended 
to take advantage of this restructuring.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-5526

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4271


commit 1973db23d2f29191ec56b56a3040c1a2b0c00ef4
Author: Vahid Hashemian 
Date:   2017-11-28T20:08:37Z

KAFKA-5526: Additional `--describe` views for ConsumerGroupCommand (KIP-175)

The `--describe` option of ConsumerGroupCommand is expanded to support:
* `--describe` or `--describe --offsets`: listing of current group offsets
* `--describe --members` or `--describe --members --verbose`: listing of 
group members
* `--describe --state`: group status

Example: With a single-partition topic `test1` and a two-partition topic 
`test2`, consumers `consumer1` and 

[jira] [Commented] (KAFKA-5526) KIP-175: ConsumerGroupCommand no longer shows output for consumer groups which have not committed offsets

2017-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269369#comment-16269369
 ] 

ASF GitHub Bot commented on KAFKA-5526:
---

Github user vahidhashemian closed the pull request at:

https://github.com/apache/kafka/pull/3448


> KIP-175: ConsumerGroupCommand no longer shows output for consumer groups 
> which have not committed offsets
> -
>
> Key: KAFKA-5526
> URL: https://issues.apache.org/jira/browse/KAFKA-5526
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ryan P
>Assignee: Vahid Hashemian
>  Labels: kip
>
> It would appear that the latest iteration of the ConsumerGroupCommand no 
> longer outputs information about group membership when no offsets have been 
> committed. It would be nice if the output generated by these tools maintained 
> some form of consistency across versions as some users have grown to depend 
> on them. 
> 0.9.x output:
> bin/kafka-consumer-groups --bootstrap-server localhost:9092 --new-consumer 
> --describe --group console-consumer-34885
> GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
> console-consumer-34885, test, 0, unknown, 0, unknown, consumer-1_/192.168.1.64
> 0.10.2 output:
> bin/kafka-consumer-groups --bootstrap-server localhost:9092 --new-consumer 
> --describe --group console-consumer-34885
> Note: This will only show information about consumers that use the Java 
> consumer API (non-ZooKeeper-based consumers).
> TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG 
>CONSUMER-ID   HOST 
>   CLIENT-ID



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KAFKA-5154) Kafka Streams throws NPE during rebalance

2017-11-28 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-5154:
-
Fix Version/s: 0.11.0.0

> Kafka Streams throws NPE during rebalance
> -
>
> Key: KAFKA-5154
> URL: https://issues.apache.org/jira/browse/KAFKA-5154
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Lukas Gemela
>Assignee: Damian Guy
> Fix For: 0.11.0.0
>
> Attachments: 5154_problem.log, clio.txt.gz, clio_afa596e9b809.gz, 
> clio_reduced.gz
>
>
> Please see the attached log; Kafka Streams throws a NullPointerException 
> during rebalance, which is caught by our custom exception handler
> {noformat}
> 2017-04-30T17:44:17,675 INFO  kafka-coordinator-heartbeat-thread | hades 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead()
>  @618 - Marking the coordinator 10.210.200.144:9092 (id: 2147483644 rack: 
> null) dead for group hades
> 2017-04-30T17:44:27,395 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onSuccess() 
> @573 - Discovered coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) 
> for group hades.
> 2017-04-30T17:44:27,941 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare()
>  @393 - Revoking previously assigned partitions [poseidonIncidentFeed-27, 
> poseidonIncidentFeed-29, poseidonIncidentFeed-30, poseidonIncidentFeed-18] 
> for group hades
> 2017-04-30T17:44:27,947 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest()
>  @407 - (Re-)joining group hades
> 2017-04-30T17:44:48,468 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest()
>  @407 - (Re-)joining group hades
> 2017-04-30T17:44:53,628 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest()
>  @407 - (Re-)joining group hades
> 2017-04-30T17:45:09,587 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest()
>  @407 - (Re-)joining group hades
> 2017-04-30T17:45:11,961 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onSuccess() 
> @375 - Successfully joined group hades with generation 99
> 2017-04-30T17:45:13,126 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete()
>  @252 - Setting newly assigned partitions [poseidonIncidentFeed-11, 
> poseidonIncidentFeed-27, poseidonIncidentFeed-25, poseidonIncidentFeed-29, 
> poseidonIncidentFeed-19, poseidonIncidentFeed-18] for group hades
> 2017-04-30T17:46:37,254 INFO  kafka-coordinator-heartbeat-thread | hades 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead()
>  @618 - Marking the coordinator 10.210.200.144:9092 (id: 2147483644 rack: 
> null) dead for group hades
> 2017-04-30T18:04:25,993 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onSuccess() 
> @573 - Discovered coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) 
> for group hades.
> 2017-04-30T18:04:29,401 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare()
>  @393 - Revoking previously assigned partitions [poseidonIncidentFeed-11, 
> poseidonIncidentFeed-27, poseidonIncidentFeed-25, poseidonIncidentFeed-29, 
> poseidonIncidentFeed-19, poseidonIncidentFeed-18] for group hades
> 2017-04-30T18:05:10,877 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest()
>  @407 - (Re-)joining group hades
> 2017-05-01T00:01:55,707 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead()
>  @618 - Marking the coordinator 10.210.200.144:9092 (id: 2147483644 rack: 
> null) dead for group hades
> 2017-05-01T00:01:59,027 INFO  StreamThread-1 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onSuccess() 
> @573 - Discovered coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) 
> for group hades.
> 2017-05-01T00:01:59,031 ERROR StreamThread-1 
> org.apache.kafka.streams.processor.internals.StreamThread.run() @376 - 
> stream-thread [StreamThread-1] Streams application error during processing:
>  java.lang.NullPointerException
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:619)
>  ~[kafka-streams-0.10.2.0.jar!/:?]
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
>  [kafka-streams-0.10.2.0.jar!/:?]
> 2017-05-01T00:02:00,038 INFO  StreamThread-1 
> 

[jira] [Updated] (KAFKA-6279) Connect metrics do not get cleaned up for a source connector that doesn't stop properly

2017-11-28 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-6279:
-
Priority: Critical  (was: Major)

> Connect metrics do not get cleaned up for a source connector that doesn't 
> stop properly
> ---
>
> Key: KAFKA-6279
> URL: https://issues.apache.org/jira/browse/KAFKA-6279
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.0.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Critical
>
> Connect will call {{stop()}} on a source task to signal that it should "_stop 
> trying to poll for new data 
> and interrupt any outstanding poll() requests_" (see the 
> [JavaDoc|http://kafka.apache.org/10/javadoc/org/apache/kafka/connect/source/SourceTask.html#stop--]
>  for this method).
> Unfortunately, not all connectors properly adhere to this expectation. Since 
> the metrics for the source task are cleaned up only when the worker source 
> task's thread completes, a task whose {{poll()}} method blocks forever will 
> currently prevent its thread from completing. So, we need to change how the 
> metrics are cleaned up to ensure this always happens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6274) Improve KTable Source state store auto-generated names

2017-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269295#comment-16269295
 ] 

ASF GitHub Bot commented on KAFKA-6274:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4268


> Improve KTable Source state store auto-generated names
> --
>
> Key: KAFKA-6274
> URL: https://issues.apache.org/jira/browse/KAFKA-6274
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 1.1.0, 1.0.1
>
>
> When the source KTable is generated without a store name specified, the 
> auto-generated store name uses the {{topic}} as the store name prefix. This 
> generates a store name such as
> {code}
> Processor: KTABLE-SOURCE-31 (stores: 
> [windowed-node-countsSTATE-STORE-29])
>   --> none
>   <-- KSTREAM-SOURCE-30
> {code}
> We'd better improve the auto-generated store name to 
> {{[topic-name]-STATE-STORE-suffix}}.
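As a hedged aside, a caller can sidestep the auto-generated name entirely by naming the store; a sketch against the 1.0 Streams API (topic and store names are made up):

{code}
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();
// Supplying an explicit store name avoids the "[topic]STATE-STORE-N" pattern.
KTable<String, String> counts = builder.table(
        "windowed-node-counts",
        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(
                "windowed-node-counts-store"));
{code}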



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6274) Improve KTable Source state store auto-generated names

2017-11-28 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-6274.
--
   Resolution: Fixed
Fix Version/s: 1.0.1
   1.1.0

Issue resolved by pull request 4268
[https://github.com/apache/kafka/pull/4268]

> Improve KTable Source state store auto-generated names
> --
>
> Key: KAFKA-6274
> URL: https://issues.apache.org/jira/browse/KAFKA-6274
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 1.1.0, 1.0.1
>
>
> When the source KTable is generated without the store name specified, the 
> auto-generated store name uses {{topic}} as the store name prefix. This 
> generates a store name such as
> {code}
> Processor: KTABLE-SOURCE-31 (stores: 
> [windowed-node-countsSTATE-STORE-29])
>   --> none
>   <-- KSTREAM-SOURCE-30
> {code}
> We'd better improve the auto-generated store name to 
> {{[topic-name]-STATE-STORE-suffix}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4669) KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws exception

2017-11-28 Thread Nick Travers (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269266#comment-16269266
 ] 

Nick Travers commented on KAFKA-4669:
-

We hit this again in production, running 0.11.0.1. Same symptoms as previously 
reported:

{code}
2017-11-28 09:18:20,674 apa158.sjc2b.square kafka-producer-network-thread | 
producer-2 Uncaught error in kafka producer I/O thread: 
java.lang.IllegalStateException: Correlation id for response (1835513) does not 
match request (1835503), request header: 
{api_key=0,api_version=3,correlation_id=1835503,client_id=producer-2}
at 
org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:752)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:561)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:748)
{code}

A side-effect was that this effectively caused a deadlock in the affected JVM: 
it had a thread waiting for the completion of a send (which awaits on a 
latch), but completion could never occur as the I/O thread had presumably 
crashed:

{code}
"async-message-sender-0" #1761 daemon prio=5 os_prio=0 tid=0x7f3f04006800 
nid=0x4356a waiting on condition [0x7f3de9425000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00075a5e9140> (a 
java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at 
org.apache.kafka.clients.producer.internals.ProduceRequestResult.await(ProduceRequestResult.java:76)
at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:61)
at 
org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
at com.squareup.core.concurrent.Futures2.getAll(Futures2.java:462)
at 
com.squareup.kafka.ng.producer.KafkaProducer.sendMessageBatch(KafkaProducer.java:214)
- locked <0x000734fc4dc0> (a 
com.squareup.kafka.ng.producer.KafkaProducer)
at 
com.squareup.kafka.ng.producer.BufferedKafkaProducer$AsyncBatchMessageSender.sendBatch(BufferedKafkaProducer.java:585)
at 
com.squareup.kafka.ng.producer.BufferedKafkaProducer$AsyncBatchMessageSender.run(BufferedKafkaProducer.java:536)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

This doesn't look like an easy one to reproduce on our side, so I'm wondering 
what the best course of action is here. Is it worth re-opening this ticket, 
[~ijuma]?
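In the meantime, a hedged mitigation sketch (not a fix): bound every wait on a send future so a crashed sender thread surfaces as a timeout rather than a wedged JVM. The helper name and timeout below are illustrative.

{code}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

static RecordMetadata sendWithTimeout(Producer<String, String> producer,
                                      ProducerRecord<String, String> record)
        throws InterruptedException, ExecutionException {
    Future<RecordMetadata> future = producer.send(record);
    try {
        return future.get(30, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
        // The I/O thread may have died; fail fast instead of waiting forever.
        producer.close(0, TimeUnit.MILLISECONDS);
        throw new IllegalStateException("send did not complete in time", e);
    }
}
{code}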

> KafkaProducer.flush hangs when NetworkClient.handleCompletedReceives throws 
> exception
> -
>
> Key: KAFKA-4669
> URL: https://issues.apache.org/jira/browse/KAFKA-4669
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
>Reporter: Cheng Ju
>Assignee: Rajini Sivaram
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.1, 1.0.0
>
>
> There is no try catch in NetworkClient.handleCompletedReceives.  If an 
> exception is thrown after inFlightRequests.completeNext(source), then the 
> corresponding RecordBatch's done will never get called, and 
> KafkaProducer.flush will hang on this RecordBatch.
> I've checked the 0.10 code and think this bug also exists in the 0.10 versions.
> A real case. First, a correlation-id mismatch exception happens:
> 13 Jan 2017 17:08:24,059 ERROR [kafka-producer-network-thread | producer-21] 
> (org.apache.kafka.clients.producer.internals.Sender.run:130)  - Uncaught 
> error in kafka producer I/O thread: 
> java.lang.IllegalStateException: Correlation id for response (703766) does 
> not match request (703764)
>   at 
> 

[jira] [Commented] (KAFKA-6150) Make Repartition Topics Transient

2017-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269162#comment-16269162
 ] 

ASF GitHub Bot commented on KAFKA-6150:
---

GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/4270

[WIP] KAFKA-6150: Purge repartition topics

*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*

*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
K6150-purge-repartition-topics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4270






> Make Repartition Topics Transient
> -
>
> Key: KAFKA-6150
> URL: https://issues.apache.org/jira/browse/KAFKA-6150
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>  Labels: operability
>
> Unlike changelog topics, the repartition topics could just be short-lived. 
> Today users have different ways to configure them with short retention, such 
> as enforcing a short retention period or using AppendTime for repartition 
> topics. All of these are cumbersome, and Streams should just do this for the 
> users.
> One way to do this is to use the “purgeData” admin API (KIP-107): after the 
> offsets of the input topics are committed, if the input topics are actually 
> repartition topics, we would purge the data immediately.
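A rough sketch of that direction, assuming the KIP-107 {{deleteRecords}} call lands in the Java {{AdminClient}} as proposed (the repartition topic name below is hypothetical):

{code}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

static void purgeRepartitionTopic(AdminClient admin, long committedOffset)
        throws InterruptedException, ExecutionException {
    TopicPartition tp =
            new TopicPartition("my-app-KSTREAM-REPARTITION-0000000003", 0);
    Map<TopicPartition, RecordsToDelete> purge =
            Collections.singletonMap(tp, RecordsToDelete.beforeOffset(committedOffset));
    // Everything below the committed offset is no longer needed and can go.
    admin.deleteRecords(purge).all().get();
}
{code}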



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6279) Connect metrics do not get cleaned up for a source connector that doesn't stop properly

2017-11-28 Thread Randall Hauch (JIRA)
Randall Hauch created KAFKA-6279:


 Summary: Connect metrics do not get cleaned up for a source 
connector that doesn't stop properly
 Key: KAFKA-6279
 URL: https://issues.apache.org/jira/browse/KAFKA-6279
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 1.0.0
Reporter: Randall Hauch
Assignee: Randall Hauch


Connect will call {{stop()}} on a source task to signal that it should "_stop 
trying to poll for new data 
and interrupt any outstanding poll() requests_" (see the 
[JavaDoc|http://kafka.apache.org/10/javadoc/org/apache/kafka/connect/source/SourceTask.html#stop--]
 for this method).

Unfortunately, not all connectors properly adhere to this expectation. Since 
the metrics for the source task are cleaned up only when the worker source 
task's thread completes, a task whose {{poll()}} method blocks forever will 
currently prevent its thread from completing. So, we need to change how the 
metrics are cleaned up to ensure this always happens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6170) Add the AdminClient in Streams' KafkaClientSupplier

2017-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269121#comment-16269121
 ] 

ASF GitHub Bot commented on KAFKA-6170:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4224


> Add the AdminClient in Streams' KafkaClientSupplier
> ---
>
> Key: KAFKA-6170
> URL: https://issues.apache.org/jira/browse/KAFKA-6170
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 1.1.0
>
>
> We will add the Java AdminClient to Kafka Streams, in order to replace the 
> internal StreamsKafkaClient. More details can be found in KIP-220 
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-220%3A+Add+AdminClient+into+Kafka+Streams%27+ClientSupplier)
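Per KIP-220, the supplier interface would roughly grow an admin factory alongside the existing ones; a sketch of the assumed shape (exact method name and signature subject to the KIP discussion):

{code}
import java.util.Map;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.producer.Producer;

public interface KafkaClientSupplier {
    AdminClient getAdminClient(Map<String, Object> config);   // new in KIP-220
    Producer<byte[], byte[]> getProducer(Map<String, Object> config);
    Consumer<byte[], byte[]> getConsumer(Map<String, Object> config);
    Consumer<byte[], byte[]> getRestoreConsumer(Map<String, Object> config);
}
{code}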



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6269) KTable state restore fails after rebalance

2017-11-28 Thread Andreas Schroeder (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269058#comment-16269058
 ] 

Andreas Schroeder commented on KAFKA-6269:
--

Hi again [~mjsax], I'm back with some news (finally): the issue we are having 
is that records with a null value are ignored, so deletes won't propagate to 
the outer join and our business logic doesn't work any more.

See the [KGroupedStream API 
docs|http://apache.mirror.digionline.de/kafka/1.0.0/javadoc/org/apache/kafka/streams/kstream/KGroupedStream.html#aggregate(org.apache.kafka.streams.kstream.Initializer,%20org.apache.kafka.streams.kstream.Aggregator,%20org.apache.kafka.streams.kstream.Materialized)]

Any other ideas? :)
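One workaround sketch, not from this ticket and with a made-up sentinel value: map nulls to an explicit marker before grouping, so deletes reach the aggregation as ordinary records that the aggregator (or downstream join logic) can handle itself.

{code}
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

static KTable<String, String> buildExistsTable(StreamsBuilder builder) {
    final String tombstone = "\u0000deleted";  // hypothetical sentinel value
    return builder.<String, String>stream("entity-A")
            .mapValues(v -> v == null ? tombstone : v)  // nulls survive as markers
            .groupByKey()
            .reduce((oldV, newV) -> newV,
                    Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(
                            "entity-A-exists-persisted"));
}
{code}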

> KTable state restore fails after rebalance
> --
>
> Key: KAFKA-6269
> URL: https://issues.apache.org/jira/browse/KAFKA-6269
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Andreas Schroeder
>Priority: Blocker
> Fix For: 1.1.0, 1.0.1
>
>
> I have the following kafka streams topology:
> entity-B -> map step -> entity-B-exists (with state store)
> entity-A   -> map step -> entity-A-exists (with state store)
> (entity-B-exists, entity-A-exists) -> outer join with state store.
> The topology building code looks like this (some data type, serde, 
> valuemapper, and joiner code omitted):
> {code}
> def buildTable[V](builder: StreamsBuilder,
>   sourceTopic: String,
>   existsTopic: String,
>   valueSerde: Serde[V],
>   valueMapper: ValueMapper[String, V]): 
> KTable[String, V] = {
>   val stream: KStream[String, String] = builder.stream[String, 
> String](sourceTopic)
>   val transformed: KStream[String, V] = stream.mapValues(valueMapper)
>   transformed.to(existsTopic, Produced.`with`(Serdes.String(), valueSerde))
>   val inMemoryStoreName = s"$existsTopic-persisted"
>   val materialized = 
> Materialized.as(Stores.inMemoryKeyValueStore(inMemoryStoreName))
>   .withKeySerde(Serdes.String())
>   .withValueSerde(valueSerde)
>   .withLoggingDisabled()
>   builder.table(existsTopic, materialized)
> }
> val builder = new StreamsBuilder
> val mapToEmptyString: ValueMapper[String, String] = (value: String) => if 
> (value != null) "" else null
> val entitiesB: KTable[String, EntityBInfo] =
>   buildTable(builder,
>  "entity-B",
>  "entity-B-exists",
>  EntityBInfoSerde,
>  ListingImagesToEntityBInfo)
> val entitiesA: KTable[String, String] =
>   buildTable(builder, "entity-A", "entity-A-exists", Serdes.String(), 
> mapToEmptyString)
> val joiner: ValueJoiner[String, EntityBInfo, EntityDiff] = (a, b) => 
> EntityDiff.fromJoin(a, b)
> val materialized = 
> Materialized.as(Stores.inMemoryKeyValueStore("entity-A-joined-with-entity-B"))
>   .withKeySerde(Serdes.String())
>   .withValueSerde(EntityDiffSerde)
>   .withLoggingEnabled(new java.util.HashMap[String, String]())
> val joined: KTable[String, EntityDiff] = entitiesA.outerJoin(entitiesB, 
> joiner, materialized)
> {code}
> We run 4 processor machines with 30 stream threads each; each topic has 30 
> partitions so that there is a total of 4 x 30 = 120 partitions to consume. 
> The initial launch of the processor works fine, but killing one processor 
> and letting it re-join leads to some faulty behaviour in the stream threads.
> First, the total number of assigned partitions over all processor machines is 
> larger than 120 (sometimes 157, sometimes just 132), so the partition/task 
> assignment seems to assign the same job to different stream threads.
> The processor machines trying to re-join the consumer group fail constantly 
> with the error message 'Detected a task that got migrated to another 
> thread.' We gave the processor half an hour to recover; usually, rebuilding 
> the KTable states takes around 20 seconds (with Kafka 0.11.0.1).
> Here are the details of the errors we see:
> stream-thread [kafka-processor-6-StreamThread-9] Detected a task that got 
> migrated to another thread. This implies that this thread missed a rebalance 
> and dropped out of the consumer group. Trying to rejoin the consumer group 
> now.
> {code}
> org.apache.kafka.streams.errors.TaskMigratedException: Log end offset of 
> entity-B-exists-0 should not change while restoring: old end offset 4750539, 
> current offset 4751388
> > StreamsTask taskId: 1_0
> > >   ProcessorTopology:
> > KSTREAM-SOURCE-08:
> > topics: [entity-A-exists]
> > children:   [KTABLE-SOURCE-09]
> > KTABLE-SOURCE-09:
> > states: [entity-A-exists-persisted]
> >   

[jira] [Assigned] (KAFKA-6048) Support negative record timestamps

2017-11-28 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-6048:
--

Assignee: Konstantin Chukhlomin  (was: james chien)

> Support negative record timestamps
> --
>
> Key: KAFKA-6048
> URL: https://issues.apache.org/jira/browse/KAFKA-6048
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, core, streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Assignee: Konstantin Chukhlomin
>  Labels: needs-kip
>
> Kafka does not support negative record timestamps, and this prevents the 
> storage of historical data in Kafka. In general, negative timestamps are 
> supported by Unix system timestamps: 
> From https://en.wikipedia.org/wiki/Unix_time
> {quote}
> The Unix time number is zero at the Unix epoch, and increases by exactly 
> 86,400 per day since the epoch. Thus 2004-09-16T00:00:00Z, 12,677 days after 
> the epoch, is represented by the Unix time number 12,677 × 86,400 = 
> 1095292800. This can be extended backwards from the epoch too, using negative 
> numbers; thus 1957-10-04T00:00:00Z, 4,472 days before the epoch, is 
> represented by the Unix time number −4,472 × 86,400 = −386380800.
> {quote}
> Allowing for negative timestamps would require multiple changes:
>  - while brokers in general do support negative timestamps, brokers use 
> {{-1}} as the default value if a producer uses an old message format (this 
> would not be compatible with supporting negative timestamps "end-to-end", as 
> {{-1}} cannot be used as "unknown" anymore): we could introduce a message 
> flag indicating a missing timestamp (and let the consumer throw an exception 
> if {{ConsumerRecord#timestamp()}} is called). Another possible solution might 
> be to require topics that are used by old producers to be configured with 
> {{LogAppendTime}} semantics and to reject writes to topics with 
> {{CreateTime}} semantics for older message formats
>  - {{KafkaProducer}} does not allow sending records with a negative 
> timestamp, and thus this would need to be fixed
>  - the Streams API drops records with negative timestamps (or fails by 
> default) -- also, some internal store implementations for windowed stores 
> assume that there are no negative timestamps when doing range queries
> There might be other gaps we need to address. This is just a summary of the 
> issues coming to my mind at the moment.
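To make the {{KafkaProducer}} bullet concrete, a hedged sketch (topic, key, and value are made up; the timestamp is the 1957 example above in milliseconds):

{code}
import org.apache.kafka.clients.producer.ProducerRecord;

// Throws IllegalArgumentException today: ProducerRecord rejects any
// negative timestamp at construction time.
ProducerRecord<String, String> record = new ProducerRecord<>(
        "historical-events", null, -386380800000L, "sputnik", "launched");
{code}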



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6048) Support negative record timestamps

2017-11-28 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269034#comment-16269034
 ] 

Matthias J. Sax commented on KAFKA-6048:


[~james.c] It seems you have lost interest in working on this? On the dev 
list, [~chuhlomin] announced that he is going to prepare a KIP for this. Thus, 
I am re-assigning this ticket to him. Of course, you are very welcome to help 
work on this!

> Support negative record timestamps
> --
>
> Key: KAFKA-6048
> URL: https://issues.apache.org/jira/browse/KAFKA-6048
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, core, streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Assignee: james chien
>  Labels: needs-kip
>
> Kafka does not support negative record timestamps, and this prevents the 
> storage of historical data in Kafka. In general, negative timestamps are 
> supported by Unix system timestamps: 
> From https://en.wikipedia.org/wiki/Unix_time
> {quote}
> The Unix time number is zero at the Unix epoch, and increases by exactly 
> 86,400 per day since the epoch. Thus 2004-09-16T00:00:00Z, 12,677 days after 
> the epoch, is represented by the Unix time number 12,677 × 86,400 = 
> 1095292800. This can be extended backwards from the epoch too, using negative 
> numbers; thus 1957-10-04T00:00:00Z, 4,472 days before the epoch, is 
> represented by the Unix time number −4,472 × 86,400 = −386380800.
> {quote}
> Allowing for negative timestamps would require multiple changes:
>  - while brokers in general do support negative timestamps, brokers use 
> {{-1}} as the default value if a producer uses an old message format (this 
> would not be compatible with supporting negative timestamps "end-to-end", as 
> {{-1}} cannot be used as "unknown" anymore): we could introduce a message 
> flag indicating a missing timestamp (and let the consumer throw an exception 
> if {{ConsumerRecord#timestamp()}} is called). Another possible solution might 
> be to require topics that are used by old producers to be configured with 
> {{LogAppendTime}} semantics and to reject writes to topics with 
> {{CreateTime}} semantics for older message formats
>  - {{KafkaProducer}} does not allow sending records with a negative 
> timestamp, and thus this would need to be fixed
>  - the Streams API drops records with negative timestamps (or fails by 
> default) -- also, some internal store implementations for windowed stores 
> assume that there are no negative timestamps when doing range queries
> There might be other gaps we need to address. This is just a summary of the 
> issues coming to my mind at the moment.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5882) NullPointerException in StreamTask

2017-11-28 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269010#comment-16269010
 ] 

Matthias J. Sax commented on KAFKA-5882:


Hmmm... Can you do a `topologyBuilder.build(null).toString()` and share the 
output? Also, DEBUG logs of the Streams application would help. Thx.
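For reference, a minimal sketch of that diagnostic, assuming the pre-1.0 builder API a 0.11.0.0 application would be using (topology wiring elided):

{code}
import org.apache.kafka.streams.kstream.KStreamBuilder;

KStreamBuilder topologyBuilder = new KStreamBuilder();
// ... wire up the topology exactly as the application normally does ...
System.out.println(topologyBuilder.build(null).toString());
{code}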

> NullPointerException in StreamTask
> --
>
> Key: KAFKA-5882
> URL: https://issues.apache.org/jira/browse/KAFKA-5882
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.11.0.0
>Reporter: Seweryn Habdank-Wojewodzki
>
> It seems bugfix [KAFKA-5073|https://issues.apache.org/jira/browse/KAFKA-5073] 
> was made, but it introduced some other issue.
> In some cases (I am not sure which ones) I get an NPE (below).
> I would expect that even in case of a FATAL error, anything except an NPE is 
> thrown.
> {code}
> 2017-09-12 23:34:54 ERROR ConsumerCoordinator:269 - User provided listener 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener 
> for group streamer failed on partition assignment
> java.lang.NullPointerException: null
> at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:123)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:1234)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:294)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.retryWithBackoff(StreamThread.java:254)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:1313)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$1100(StreamThread.java:73)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsAssigned(StreamThread.java:183)
>  ~[myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:265)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:363)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1078)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043) 
> [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:582)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553)
>  [myapp-streamer.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)
>  [myapp-streamer.jar:?]
> 2017-09-12 23:34:54 INFO  StreamThread:1040 - stream-thread 
> [streamer-3a44578b-faa8-4b5b-bbeb-7a7f04639563-StreamThread-1] Shutting down
> 2017-09-12 23:34:54 INFO  KafkaProducer:972 - Closing the Kafka producer with 
> timeoutMillis = 9223372036854775807 ms.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6249) Interactive query downtime when node goes down even with standby replicas

2017-11-28 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268997#comment-16268997
 ] 

Matthias J. Sax commented on KAFKA-6249:


Thanks. I understand what you are saying. This is related to another issue 
(basically, your fail-over behaves the same way as the scale-out scenario, for 
which StandbyTasks don't help): KAFKA-6145

Will close this as a duplicate.

> Interactive query downtime when node goes down even with standby replicas
> -
>
> Key: KAFKA-6249
> URL: https://issues.apache.org/jira/browse/KAFKA-6249
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Charles Crain
>
> In a multi-node Kafka Streams application that uses interactive queries, the 
> queryable store will become unavailable (throw InvalidStateStoreException) 
> for up to several minutes when a node goes down.  This happens regardless of 
> how many nodes are in the application as well as how many standby replicas 
> are configured.
> My expectation is that if a standby replica is present, the interactive 
> query would fail over to the live replica immediately, causing negligible 
> downtime for interactive queries.  Instead, what appears to happen is that 
> the queryable store is down for however long it takes for the nodes to 
> completely rebalance (this takes a few minutes for a couple GB of total data 
> in the queryable store's backing topic).
> I am filing this as a bug, realizing that it may in fact be a feature 
> request.  However, until there is a way we can use interactive queries with 
> minimal (~zero) downtime on node failure, we are having to entertain other 
> strategies for serving queries (e.g. manually materializing the topic to an 
> external resilient store such as Cassandra) in order to meet our SLAs.
> If there is a way to minimize the downtime of interactive queries on node 
> failure that I am missing, I would like to know what it is.
> Our team is super-enthusiastic about Kafka Streams and we're keen to use it 
> for just about everything!  This is our only major roadblock.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-6249) Interactive query downtime when node goes down even with standby replicas

2017-11-28 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6249.

Resolution: Duplicate

> Interactive query downtime when node goes down even with standby replicas
> -
>
> Key: KAFKA-6249
> URL: https://issues.apache.org/jira/browse/KAFKA-6249
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Charles Crain
>
> In a multi-node Kafka Streams application that uses interactive queries, the 
> queryable store will become unavailable (throw InvalidStateStoreException) 
> for up to several minutes when a node goes down.  This happens regardless of 
> how many nodes are in the application as well as how many standby replicas 
> are configured.
> My expectation is that if a standby replica is present, the interactive 
> query would fail over to the live replica immediately, causing negligible 
> downtime for interactive queries.  Instead, what appears to happen is that 
> the queryable store is down for however long it takes for the nodes to 
> completely rebalance (this takes a few minutes for a couple GB of total data 
> in the queryable store's backing topic).
> I am filing this as a bug, realizing that it may in fact be a feature 
> request.  However, until there is a way we can use interactive queries with 
> minimal (~zero) downtime on node failure, we are having to entertain other 
> strategies for serving queries (e.g. manually materializing the topic to an 
> external resilient store such as Cassandra) in order to meet our SLAs.
> If there is a way to minimize the downtime of interactive queries on node 
> failure that I am missing, I would like to know what it is.
> Our team is super-enthusiastic about Kafka Streams and we're keen to use it 
> for just about everything!  This is our only major roadblock.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-1489) Global threshold on data retention size

2017-11-28 Thread Jeffrey Zampieron (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268849#comment-16268849
 ] 

Jeffrey Zampieron commented on KAFKA-1489:
--

I'd like to propose thinking about this from a bullet-proof operational 
perspective. 

When I'm running a cluster, about the only thing I care about is not filling up 
the disk and dropping new data... almost everyone wants to use the disk as a 
giant circular buffer, rotating out the old data when it gets full, no matter 
which topic or partition it belongs to.

Regardless of whether I have unbalanced topics or partitions, none of that 
matters here... it's strictly *per-broker* (containerized or not): I want to 
retain at most X bytes of data in my data folder in order to stay running.

> Global threshold on data retention size
> ---
>
> Key: KAFKA-1489
> URL: https://issues.apache.org/jira/browse/KAFKA-1489
> Project: Kafka
>  Issue Type: New Feature
>  Components: log
>Affects Versions: 0.8.1.1
>Reporter: Andras Sereny
>
> Currently, Kafka has per-topic settings to control the size of one single log 
> (log.retention.bytes). With lots of topics of different volume, and as they 
> grow in number, it can become tedious to maintain topic-level settings that 
> apply to a single log. 
> Often, a chunk of disk space is dedicated to Kafka to host all the logs 
> stored, so it'd make sense to have a configurable threshold to control how 
> much space *all* data in one Kafka log data directory can take up.
> See also:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201406.mbox/browser
> http://mail-archives.apache.org/mod_mbox/kafka-users/201311.mbox/%3c20131107015125.gc9...@jkoshy-ld.linkedin.biz%3E
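For contrast, a hedged sketch of today's knobs, all of which cap a single log rather than the data directory as a whole (values illustrative; the config names are the existing ones):

{code}
# broker-wide default, applied to each partition's log individually
log.retention.bytes=1073741824
# per-topic override (set via kafka-configs.sh): retention.bytes
# there is currently no setting that caps a whole log data directory
{code}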



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-6249) Interactive query downtime when node goes down even with standby replicas

2017-11-28 Thread Charles Crain (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268753#comment-16268753
 ] 

Charles Crain commented on KAFKA-6249:
--

It is possible I am seeing the delay due to the new instance. We use 
Kubernetes as our cluster manager, so if a node crashes, one is immediately 
brought back up to replace it. So our actual failure scenario is one node 
leaving, then another joining shortly after. During this time, the cluster 
spends a long time in the rebalancing state, during which some or all 
partitions of data are unavailable even when standby replicas are configured.

If KAFKA-6144 relates to querying standby replicas, and will also allow 
querying of stale data (i.e. data from replicas that are in the process of 
rebalancing but have not concluded yet), then yes, I think you can close this 
as a duplicate.

Ultimately our desired behavior is: as long as the number of offline nodes 
during a rebalance is never greater than the number of standby replicas, then 
the data should always be present on at least one node, and therefore queries 
should continue to work.
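For context, a hedged sketch of how we enable the standbys today ({{num.standby.replicas}} is the real config key; the other properties are illustrative), which is exactly the setting that does not keep queries available during the rebalance:

{code}
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-iq-app");       // illustrative
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // illustrative
// One standby per task: each store should have a warm copy on another node.
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
{code}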

> Interactive query downtime when node goes down even with standby replicas
> -
>
> Key: KAFKA-6249
> URL: https://issues.apache.org/jira/browse/KAFKA-6249
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Charles Crain
>
> In a multi-node Kafka Streams application that uses interactive queries, the 
> queryable store will become unavailable (throw InvalidStateStoreException) 
> for up to several minutes when a node goes down.  This happens regardless of 
> how many nodes are in the application as well as how many standby replicas 
> are configured.
> My expectation is that if a standby replica is present, the interactive 
> query would fail over to the live replica immediately, causing negligible 
> downtime for interactive queries.  Instead, what appears to happen is that 
> the queryable store is down for however long it takes for the nodes to 
> completely rebalance (this takes a few minutes for a couple GB of total data 
> in the queryable store's backing topic).
> I am filing this as a bug, realizing that it may in fact be a feature 
> request.  However, until there is a way we can use interactive queries with 
> minimal (~zero) downtime on node failure, we are having to entertain other 
> strategies for serving queries (e.g. manually materializing the topic to an 
> external resilient store such as Cassandra) in order to meet our SLAs.
> If there is a way to minimize the downtime of interactive queries on node 
> failure that I am missing, I would like to know what it is.
> Our team is super-enthusiastic about Kafka Streams and we're keen to use it 
> for just about everything!  This is our only major roadblock.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-1120) Controller could miss a broker state change

2017-11-28 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268731#comment-16268731
 ] 

Edoardo Comar commented on KAFKA-1120:
--

Interestingly, using [~wushujames]'s script from [#comment-16110002] on a 
development laptop running trunk code:
* with the suggested 2x5000 partitions, 2x replicated, the cluster is 
unstable: after resting idle in a steady state for some 5-10 minutes, one or 
two of the brokers get disconnected from ZooKeeper, then reconnect and start a 
bounce where one or the other gets out of sync
* with a lower number of partitions (e.g. 2500 or 3500) the above instability 
doesn't show, and either a controlled shutdown with a short timeout or an 
ungraceful kill, followed by a broker restart, gets the cluster back in sync 
without issues



> Controller could miss a broker state change 
> 
>
> Key: KAFKA-1120
> URL: https://issues.apache.org/jira/browse/KAFKA-1120
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Affects Versions: 0.8.1
>Reporter: Jun Rao
>Assignee: Mickael Maison
>  Labels: reliability
> Fix For: 1.1.0
>
>
> When the controller is in the middle of processing a task (e.g., preferred 
> leader election, broker change), it holds a controller lock. During this 
> time, a broker could have de-registered and re-registered itself in ZK. After 
> the controller finishes processing the current task, it will start processing 
> the logic in the broker change listener. However, it will see no broker 
> change and therefore won't do anything to the restarted broker. This broker 
> will be in a weird state since the controller doesn't inform it to become the 
> leader of any partition. Yet, the cached metadata in other brokers could 
> still list that broker as the leader for some partitions. Client requests 
> routed to that broker will then get a TopicOrPartitionNotExistException. This 
> broker will continue to be in this bad state until it's restarted again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)