Build failed in Jenkins: kafka-trunk-jdk8 #1093

2016-12-09 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4486: Don't commit offsets on exception

--
[...truncated 8012 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow 
STARTED

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow PASSED

kafka.integration.AutoOffsetResetTest > t

[jira] [Created] (KAFKA-4520) Kafka broker fails with not so user-friendly error msg when log.dirs is not set

2016-12-09 Thread Buchi Reddy B (JIRA)
Buchi Reddy B created KAFKA-4520:


 Summary: Kafka broker fails with not so user-friendly error msg 
when log.dirs is not set
 Key: KAFKA-4520
 URL: https://issues.apache.org/jira/browse/KAFKA-4520
 Project: Kafka
  Issue Type: Improvement
Reporter: Buchi Reddy B
Priority: Trivial


I tried to bring up a Kafka broker without setting the log.dirs property and it 
failed with the following error.

{code:java}
[2016-12-07 23:41:08,020] INFO KafkaConfig values:
 advertised.host.name = 100.96.7.10
 advertised.listeners = null
 advertised.port = null
 authorizer.class.name =
 auto.create.topics.enable = true
 auto.leader.rebalance.enable = true
 background.threads = 10
 broker.id = 0
 broker.id.generation.enable = false
 broker.rack = null
 compression.type = producer
 connections.max.idle.ms = 60
 controlled.shutdown.enable = true
 controlled.shutdown.max.retries = 3
 controlled.shutdown.retry.backoff.ms = 5000
 controller.socket.timeout.ms = 3
 default.replication.factor = 1
 delete.topic.enable = false
 fetch.purgatory.purge.interval.requests = 1000
 group.max.session.timeout.ms = 30
 group.min.session.timeout.ms = 6000
 host.name =
 inter.broker.protocol.version = 0.10.1-IV2
 leader.imbalance.check.interval.seconds = 300
 leader.imbalance.per.broker.percentage = 10
 listeners = PLAINTEXT://0.0.0.0:9092
 log.cleaner.backoff.ms = 15000
 log.cleaner.dedupe.buffer.size = 134217728
 log.cleaner.delete.retention.ms = 8640
 log.cleaner.enable = true
 log.cleaner.io.buffer.load.factor = 0.9
 log.cleaner.io.buffer.size = 524288
 log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
 log.cleaner.min.cleanable.ratio = 0.5
 log.cleaner.min.compaction.lag.ms = 0
 log.cleaner.threads = 1
 log.cleanup.policy = [delete]
 log.dir = /tmp/kafka-logs
 log.dirs =
 log.flush.interval.messages = 9223372036854775807
 log.flush.interval.ms = null
 log.flush.offset.checkpoint.interval.ms = 6
 log.flush.scheduler.interval.ms = 9223372036854775807
 log.index.interval.bytes = 4096
 log.index.size.max.bytes = 10485760
 log.message.format.version = 0.10.1-IV2
 log.message.timestamp.difference.max.ms = 9223372036854775807
 log.message.timestamp.type = CreateTime
 log.preallocate = false
 log.retention.bytes = -1
 log.retention.check.interval.ms = 30
 log.retention.hours = 168
 log.retention.minutes = null
 log.retention.ms = null
 log.roll.hours = 168
 log.roll.jitter.hours = 0
 log.roll.jitter.ms = null
 log.roll.ms = null
 log.segment.bytes = 1073741824
 log.segment.delete.delay.ms = 6
 max.connections.per.ip = 2147483647
 max.connections.per.ip.overrides =
 message.max.bytes = 112
 metric.reporters = []
 metrics.num.samples = 2
 metrics.sample.window.ms = 3
 min.insync.replicas = 1
 num.io.threads = 8
 num.network.threads = 3
 num.partitions = 1
 num.recovery.threads.per.data.dir = 1
 num.replica.fetchers = 1
 offset.metadata.max.bytes = 4096
 offsets.commit.required.acks = -1
 offsets.commit.timeout.ms = 5000
 offsets.load.buffer.size = 5242880
 offsets.retention.check.interval.ms = 60
 offsets.retention.minutes = 1440
 offsets.topic.compression.codec = 0
 offsets.topic.num.partitions = 50
 offsets.topic.replication.factor = 3
 offsets.topic.segment.bytes = 104857600
 port = 9092
 principal.builder.class = class 
org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
 producer.purgatory.purge.interval.requests = 1000
 queued.max.requests = 500
 quota.consumer.default = 9223372036854775807
 quota.producer.default = 9223372036854775807
 quota.window.num = 11
 quota.window.size.seconds = 1
 replica.fetch.backoff.ms = 1000
 replica.fetch.max.bytes = 1048576
 replica.fetch.min.bytes = 1
 replica.fetch.response.max.bytes = 10485760
 replica.fetch.wait.max.ms = 500
 replica.high.watermark.checkpoint.interval.ms = 5000
 replica.lag.time.max.ms = 1
 replica.socket.receive.buffer.bytes = 65536
 replica.socket.timeout.ms = 3
 replication.quota.window.num = 11
 replication.quota.window.size.seconds = 1
 request.timeout.ms = 3
 reserved.broker.max.id = 1000
 sasl.enabled.mechanisms = [GSSAPI]
 sasl.kerberos.kinit.cmd = /usr/bin/kinit
 sasl.kerberos.min.time.before.relogin = 6
 sasl.kerberos.principal.to.local.rules = [DEFAULT]
 sasl.kerberos.service.name = null
 sasl.kerberos.ticket.renew.jitter = 0.05
 sasl.kerberos.ticket.renew.window.factor = 0.8
 sasl.mechanism.inter.broker.protocol = GSSAPI
 security.inter.broker
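
For reference, a minimal server.properties fragment that avoids this failure — the paths and ids below are illustrative, not the reporter's actual config:

```properties
# Illustrative values only; point log.dirs at a real, writable directory.
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
log.dirs=/var/lib/kafka/data
zookeeper.connect=localhost:2181
```

Note that when log.dirs is absent the broker falls back to log.dir (default /tmp/kafka-logs), so the dump above, showing an empty log.dirs, likely indicates the property was set to an empty value rather than omitted entirely.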

Build failed in Jenkins: kafka-trunk-jdk8 #1092

2016-12-09 Thread Apache Jenkins Server
See 

Changes:

[jason] KAFKA-4375; Reset interrupt state in a few places where

[jason] KAFKA-4272; Add missing 'connect' Windows batch scripts

[jason] KAFKA-4000; Collect and record per-topic consumer metrics

[wangguoz] KAFKA-4393: Improve invalid/negative TS handling

--
[...truncated 8014 lines...]

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithCollision PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokerListWithNoTopics PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testGetAllTopicMetadata 
PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata 
STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.SaslPlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic PASSED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest STARTED

kafka.integration.PrimitiveApiTest > testEmptyFetchRequest PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
STARTED

kafka.integration.PlaintextTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic STARTED

kafka.integration.PlaintextTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata STARTED

kafka.integration.PlaintextTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAutoCreateTopicWithInvalidReplication PASSED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown STARTED

kafka.integration.PlaintextTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.i

Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-09 Thread Matthias J. Sax
About using the offset-topic metadata field:

Even with KIP-98, I think this would not work (or maybe in a weird way).
If we have the following topology:

topic1 -> subTopologyA -> topic2 -> subTopologyB

If producer of subTopologyA commits, it will commit its input offsets
from topic1. Thus, the stop-offsets of topic2 for subTopologyB would be
committed with the metadata of topic1 commits. But subTopologyB is not
really related to topic1. I guess it would not be impossible to make
this work, however, the design seems to be somewhat weird.

But maybe, I do miss something.

Furthermore, I am not sure if Streams will always use transactions. Will
there be an option for the user to disable transactions and stick with
at-least-once processing?

Also, we would need to delay this KIP until KIP-98 and a Streams EOS KIP
are in place... I would rather include this in the next release, 0.10.2.


-Matthias

On 12/9/16 10:47 AM, Guozhang Wang wrote:
> I will read through the KIP doc once again to provide more detailed
> feedback, but let me throw in my two cents just for the above email.
> 
> There are a few motivations to have a "consistent" stop-point across tasks
> belonging to different sub-topologies. One of them is for interactive
> queries: say you have two state stores belonging to two sub-topologies, if
> they stopped at different points, then when users query them they will
> see inconsistent answers (think about the example people always use in
> databases: the stores represent A and B's bank account and a record is
> processed to move X dollar from A to B).
> 
> As for the implementation to support such consistent stop-points though, I
> think the metadata field in the offset topic is worth exploring, because
> Streams may very likely use the transactional APIs proposed in KIP-98 to
> let producers send offsets in a transactional manner, not the consumers
> themselves (details can be found here
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8
> ).
> 
> 
> 
> Guozhang
> 
> 
> On Tue, Dec 6, 2016 at 2:45 PM, Matthias J. Sax 
> wrote:
> 
>> Thanks for the input Jay.
>>
>> From my understanding, your question boils down to how fuzzy the stop
>> point can/should be, and what guarantees we want to provide to the user.
>> This has multiple dimensions:
>>
>>
>> 1. Using a timestamp has the main issue that if an input topic has no
>> data with this timestamp, the application never finishes (ie, the stop
>> timestamp is in "the future").
>>
>> Furthermore, it would require the user to specify the timestamp because
>> if we use current (ie, on startup) system time as stop-timestamp, the
>> case of a "future stop-timestamp" might be very common (think no active
>> producers for a topic -- it's batch processing). Thus, the user needs to
>> know the stop-timestamp, which might be hard for her to figure out -- in
>> the current design it's way simpler for the user to activate "auto stop".
>>
>> Last but not least, assume an application with two subtopologies that
>> are connected via an intermediate topic and both subtopologies are
>> executed in different JVMs. The first topology could filter a lot of
>> messages and thus it might happen that it never writes a record with
>> timestamp >= stop-timestamp into the intermediate topic. Thus, even if
>> the first JVM terminates the second would not stop automatically as it
>> never reads a record with timestamp >= stop-timestamp.
>>
>> There would be some workaround if we shut down in a "fuzzy way", ie,
>> with no guarantees what record will actually get processed (ie, stop
>> processing somewhat early in some cases). But I will argue in (3) why
>> this "stop roughly about this time semantic" is not a good idea.
>>
>>
>> 2. I was not aware of a metadata field for committed offsets and this
>> sounds quite appealing. However, thinking about it in more detail, I
>> have my doubts we can use it:
>>
>> If we want to propagate stop-offsets for intermediate topics, all
>> producer instances would need to update this metadata field, thus need
>> to commit (A producer that does commit? Well we could use "restore
>> consumer" with manual partition assignment for this.) -- however, this
>> would not only conflict with the commits of the actual consumer, but
>> also among all running producers.
>>
>>
>> 3. This is the overall "how fuzzy we want to be" discussion. I would
>> argue that we should provide somewhat strong stop consistency. Assume
>> the following use case. An external application generates data in
>> batches and writes files to HDFS. Those files are imported into Kafka
>> via Connect. Each time a batch of data gets inserted into Kafka, this
>> data should be processed with a Streams application. If we cannot
>> guarantee that all data of this batch is fully processed and the result
>> is complete, user experience would be quite bad.
>>
>> Furthermore, we want to guard a running batch job against processing "too much"
>> data: Assume the same scenari

[jira] [Resolved] (KAFKA-4486) Kafka Streams - exception in process still commits offsets

2016-12-09 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-4486.
--
Resolution: Fixed

Issue resolved by pull request 2225
[https://github.com/apache/kafka/pull/2225]

> Kafka Streams - exception in process still commits offsets
> --
>
> Key: KAFKA-4486
> URL: https://issues.apache.org/jira/browse/KAFKA-4486
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: Java 8
>Reporter: Joel Lundell
>Assignee: Eno Thereska
> Fix For: 0.10.2.0
>
>
> I'm building a streams application and would like to be able to control the 
> commits manually using ProcessorContext#commit() from an instance of 
> org.apache.kafka.streams.processor.Processor.
> My use case is that I want to read messages from a topic and push them to AWS 
> SQS and I need to be able to guarantee that all messages reach the queue at 
> least once. I also want to use SQS batching support so my approach at the 
> moment is that in Processor#process I'm saving X records in a data structure 
> and when I have a full batch I send it off and if successful I commit. If I 
> for any reason can't deliver the records I don't want the offsets being 
> committed so that when processing works again I can start processing from the 
> last successful record.
> When I was trying out the error handling I noticed that if I create a 
> Processor and in the process method always throw an exception that will 
> trigger StreamThread#shutdownTaskAndState which calls 
> StreamThread#commitOffsets and next time I run the application it starts as 
> if the previous "record" was successfully processed.
> Is there a way to achieve what I'm looking for?
> I found a similar discussion in 
> https://issues.apache.org/jira/browse/KAFKA-3491 but that issue is still open.
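
The batch-then-commit pattern the reporter describes can be sketched without the Kafka Streams dependency. Here `Consumer<List<String>> sender` and `Runnable commit` are stand-ins (assumptions) for the SQS batch send and ProcessorContext#commit(); a real Processor would call context.commit() instead:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the batch-then-commit pattern from the issue description.
// `commit` stands in for ProcessorContext#commit(); `sender` for an
// SQS batch send. Offsets are only committed after a full batch has
// been delivered; if sender throws, commit is skipped and the records
// will be reprocessed on restart.
public class BatchingSender {
    private final int batchSize;
    private final Consumer<List<String>> sender;
    private final Runnable commit;
    private final List<String> buffer = new ArrayList<>();

    public BatchingSender(int batchSize, Consumer<List<String>> sender, Runnable commit) {
        this.batchSize = batchSize;
        this.sender = sender;
        this.commit = commit;
    }

    // Called once per record, like Processor#process.
    public void process(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            sender.accept(new ArrayList<>(buffer)); // may throw; then no commit
            buffer.clear();
            commit.run();
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        int[] commits = {0};
        BatchingSender s = new BatchingSender(2, sent::addAll, () -> commits[0]++);
        s.process("a"); // buffered, no commit yet
        s.process("b"); // batch of 2 sent, then committed
        System.out.println(sent + " commits=" + commits[0]); // prints: [a, b] commits=1
    }
}
```

The bug this issue fixes is exactly the failure mode this pattern relies on: an exception escaping process() must not trigger an offset commit during shutdown.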



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4486) Kafka Streams - exception in process still commits offsets

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736912#comment-15736912
 ] 

ASF GitHub Bot commented on KAFKA-4486:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2225


> Kafka Streams - exception in process still commits offsets
> --
>
> Key: KAFKA-4486
> URL: https://issues.apache.org/jira/browse/KAFKA-4486
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: Java 8
>Reporter: Joel Lundell
>Assignee: Eno Thereska
> Fix For: 0.10.2.0
>
>
> I'm building a streams application and would like to be able to control the 
> commits manually using ProcessorContext#commit() from an instance of 
> org.apache.kafka.streams.processor.Processor.
> My use case is that I want to read messages from a topic and push them to AWS 
> SQS and I need to be able to guarantee that all messages reach the queue at 
> least once. I also want to use SQS batching support so my approach at the 
> moment is that in Processor#process I'm saving X records in a data structure 
> and when I have a full batch I send it off and if successful I commit. If I 
> for any reason can't deliver the records I don't want the offsets being 
> committed so that when processing works again I can start processing from the 
> last successful record.
> When I was trying out the error handling I noticed that if I create a 
> Processor and in the process method always throw an exception that will 
> trigger StreamThread#shutdownTaskAndState which calls 
> StreamThread#commitOffsets and next time I run the application it starts as 
> if the previous "record" was successfully processed.
> Is there a way to achieve what I'm looking for?
> I found a similar discussion in 
> https://issues.apache.org/jira/browse/KAFKA-3491 but that issue is still open.





[GitHub] kafka pull request #2225: KAFKA-4486: Don't commit offsets on exception

2016-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2225


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2239: Docs templates

2016-12-09 Thread derrickdoo
GitHub user derrickdoo opened a pull request:

https://github.com/apache/kafka/pull/2239

Docs templates

- Move Streams documentation out to its own page
- Render docs with Handlebars.js so we can quickly set repeated items like doc 
version numbers

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/derrickdoo/kafka docTemplates

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2239


commit 13f6f24c09a04c7e9f30f0809790d5247bfd6a69
Author: Derrick Or 
Date:   2016-12-10T01:12:04Z

get docs setup with handlebars and seperate streams out to its own page






[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.

2016-12-09 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736891#comment-15736891
 ] 

James Cheng commented on KAFKA-4477:


[~tcrayford-heroku] told me that he's seen a deadlock bug that also had a 
symptom where number of filehandles starts to go up. Tom, do you think this 
could be related?

> Node reduces its ISR to itself, and doesn't recover. Other nodes do not take 
> leadership, cluster remains sick until node is restarted.
> --
>
> Key: KAFKA-4477
> URL: https://issues.apache.org/jira/browse/KAFKA-4477
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0
> Environment: RHEL7
> java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
>Reporter: Michael Andre Pearce (IG)
>Assignee: Apurva Mehta
>Priority: Critical
>  Labels: reliability
> Attachments: issue_node_1001.log, issue_node_1002.log, 
> issue_node_1003.log, kafka.jstack
>
>
> We have encountered a critical issue that has re-occurred in different 
> physical environments. We haven't worked out what is going on. We do, though, 
> have a nasty workaround to keep the service alive. 
> We have not had this issue on clusters still running 0.9.0.1.
> We have noticed a node randomly shrinking the ISRs for the partitions it 
> owns down to itself; moments later we see other nodes having disconnects, 
> followed finally by app issues, where producing to these partitions is 
> blocked.
> It seems that only restarting the Kafka instance's Java process resolves the 
> issues.
> We have had this occur multiple times and from all network and machine 
> monitoring the machine never left the network, or had any other glitches.
> Below are seen logs from the issue.
> Node 7:
> [2016-12-01 07:01:28,112] INFO Partition 
> [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking 
> ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from 
> 1,2,7 to 7 (kafka.cluster.Partition)
> All other nodes:
> [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch 
> kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 7 was disconnected before the response was 
> read
> All clients:
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> After this occurs, we then suddenly see on the sick machine an increasing 
> amount of close_waits and file descriptors.
> As a workaround to keep service up, we are currently putting in an automated 
> process that tails the logs and matches the regex below; where new_partitions 
> is just the node itself, we restart the node. 
> "\[(?P.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for 
> partition \[.*\] from (?P.+) to (?P.+) 
> \(kafka.cluster.Partition\)"
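
The tail-and-regex workaround above can be sketched as follows. The archived regex lost its named-group names in transit, so the group names here (`date`, `broker`, `from`, `to`) are assumptions, and the detector assumes each log line arrives unwrapped:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the "shrank ISR to itself" detector described in the workaround.
// Group names are assumptions; the archived regex lost them.
public class IsrShrinkDetector {
    static final Pattern SHRINK = Pattern.compile(
        "\\[(?<date>.+)\\] INFO Partition \\[.*\\] on broker (?<broker>\\d+): "
        + "Shrinking ISR for partition \\[.*\\] from (?<from>[\\d,]+) to "
        + "(?<to>\\d+) \\(kafka\\.cluster\\.Partition\\)");

    // True when the new ISR is exactly the reporting broker itself,
    // i.e. the condition the workaround restarts the node on.
    static boolean shrunkToSelf(String line) {
        Matcher m = SHRINK.matcher(line);
        return m.matches() && m.group("to").equals(m.group("broker"));
    }

    public static void main(String[] args) {
        String line = "[2016-12-01 07:01:28,112] INFO Partition "
            + "[com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: "
            + "Shrinking ISR for partition "
            + "[com_ig_trade_v1_position_event--demo--compacted,10] from "
            + "1,2,7 to 7 (kafka.cluster.Partition)";
        System.out.println(shrunkToSelf(line)); // prints: true
    }
}
```

A monitoring script would feed each tailed broker-log line through shrunkToSelf and trigger the restart when it returns true.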





[jira] [Commented] (KAFKA-3853) Report offsets for empty groups in ConsumerGroupCommand

2016-12-09 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736810#comment-15736810
 ] 

Vahid Hashemian commented on KAFKA-3853:


Sounds good. The KIP needs one more binding vote to pass (I'm assuming it needs 
3).

> Report offsets for empty groups in ConsumerGroupCommand
> ---
>
> Key: KAFKA-3853
> URL: https://issues.apache.org/jira/browse/KAFKA-3853
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> We ought to be able to display offsets for groups which either have no active 
> members or which are not using group management. The owner column can be left 
> empty or set to "N/A". If a group is active, I'm not sure it would make sense 
> to report all offsets, in particular when partitions are unassigned, but if 
> it seems problematic to do so, we could enable the behavior with a flag (e.g. 
> --include-unassigned).





Build failed in Jenkins: kafka-trunk-jdk8 #1091

2016-12-09 Thread Apache Jenkins Server
See 

Changes:

[jason] MINOR: Fix typos in KafkaConsumer docs

--
[...truncated 27781 lines...]

org.apache.kafka.connect.runtime.WorkerSinkTaskTest > 
testWakeupInCommitSyncCausesRetry PASSED

org.apache.kafka.connect.runtime.SourceTaskOffsetCommitterTest > testSchedule 
STARTED

org.apache.kafka.connect.runtime.SourceTaskOffsetCommitterTest > testSchedule 
PASSED

org.apache.kafka.connect.runtime.SourceTaskOffsetCommitterTest > testClose 
STARTED

org.apache.kafka.connect.runtime.SourceTaskOffsetCommitterTest > testClose 
PASSED

org.apache.kafka.connect.runtime.SourceTaskOffsetCommitterTest > testRemove 
STARTED

org.apache.kafka.connect.runtime.SourceTaskOffsetCommitterTest > testRemove 
PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testStartupFailure 
STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testStartupFailure PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testOnResume STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testOnResume PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testStartupPaused STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testStartupPaused PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > 
testTransitionPausedToPaused STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > 
testTransitionPausedToPaused PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testFailureIsFinalState 
STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testFailureIsFinalState 
PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testInitializeFailure 
STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testInitializeFailure 
PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testStartupAndPause 
STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testStartupAndPause 
PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testStartupAndShutdown 
STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testStartupAndShutdown 
PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > 
testTransitionStartedToStarted STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > 
testTransitionStartedToStarted PASSED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testShutdownFailure 
STARTED

org.apache.kafka.connect.runtime.WorkerConnectorTest > testShutdownFailure 
PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testInconsistentConfigs STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testInconsistentConfigs PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinAssignment STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinAssignment PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalanceFailedConnector STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalanceFailedConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testHaltCleansUpWorker STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testHaltCleansUpWorker PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnector STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorFailedBasicValidation STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorFailedBasicValidation PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorFailedCustomValidation STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorFailedCustomValidation PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testConnectorNameConflictsWithWorkerGroupId STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testConnectorNameConflictsWithWorkerGroupId PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorAlreadyExists STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testCreateConnectorAlreadyExists PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testDestroyConnector STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testDestroyConnector PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRestartConnector STARTED

org.apache.kafka.c

[jira] [Commented] (KAFKA-4393) Improve invalid/negative TS handling

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736779#comment-15736779
 ] 

ASF GitHub Bot commented on KAFKA-4393:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2117


> Improve invalid/negative TS handling
> 
>
> Key: KAFKA-4393
> URL: https://issues.apache.org/jira/browse/KAFKA-4393
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
> Fix For: 0.10.2.0
>
>
> Currently, Kafka Streams does not handle invalid/negative timestamps returned 
> from the {{TimestampExtractor}} gracefully, but fails with an exception, 
> because negative timestamps cannot be handled meaningfully by time-based 
> (i.e., windowed) operators such as window aggregates and joins.
> We want to change Streams to an auto-drop behavior for those records 
> (without any further user notification about the dropped records) to enable 
> users to "step over" them and keep going (instead of hitting an exception). 
> To guard users against silently dropping messages by default (and to keep 
> the current fail-fast behavior), we change the default extractor 
> {{ConsumerRecordTimestampExtractor}} to check the extracted record metadata 
> timestamp and raise an exception if it is negative. Furthermore, we add a 
> "drop-and-log" extractor, as this seems to be a common behavior users might 
> want. For any other behavior, users can still provide a custom 
> {{TimestampExtractor}} implementation.
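
The "drop-and-log" policy described above can be sketched in a few lines. The class below is a hypothetical standalone illustration, not the real Kafka Streams {{TimestampExtractor}} interface (whose exact method signature differs across versions); it only shows the decision logic: valid timestamps pass through, negative ones are logged and flagged for dropping via a sentinel.

```java
// Hypothetical sketch of a "drop-and-log" timestamp policy. It does NOT
// implement the actual Kafka Streams TimestampExtractor interface; class
// and method names here are illustrative only.
public class DropAndLogTimestampSketch {

    // Sentinel value the caller interprets as "drop this record".
    public static final long DROP = -1L;

    // Returns the timestamp unchanged if valid; otherwise logs the record
    // and returns the DROP sentinel so the caller can skip it.
    public static long extract(long recordTimestamp, String recordInfo) {
        if (recordTimestamp < 0) {
            System.err.println("Dropping record with invalid timestamp "
                    + recordTimestamp + ": " + recordInfo);
            return DROP;
        }
        return recordTimestamp;
    }

    public static void main(String[] args) {
        // A valid timestamp passes through untouched.
        System.out.println(extract(1481241600000L, "good record"));
        // A negative timestamp is logged and flagged for dropping.
        System.out.println(extract(-7L, "record without timestamp"));
    }
}
```

In the real API the equivalent choice is made per record by whichever extractor class is configured; the sketch only captures the drop-vs-pass decision.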



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4393) Improve invalid/negative TS handling

2016-12-09 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4393:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2117
[https://github.com/apache/kafka/pull/2117]






[GitHub] kafka pull request #2117: KAFKA-4393: Improve invalid/negative TS handling

2016-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2117


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3853) Report offsets for empty groups in ConsumerGroupCommand

2016-12-09 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736775#comment-15736775
 ] 

Ismael Juma commented on KAFKA-3853:


Let's try and include it in 0.10.2.0. :)

> Report offsets for empty groups in ConsumerGroupCommand
> ---
>
> Key: KAFKA-3853
> URL: https://issues.apache.org/jira/browse/KAFKA-3853
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> We ought to be able to display offsets for groups which either have no active 
> members or which are not using group management. The owner column can be left 
> empty or set to "N/A". If a group is active, I'm not sure it would make sense 
> to report all offsets, in particular when partitions are unassigned, but if 
> it seems problematic to do so, we could enable the behavior with a flag (e.g. 
> --include-unassigned).





[jira] [Updated] (KAFKA-3853) Report offsets for empty groups in ConsumerGroupCommand

2016-12-09 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3853:
---
Fix Version/s: 0.10.2.0






[jira] [Commented] (KAFKA-3853) Report offsets for empty groups in ConsumerGroupCommand

2016-12-09 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736755#comment-15736755
 ] 

Vahid Hashemian commented on KAFKA-3853:


[~jeffwidman] This is currently waiting for 
[KIP-88|https://cwiki.apache.org/confluence/display/pages/viewpage.action?pageId=66849788],
which is in the voting state. Once the KIP has passed, I'll submit the PR for 
this JIRA.






[jira] [Commented] (KAFKA-3853) Report offsets for empty groups in ConsumerGroupCommand

2016-12-09 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736740#comment-15736740
 ] 

Jeff Widman commented on KAFKA-3853:


Any idea if this will land in 0.10.2?






[jira] [Updated] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-09 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4497:
---
Priority: Critical  (was: Major)

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>Priority: Critical
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with the latest Kafka 0.10.1 and the log cleaner 
> thread with regard to the timeindex files. From the log-cleaner's log we see 
> after startup that it tries to clean up a topic called xdc_listing-status-v2 
> [1]. The topic is set up with log compaction [2] and the Kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because it finds messages which don’t have a timestamp 
> at all? All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps 
> for the mentioned offsets. I would also like to mention that the issue 
> happened in two independent datacenters at the same time, so I would rather 
> expect an application/producer issue than random disk failures. We didn’t 
> have the problem with 0.10.0 for around half a year; it appeared shortly 
> after the upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does it mean that it does not find any log segments that 
> can be cleaned up, or that the last timestamp of the last log segment is 
> somehow broken/missing?
> I’m also wondering why the log cleaner thread stops completely after an 
> error with one topic. I would at least expect it to keep cleaning up other 
> topics, but apparently it doesn’t; e.g., it’s not even cleaning 
> __consumer_offsets anymore.
> Does anybody have the same issue, or can anyone explain what’s going on? 
> Thanks for any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:400)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:364)
> at kafka.log.Cleaner$$an

[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-09 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736646#comment-15736646
 ] 

Ismael Juma commented on KAFKA-4497:


There's no way to disable the time index. [~becket_qin], any ideas?


[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-09 Thread Robert Schumann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736628#comment-15736628
 ] 

Robert Schumann commented on KAFKA-4497:


Catching the right exception and continuing is a valid point, but I would like 
to understand how the cleaner writes such a broken timeindex.cleaned file from 
an apparently healthy timeindex file, e.g. comparing the real timeindex file:
{noformat}
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c46b0f507bb] timestamp:[Fri Nov 25 17:17:08 CET 2016] 
offset:[1979]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c474b8a07c3] timestamp:[Fri Nov 25 17:17:47 CET 2016] 
offset:[1987]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c48921807cd] timestamp:[Fri Nov 25 17:19:11 CET 2016] 
offset:[1997]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c48db0c07d5] timestamp:[Fri Nov 25 17:19:30 CET 2016] 
offset:[2005]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c49283807de] timestamp:[Fri Nov 25 17:19:49 CET 2016] 
offset:[2014]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c496fcf07e6] timestamp:[Fri Nov 25 17:20:08 CET 2016] 
offset:[2022]
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex] 
raw:[01589c4a47a507ef] timestamp:[Fri Nov 25 17:21:03 CET 2016] 
offset:[2031]
{noformat}

To the one the log cleaner produced:
{noformat}
file:[/var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned]
 raw:[01589c490ae1] timestamp:[Fri Nov 25 17:19:42 CET 2016] 
offset:[4294967295]
{noformat}

Why does it end up writing 2^32-1 as the offset? This is obviously wrong, and 
it's also not what is written in the original file. Does that mean the .log 
file is somehow unreadable to the cleaner? Does it even look at the old 
timeindex file?

Is it possible to disable the timeindex feature in 0.10.1.0? I'm stressing 
this because broken log cleaning for compacted topics is basically a 
production blocker: it fills up disks quickly, and deleting things manually 
does not inspire much confidence.
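
The 4294967295 in the cleaned file is exactly 2^32 - 1, which is what a relative offset of -1 looks like when a 4-byte index slot is read back as unsigned (the time index stores 8-byte timestamps plus 4-byte offsets relative to the segment base offset). A minimal illustration in plain Java, not Kafka code:

```java
// Illustration only: why an appended offset of -1 can surface as 4294967295
// in a dump of the time index. The index slot is 32 bits wide; storing -1
// and reading the slot back as an unsigned 32-bit value yields 2^32 - 1,
// matching the odd offset in the timeindex.cleaned dump above.
public class UnsignedOffsetDemo {
    public static void main(String[] args) {
        int storedSlot = -1;  // the bogus relative offset the cleaner appended
        long readBackUnsigned = Integer.toUnsignedLong(storedSlot);
        System.out.println(readBackUnsigned);  // prints 4294967295
    }
}
```

This matches the exception in [1], "Attempt to append an offset (-1)": the cleaner computed -1 as the offset, and the dump tool shows the same bit pattern interpreted as unsigned.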


[jira] [Updated] (KAFKA-4000) Consumer per-topic metrics do not aggregate partitions from the same topic

2016-12-09 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-4000:
---
   Resolution: Fixed
Fix Version/s: 0.10.2.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1684
[https://github.com/apache/kafka/pull/1684]

> Consumer per-topic metrics do not aggregate partitions from the same topic
> --
>
> Key: KAFKA-4000
> URL: https://issues.apache.org/jira/browse/KAFKA-4000
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>Priority: Minor
> Fix For: 0.10.2.0
>
>
> In the Consumer Fetcher code, we have per-topic fetch metrics, but they seem 
> to be computed from each partition separately. It seems like we should 
> aggregate them by topic.
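
The aggregation the issue asks for, rolling per-partition numbers up to their topic, can be sketched generically. This is a hypothetical illustration with invented names, not the actual consumer Fetcher metrics code; it assumes partition keys of the form "topic-partition".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: summing per-partition fetch byte counts by topic,
// as the issue suggests, instead of recording each partition separately.
public class TopicMetricsSketch {
    public static Map<String, Long> aggregateByTopic(Map<String, Long> bytesByTopicPartition) {
        Map<String, Long> byTopic = new HashMap<>();
        for (Map.Entry<String, Long> e : bytesByTopicPartition.entrySet()) {
            // Keys look like "topic-0"; strip the trailing "-<partition>".
            String topic = e.getKey().substring(0, e.getKey().lastIndexOf('-'));
            byTopic.merge(topic, e.getValue(), Long::sum);
        }
        return byTopic;
    }

    public static void main(String[] args) {
        Map<String, Long> perPartition = new HashMap<>();
        perPartition.put("events-0", 100L);
        perPartition.put("events-1", 250L);
        perPartition.put("audit-0", 40L);
        // "events" aggregates to 350, "audit" to 40.
        System.out.println(aggregateByTopic(perPartition));
    }
}
```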





[jira] [Commented] (KAFKA-4000) Consumer per-topic metrics do not aggregate partitions from the same topic

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736627#comment-15736627
 ] 

ASF GitHub Bot commented on KAFKA-4000:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1684







[GitHub] kafka pull request #1684: KAFKA-4000: Collect and record per-topic consumer ...

2016-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1684




Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Guozhang Wang
Onur,

I understand your question now. So it is indeed possible that after
commitTxn() returns, the messages could still be lost permanently if all
replicas failed before the data was flushed to disk. This is a consequence of
Kafka's design relying on replication (possibly only in memory) for high
availability, hence the asynchronous flushing. This scenario already exists
today, and KIP-98 does not intend to change this factor in any way.

Guozhang
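
The write-vs-flush distinction discussed in this thread maps to standard Kafka settings: producer acks control how many in-sync replicas must have the write in their log (not necessarily on disk), while fsync is governed separately by broker-side flush settings. Below is a hedged sketch of producer properties commonly used to strengthen durability; the config keys are real Kafka producer settings, but the values are illustrative, not a recommendation.

```java
import java.util.Properties;

// Sketch of producer-side settings relevant to the durability discussion.
// acks=all waits for all in-sync replicas to acknowledge the write, but the
// acknowledgement only means the data reached each replica's log (page
// cache); fsync timing is controlled on the broker by settings such as
// log.flush.interval.messages / log.flush.interval.ms.
public class DurabilityProps {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("acks", "all");     // wait for all in-sync replicas
        props.put("retries", "3");    // retry transient send failures
        props.put("max.in.flight.requests.per.connection", "1"); // keep ordering on retry
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("acks"));
    }
}
```

Even with all of these, Onur's point stands: acknowledgement does not imply the data was fsynced, so a simultaneous failure of all replicas before a flush can still lose acknowledged writes.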


On Fri, Dec 9, 2016 at 12:25 PM, Onur Karaman 
wrote:

> In other words, we can see inconsistency when the transaction log reports
> the transaction as COMMITTED while the markers and data corresponding to
> the transaction itself on the user partitions may have been partially lost
> after-the-fact because of Kafka's durability guarantees.
>
> On Fri, Dec 9, 2016 at 12:16 PM, Onur Karaman <
> onurkaraman.apa...@gmail.com>
> wrote:
>
> > @Guozhang no I actually meant durability concerns over COMMIT/ABORT
> > markers (and a subset of the user's data produced in the transaction for
> > that matter) getting lost from the delta between the write and flush.
> >
> > KIP-98 relies on replicas writing to logs, so transaction durability is
> > effectively limited by Kafka's definition of a "write success", meaning
> > written but not flushed to disk.
> >
> > I mentioned RF=1 not because of availability but actually to highlight a
> > corner-case durability scenario where the single replica participating in
> > the transaction experiences a hard failure after the write but before the
> > flush, causing the transaction to have partial data loss.
> >
> > Is this level of durability okay, or do we want stronger guarantees for the
> > transaction? Basically, what I'm wondering is whether KIP-98 requires
> > Kafka's
> > definition of a "write success" to be extended from "written" to an
> > optional "written and flushed to disk".
> >
> > On Fri, Dec 9, 2016 at 11:54 AM, Michael Pearce 
> > wrote:
> >
> >> Apologies on the spelling.
> >>
> >> *Hi Jay,
> >> 
> >> From: Michael Pearce 
> >> Sent: Friday, December 9, 2016 7:52:25 PM
> >> To: dev@kafka.apache.org
> >> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> >> Messaging
> >>
> >> Hi Jey
> >>
> >> 1) I agree; these could be used to add this in a future KIP if there is
> >> ever enough of a case. As stated, for these systems we will keep
> >> our JMS solutions. I think maybe when this feature is written up in the
> >> docs, users should be redirected to alternative options, such as JMS
> >> brokers, for these use cases.
> >>
> >> 2) I think this KIP needs to be mindful, and actually take ownership of
> >> making sure things are implemented in a way that makes future enhancement
> >> easy, or at least extensible. Having to rework things in the future and
> >> correct historic decisions is expensive, as we are already finding.
> >>
> >> Sent using OWA for iPhone
> >> 
> >> From: Jay Kreps 
> >> Sent: Friday, December 9, 2016 7:19:59 PM
> >> To: dev@kafka.apache.org
> >> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> >> Messaging
> >>
> >> Hey Michael,
> >>
> >> Yeah, I don't think you need to go into the details of whatever you guys
> >> have. I think several people in the thread said "let's do XA
> >> transactions
> >> too!" Obviously in a world where features were free and always worked
> >> perfectly we would! I've probably talked to about 100 people about their
> >> use of XA transactions in different systems and my observation has been
> >> (a)
> >> they are a bit of an operational nightmare, (b) the use cases i've
> >> understood don't actually require full XA transactions they actually
> >> require a much weaker and easier to guarantee property. The result is
> you
> >> pay a big complexity cost for a guarantee much stronger than what you
> >> wanted. My sense is that this opinion is broadly shared by the
> distributed
> >> systems community at large and by Kafka folks in particular.
> >>
> >> I'm a contrarian so I think it is great not to be too swayed by "common
> >> wisdom" though. Five years ago there was a consensus that distributed
> >> transactions were too hard to implement in an operationally sound way,
> >> which i think was not correct, so the bad reputation for cross-system
> >> transactions may be equally wrong!
> >>
> >> To build a compelling case this is wrong I think two things need to be
> >> done:
> >>
> >>1. Build a case that there are a large/important set of use cases
> that
> >>cannot be solved with two independent transactions (as i described),
> >> and
> >>that these use cases are things Kafka should be able to do.
> >>2. Come up with the concrete extensions to the KIP-98 proposal that
> >>would enable an operationally sound implementation for pluggable
> >>multi-system XA.
> >>
> >> -Jay
> >>
> >>
> >>
> >> On Fri, Dec 9, 2016 at 10:25 AM, Michael Pearce 
> >> wrote:
> >>
> >> > Hi

Re: [DISCUSS] Deprecating the old consumers in trunk

2016-12-09 Thread Guozhang Wang
Looks good to me too.

On Fri, Dec 9, 2016 at 1:02 PM, Jason Gustafson  wrote:

> Hey Ismael, that sounds fair to me. I'm +1.
>
> -Jason
>
> On Thu, Dec 8, 2016 at 8:01 AM, Ismael Juma  wrote:
>
> > Thanks Onur and Jason. I filed a JIRA to track this:
> >
> > https://issues.apache.org/jira/browse/KAFKA-4513
> >
> > My take is that this would be good to have and one could argue that we
> > should not remove the old consumers until we have it. However, I think we
> > should still go ahead with the deprecation of the old consumers for the
> > next release. That will make it clear to existing users that, where
> > possible, they should start moving to the new consumer (everything will
> > still work fine).
> >
> > Thoughts?
> >
> > Ismael
> >
> > On Mon, Nov 28, 2016 at 3:07 AM, Jason Gustafson 
> > wrote:
> >
> > > Onur's suggestion or something like it sounds like it could work.
> Suppose
> > > we add some metadata in Zookeeper for consumers which support the
> > embedded
> > > KafkaConsumer. Until all members in the group have declared support,
> the
> > > consumer will continue to use Zk for their partition assignments. But once
> > all
> > > members support the embedded consumer, then they will switch to
> receiving
> > > their assignments from the embedded KafkaConsumer. So basically
> upgrading
> > > to the new consumer first requires that you upgrade the old consumer to
> > use
> > > the new consumer's group assignment protocol. Once you've done that,
> then
> > > upgrading to the new consumer becomes straightforward. Does that work?
> > Then
> > > maybe you don't need to propagate any extra information over the
> > rebalance
> > > protocol.
> > >
> > > -Jason
> > >
> > > On Wed, Nov 23, 2016 at 12:35 AM, Onur Karaman <
> > > onurkaraman.apa...@gmail.com
> > > > wrote:
> > >
> > > > Some coworkers may have had issues seeing my earlier post so
> reposting
> > > > under a different email:
> > > >
> > > > So my earlier stated suboptimal migration plans and Joel's idea all
> > > suffer
> > > > from either downtime or dual partition ownership and consumption.
> > > >
> > > > But I think there's a bigger problem: they assume users are willing
> to
> > do
> > > > the full migration immediately. I'm not convinced that this is
> > realistic.
> > > > Some teams may be okay with this (and the earlier stated consequences
> > of
> > > > the existing approaches), but others want to "canary" new code. That
> > is,
> > > > they want to deploy a single instance of the new code to test the
> > waters
> > > > while all the other instances run old code. It's not unreasonable for
> > > this
> > > > to span days. In this world, earlier alternatives would have the
> canary
> > > > under heavy load since it is the sole new consumer in the group and
> it
> > is
> > > > guaranteed to own every partition the group is interested in. So the
> > > canary
> > > > is likely going to look unhealthy and the consumer can fall behind.
> > > >
> > > > Here's a not-fully-thought-out idea:
> > > > Suppose we roll out a ZookeeperConsumerConnector that uses an
> embedded
> > > > KafkaConsumer to passively participate in kafka-based coordination
> > while
> > > > still participating in zookeeper-based coordination. For now, the
> > > > ZookeeperConsumerConnectors just use the partition assignment as
> > decided
> > > > in zookeeper. Now suppose an outside KafkaConsumer comes up.
> > Kafka-based
> > > > coordination allows arbitrary metadata to get broadcasted to the
> group.
> > > > Maybe we can somehow broadcast a flag saying a new consumer is
> running
> > > > during this migration. If the KafkaConsumers embedded in the
> > > > ZookeeperConsumerConnector see this flag, then they can notify the
> > > > ZookeeperConsumerConnector's fetchers to fetch the partitions defined
> > by
> > > > the kafka-based coordination rebalance result. The
> > > > ZookeeperConsumerConnector's embedded KafkaConsumer's fetchers never
> > get
> > > > used at any point in time.
> > > >
> > > > The benefits of this approach would be:
> > > > 1. no downtime
> > > > 2. minimal window of dual partition ownership
> > > > 3. even partition distribution upon canary arrival.
> > > > ZookeeperConsumerConnector instances can claim some partition
> > ownership,
> > > so
> > > > the canaried KafkaConsumer doesn't get overwhelmed by all of the
> > > > partitions.
> > > >
> > > > On Fri, Nov 18, 2016 at 12:54 PM, Onur Karaman <
> > > > okara...@linkedin.com.invalid> wrote:
> > > >
> > > > > So my earlier stated suboptimal migration plans and Joel's idea all
> > > > suffer
> > > > > from either downtime or dual partition ownership and consumption.
> > > > >
> > > > > But I think there's a bigger problem: they assume users are willing
> > to
> > > do
> > > > > the full migration immediately. I'm not convinced that this is
> > > realistic.
> > > > > Some teams may be okay with this (and the earlier stated
> consequences
> > > of
> > > > > the existing approaches), but others want to "c
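The upgrade sequencing discussed in this thread (keep ZooKeeper-based assignment until every member of the group advertises support for the embedded KafkaConsumer, so a canary can join without owning every partition) can be sketched as follows. All names and types here are hypothetical illustrations, not actual Kafka APIs:

```java
import java.util.List;

// Hypothetical sketch (not actual Kafka API) of the switch-over rule from the
// migration discussion: a group keeps using ZooKeeper-based assignment until
// every member has declared support for the embedded KafkaConsumer protocol.
public class AssignmentSourceDemo {
    enum Source { ZOOKEEPER, EMBEDDED_KAFKA_CONSUMER }

    static class Member {
        final String id;
        final boolean supportsEmbeddedConsumer;
        Member(String id, boolean supportsEmbeddedConsumer) {
            this.id = id;
            this.supportsEmbeddedConsumer = supportsEmbeddedConsumer;
        }
    }

    static Source assignmentSource(List<Member> group) {
        // A single old-style member keeps the whole group on ZooKeeper, which
        // is what lets a canary join without being handed every partition.
        for (Member m : group) {
            if (!m.supportsEmbeddedConsumer) return Source.ZOOKEEPER;
        }
        return Source.EMBEDDED_KAFKA_CONSUMER;
    }

    public static void main(String[] args) {
        List<Member> canaried = List.of(new Member("old-1", false), new Member("canary", true));
        List<Member> upgraded = List.of(new Member("new-1", true), new Member("new-2", true));
        System.out.println(assignmentSource(canaried));  // ZOOKEEPER
        System.out.println(assignmentSource(upgraded));  // EMBEDDED_KAFKA_CONSUMER
    }
}
```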

Re: [DISCUSS] KIP-81: Max in-flight fetches

2016-12-09 Thread Jason Gustafson
Hi Mickael,

I think the approach looks good, just a few minor questions:

1. The KIP doesn't say what the default value of `buffer.memory` will be.
Looks like we use 50MB as the default for `fetch.max.bytes`, so perhaps it
makes sense to set the default based on that. Might also be worth
mentioning somewhere the constraint between the two configs.
2. To clarify, this limit only affects the uncompressed size of the fetched
data, right? The consumer may still exceed it in order to store the
decompressed record data. We delay decompression until the records are
returned to the user, but because of max.poll.records, we may end up
holding onto the decompressed data from a single partition for a few
iterations. I think this is fine, but probably worth noting in the KIP.
3. Is there any risk using the MemoryPool that, after we fill up the memory
with fetch data, we can starve the coordinator's connection? Suppose, for
example, that we send a bunch of pre-fetches right before returning to the
user. These fetches might return before the next call to poll(), in which
case we might not have enough memory to receive heartbeats, which would
block us from sending additional heartbeats until the next call to poll().
Not sure it's a big problem since heartbeats are tiny, but might be worth
thinking about.

Thanks,
Jason
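One way to avoid the coordinator-starvation scenario in point 3 is to keep a small reserved headroom that only small control responses (such as heartbeats) may dip into. The sketch below is purely illustrative — `ReservingPool` and its methods are made-up names, not the MemoryPool API from KIP-72/KIP-81:

```java
// Sketch of a bounded pool that reserves headroom for small control
// responses (e.g. heartbeats), so large fetch data cannot starve the
// coordinator connection. All names are illustrative, not Kafka's API.
public class ReservingPool {
    private final long capacity;
    private final long reservedForControl; // headroom fetches may not use
    private long used = 0;

    public ReservingPool(long capacity, long reservedForControl) {
        this.capacity = capacity;
        this.reservedForControl = reservedForControl;
    }

    /** Allocate for fetch data: must leave the reserved headroom free. */
    public synchronized boolean tryAllocateFetch(long bytes) {
        if (used + bytes > capacity - reservedForControl) return false;
        used += bytes;
        return true;
    }

    /** Control responses may use the full capacity, including the headroom. */
    public synchronized boolean tryAllocateControl(long bytes) {
        if (used + bytes > capacity) return false;
        used += bytes;
        return true;
    }

    public synchronized void release(long bytes) { used = Math.max(0, used - bytes); }

    public static void main(String[] args) {
        ReservingPool pool = new ReservingPool(1024, 64);
        System.out.println(pool.tryAllocateFetch(960));   // true: fills everything up to the headroom
        System.out.println(pool.tryAllocateFetch(1));     // false: would eat into the headroom
        System.out.println(pool.tryAllocateControl(32));  // true: heartbeat may use the headroom
    }
}
```

The design point: fetch responses and control responses contend for the same pool, but only fetch data is bounded away from the last few kilobytes, so a heartbeat can always land even when pre-fetches have filled the buffer.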


On Fri, Dec 2, 2016 at 4:31 AM, Mickael Maison 
wrote:

> It's been a few days since the last comments. KIP-72 vote seems to
> have passed so if I don't get any new comments I'll start the vote on
> Monday.
> Thanks
>
> On Mon, Nov 14, 2016 at 6:25 PM, radai  wrote:
> > +1 - there's a need for an effective way to control kafka memory
> > consumption - both on the broker and on clients.
> > i think we could even reuse the exact same param name -
> *queued.max.bytes *-
> > as it would serve the exact same purpose.
> >
> > also (and again it's the same across the broker and clients) this bound
> > should also cover decompression, at some point.
> > the problem with that is that to the best of my knowledge the current
> wire
> > protocol does not declare the final, uncompressed size of anything up
> front
> > - all we know is the size of the compressed buffer. this may require a
> > format change in the future to properly support?
> >
> > On Mon, Nov 14, 2016 at 10:03 AM, Mickael Maison <
> mickael.mai...@gmail.com>
> > wrote:
> >
> >> Thanks for all the replies.
> >>
> >> I've updated the KIP:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 81%3A+Bound+Fetch+memory+usage+in+the+consumer
> >> The main point is to selectively read from sockets instead of
> >> throttling FetchRequests sends. I also mentioned it will be reusing
> >> the MemoryPool implementation created in KIP-72 instead of adding
> >> another memory tracking method.
> >>
> >> Please have another look. As always, comments are welcome !
> >>
> >> On Thu, Nov 10, 2016 at 2:47 AM, radai 
> wrote:
> >> > selectively reading from sockets achieves memory control (up to and
> not
> >> > including talk of (de)compression)
> >> >
> >> > this is exactly what i (also, even mostly) did for kip-72 - which i
> hope
> >> in
> >> > itself should be a reason to think about both KIPs at the same time
> >> because
> >> > the changes will be similar (at least in intent) and might result in
> >> > duplicated effort.
> >> >
> >> > a pool API is a way to "scale" all the way from just maintaining a
> >> variable
> >> > holding amount of available memory (which is what my current kip-72
> code
> >> > does and what this kip proposes IIUC) all the way up to actually
> re-using
> >> > buffers without any changes to the code using the pool - just drop in
> a
> >> > different pool impl.
> >> >
> >> > for "edge nodes" (producer/consumer) the performance gain in actually
> >> > pooling large buffers may be arguable, but i suspect for brokers
> >> regularly
> >> > operating on 1MB-sized requests (which is the norm at linkedin) the
> >> > resulting memory fragmentation is an actual bottleneck (i have initial
> >> > micro-benchmark results to back this up but have not had the time to
> do a
> >> > full profiling run).
> >> >
> >> > so basically I'm saying we may be doing (very) similar things in
> mostly
> >> the
> >> > same areas of code.
> >> >
> >> > On Wed, Nov 2, 2016 at 11:35 AM, Mickael Maison <
> >> mickael.mai...@gmail.com>
> >> > wrote:
> >> >
> >> >> Selectively reading from the socket should enable us to
> >> >> control the memory usage without impacting performance. I've had a look
> >> >> at that today and I can see how that would work.
> >> >> I'll update the KIP accordingly tomorrow.
> >> >>
> >>
>
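The "drop-in pool impl" point above — the same pool API scaling from a plain available-bytes counter all the way to actual buffer reuse — can be sketched like this. The interface and class names are illustrative, not Kafka's actual MemoryPool from KIP-72:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Illustrative sketch (not Kafka's actual MemoryPool API): the same interface
// can be backed by a plain available-bytes counter, or by an impl that
// actually recycles buffers to avoid fragmentation under steady 1MB-sized
// requests.
public class PoolDemo {
    interface Pool {
        ByteBuffer tryAllocate(int size); // null when the pool is exhausted
        void release(ByteBuffer buffer);
    }

    // Simplest impl: just tracks how many bytes are outstanding.
    static class CountingPool implements Pool {
        private long available;
        CountingPool(long capacity) { this.available = capacity; }
        public synchronized ByteBuffer tryAllocate(int size) {
            if (size > available) return null;
            available -= size;
            return ByteBuffer.allocate(size);
        }
        public synchronized void release(ByteBuffer b) { available += b.capacity(); }
    }

    // Recycling impl: hands out pre-allocated fixed-size buffers and reuses
    // released ones, so steady-state traffic allocates nothing new.
    static class RecyclingPool implements Pool {
        private final int bufferSize;
        private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
        RecyclingPool(int bufferSize, int count) {
            this.bufferSize = bufferSize;
            for (int i = 0; i < count; i++) free.push(ByteBuffer.allocate(bufferSize));
        }
        public synchronized ByteBuffer tryAllocate(int size) {
            if (size > bufferSize || free.isEmpty()) return null;
            ByteBuffer b = free.pop();
            b.clear();
            return b;
        }
        public synchronized void release(ByteBuffer b) { free.push(b); }
    }

    public static void main(String[] args) {
        Pool counting = new CountingPool(2048);
        Pool recycling = new RecyclingPool(1024, 2);
        System.out.println(counting.tryAllocate(1024) != null);  // true
        ByteBuffer b = recycling.tryAllocate(1024);
        recycling.release(b);                                    // buffer goes back for reuse
        System.out.println(recycling.tryAllocate(1024) == b);    // true: same buffer reused
    }
}
```

Code written against `Pool` is unchanged when swapping impls, which is the argument for sharing one abstraction between the broker and client KIPs.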


[GitHub] kafka pull request #2238: MINOR: Sync up 'kafka-run-class.bat' with 'kafka-r...

2016-12-09 Thread vahidhashemian
GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/2238

MINOR: Sync up 'kafka-run-class.bat' with 'kafka-run-class.sh'

Some of the recent changes to `kafka-run-class.sh` have not been applied to 
`kafka-run-class.bat`.
These recent changes include setting proper streams or connect classpaths. 
So any streams or connect use case that leverages `kafka-run-class.bat` would 
fail with an error like
```
Error: Could not find or load main class org.apache.kafka.streams.???
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka 
minor/sync_up_kafka-run-class.bat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2238


commit b6304c52f9b6a569ea91d9017b4fcc55ea6297d0
Author: Vahid Hashemian 
Date:   2016-12-09T21:49:15Z

MINOR: Sync up 'kafka-run-class.bat' with 'kafka-run-class.sh'

Some of the recent changes to `kafka-run-class.sh` have not been applied to 
`kafka-run-class.bat`.
Some of these recent changes include setting proper streams or connect 
classpaths.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-12-09 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan reassigned KAFKA-3209:
--

Assignee: Shikhar Bhushan

> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Neha Narkhede
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
> Fix For: 0.10.2.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.
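A minimal sketch of the kind of light, inline, single-message transform described above (e.g. dropping a PII field before the record reaches Kafka). The interface and names are hypothetical illustrations, not the Connect transform API that a KIP would later define:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative single-message transform: drop a PII field before the record
// reaches Kafka. The interface is hypothetical, not the actual Connect API.
public class TransformDemo {
    interface Transformation {
        Map<String, Object> apply(Map<String, Object> record);
    }

    // Removes one named field, leaving the rest of the record untouched.
    static Transformation maskField(String field) {
        return record -> {
            Map<String, Object> out = new HashMap<>(record);
            out.remove(field);
            return out;
        };
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("user", "alice");
        record.put("ssn", "123-45-6789");
        Map<String, Object> clean = maskField("ssn").apply(record);
        System.out.println(clean.containsKey("ssn"));  // false
        System.out.println(clean.get("user"));         // alice
    }
}
```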



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Deprecating the old consumers in trunk

2016-12-09 Thread Jason Gustafson
Hey Ismael, that sounds fair to me. I'm +1.

-Jason

On Thu, Dec 8, 2016 at 8:01 AM, Ismael Juma  wrote:

> Thanks Onur and Jason. I filed a JIRA to track this:
>
> https://issues.apache.org/jira/browse/KAFKA-4513
>
> My take is that this would be good to have and one could argue that we
> should not remove the old consumers until we have it. However, I think we
> should still go ahead with the deprecation of the old consumers for the
> next release. That will make it clear to existing users that, where
> possible, they should start moving to the new consumer (everything will
> still work fine).
>
> Thoughts?
>
> Ismael
>
> On Mon, Nov 28, 2016 at 3:07 AM, Jason Gustafson 
> wrote:
>
> > Onur's suggestion or something like it sounds like it could work. Suppose
> > we add some metadata in Zookeeper for consumers which support the
> embedded
> > KafkaConsumer. Until all members in the group have declared support, the
> > consumer will continue to use Zk for their partition assignments. But once
> all
> > members support the embedded consumer, then they will switch to receiving
> > their assignments from the embedded KafkaConsumer. So basically upgrading
> > to the new consumer first requires that you upgrade the old consumer to
> use
> > the new consumer's group assignment protocol. Once you've done that, then
> > upgrading to the new consumer becomes straightforward. Does that work?
> Then
> > maybe you don't need to propagate any extra information over the
> rebalance
> > protocol.
> >
> > -Jason
> >
> > On Wed, Nov 23, 2016 at 12:35 AM, Onur Karaman <
> > onurkaraman.apa...@gmail.com
> > > wrote:
> >
> > > Some coworkers may have had issues seeing my earlier post so reposting
> > > under a different email:
> > >
> > > So my earlier stated suboptimal migration plans and Joel's idea all
> > suffer
> > > from either downtime or dual partition ownership and consumption.
> > >
> > > But I think there's a bigger problem: they assume users are willing to
> do
> > > the full migration immediately. I'm not convinced that this is
> realistic.
> > > Some teams may be okay with this (and the earlier stated consequences
> of
> > > the existing approaches), but others want to "canary" new code. That
> is,
> > > they want to deploy a single instance of the new code to test the
> waters
> > > while all the other instances run old code. It's not unreasonable for
> > this
> > > to span days. In this world, earlier alternatives would have the canary
> > > under heavy load since it is the sole new consumer in the group and it
> is
> > > guaranteed to own every partition the group is interested in. So the
> > canary
> > > is likely going to look unhealthy and the consumer can fall behind.
> > >
> > > Here's a not-fully-thought-out idea:
> > > Suppose we roll out a ZookeeperConsumerConnector that uses an embedded
> > > KafkaConsumer to passively participate in kafka-based coordination
> while
> > > still participating in zookeeper-based coordination. For now, the
> > > ZookeeperConsumerConnectors just use the partition assignment as
> decided
> > > in zookeeper. Now suppose an outside KafkaConsumer comes up.
> Kafka-based
> > > coordination allows arbitrary metadata to get broadcasted to the group.
> > > Maybe we can somehow broadcast a flag saying a new consumer is running
> > > during this migration. If the KafkaConsumers embedded in the
> > > ZookeeperConsumerConnector see this flag, then they can notify the
> > > ZookeeperConsumerConnector's fetchers to fetch the partitions defined
> by
> > > the kafka-based coordination rebalance result. The
> > > ZookeeperConsumerConnector's embedded KafkaConsumer's fetchers never
> get
> > > used at any point in time.
> > >
> > > The benefits of this approach would be:
> > > 1. no downtime
> > > 2. minimal window of dual partition ownership
> > > 3. even partition distribution upon canary arrival.
> > > ZookeeperConsumerConnector instances can claim some partition
> ownership,
> > so
> > > the canaried KafkaConsumer doesn't get overwhelmed by all of the
> > > partitions.
> > >
> > > On Fri, Nov 18, 2016 at 12:54 PM, Onur Karaman <
> > > okara...@linkedin.com.invalid> wrote:
> > >
> > > > So my earlier stated suboptimal migration plans and Joel's idea all
> > > suffer
> > > > from either downtime or dual partition ownership and consumption.
> > > >
> > > > But I think there's a bigger problem: they assume users are willing
> to
> > do
> > > > the full migration immediately. I'm not convinced that this is
> > realistic.
> > > > Some teams may be okay with this (and the earlier stated consequences
> > of
> > > > the existing approaches), but others want to "canary" new code. That
> > is,
> > > > they want to deploy a single instance of the new code to test the
> > waters
> > > > while all the other instances run old code. It's not unreasonable for
> > > this
> > > > to span days. In this world, earlier alternatives would have the
> canary
> > > > under heavy load since it is the sole new con

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Onur Karaman
In other words, we can see inconsistency when the transaction log reports
the transaction as COMMITTED while the markers and data corresponding to
the transaction itself on the user partitions may have been partially lost
after-the-fact because of kafka's durability guarantees.
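For context on the knobs that exist today: the broker configs below are real Kafka settings, and forcing a per-message fsync narrows the write-vs-flush loss window described above, at a steep throughput cost. An explicit "written and flushed" acknowledgement mode would still be a protocol-level extension, not just configuration:

```properties
# Real broker-side flush knobs (as of 0.10.x). Kafka's default is to leave
# flushing to the OS page cache; forcing an fsync on every message trades
# throughput for durability of the kind discussed in this thread.
log.flush.interval.messages=1
log.flush.interval.ms=1000
```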

On Fri, Dec 9, 2016 at 12:16 PM, Onur Karaman 
wrote:

> @Guozhang no I actually meant durability concerns over COMMIT/ABORT
> markers (and a subset of the user's data produced in the transaction for
> that matter) getting lost from the delta between the write and flush.
>
> KIP-98 relies on replicas writing to logs, so transaction durability is
> effectively limited by kafka's definition of a "write success" meaning
> written but not flushed to disk.
>
> I mentioned RF=1 not because of availability but actually to highlight a
> corner-case durability scenario where the single replica participating in
> the transaction experiences a hard failure after the write but before the
> flush, causing the transaction to have partial data loss.
>
> Is this level of durability okay or do we want stronger guarantees for the
> transaction? Basically what I'm wondering is if KIP-98 necessitates kafka's
> definition of a "write success" to be extended from "written" to an
> optional "written and flushed to disk".
>
> On Fri, Dec 9, 2016 at 11:54 AM, Michael Pearce 
> wrote:
>
>> Apologies on the spelling.
>>
>> *Hi Jay,
>> 
>> From: Michael Pearce 
>> Sent: Friday, December 9, 2016 7:52:25 PM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
>> Messaging
>>
>> Hi Jey
>>
>> 1) I agree; these should be used to add this in a future KIP if there is
>> ever enough of a case. As stated, for us I think for these systems we will keep
>> our JMS solutions there.  I think maybe in the docs when this feature is
>> written up, one should redirect users to alternative options such as jms
>> brokers, for these use cases.
>>
>> 2) I think this KIP needs to be mindful and take ownership to make sure
>> things are implemented in a way that makes future enhancement easy, or at
>> least extensible. Having to rework things in future and correct historic
>> decisions is expensive, as we are already finding.
>>
>> Sent using OWA for iPhone
>> 
>> From: Jay Kreps 
>> Sent: Friday, December 9, 2016 7:19:59 PM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
>> Messaging
>>
>> Hey Michael,
>>
>> Yeah, I don't think you need to go into the details of whatever you guys
>> have. I think several people in the thread said "let's do XA transactions
>> too!" Obviously in a world where features were free and always worked
>> perfectly we would! I've probably talked to about 100 people about their
>> use of XA transactions in different systems and my observation has been
>> (a)
>> they are a bit of an operational nightmare, (b) the use cases i've
>> understood don't actually require full XA transactions they actually
>> require a much weaker and easier to guarantee property. The result is you
>> pay a big complexity cost for a guarantee much stronger than what you
>> wanted. My sense is that this opinion is broadly shared by the distributed
>> systems community at large and by Kafka folks in particular.
>>
>> I'm a contrarian so I think it is great not to be too swayed by "common
>> wisdom" though. Five years ago there was a consensus that distributed
>> transactions were too hard to implement in an operationally sound way,
>> which i think was not correct, so the bad reputation for cross-system
>> transactions may be equally wrong!
>>
>> To build a compelling case this is wrong I think two things need to be
>> done:
>>
>>1. Build a case that there are a large/important set of use cases that
>>cannot be solved with two independent transactions (as i described),
>> and
>>that these use cases are things Kafka should be able to do.
>>2. Come up with the concrete extensions to the KIP-98 proposal that
>>would enable an operationally sound implementation for pluggable
>>multi-system XA.
>>
>> -Jay
>>
>>
>>
>> On Fri, Dec 9, 2016 at 10:25 AM, Michael Pearce 
>> wrote:
>>
>> > Hi Jay,
>> >
>> > I can't go too deep into the exact implementation due to an NDA. So apologies
>> > here.
>> >
>> > Essentially we have multiple processes, each owning a selection of
>> > accounts, so on general flows an action for an account just needs to be
>> > managed local to the owning node; happy days, every change is handled as
>> > a tick-tock change.
>> >
>> > Unfortunately when a transfer occurs we need the two processes to
>> > co-ordinate their transaction, we also need to ensure both don't
>> continue
>> > other actions/changes; we do this using a data grid technology. This
>> > grid technology supports a transaction manager that we couple into our
>> > current JMS provider, which supports full XA transactions, as such

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Onur Karaman
@Guozhang no I actually meant durability concerns over COMMIT/ABORT markers
(and a subset of the user's data produced in the transaction for that
matter) getting lost from the delta between the write and flush.

KIP-98 relies on replicas writing to logs, so transaction durability is
effectively limited by kafka's definition of a "write success" meaning
written but not flushed to disk.

I mentioned RF=1 not because of availability but actually to highlight a
corner-case durability scenario where the single replica participating in
the transaction experiences a hard failure after the write but before the
flush, causing the transaction to have partial data loss.

Is this level of durability okay or do we want stronger guarantees for the
transaction? Basically what I'm wondering is if KIP-98 necessitates kafka's
definition of a "write success" to be extended from "written" to an
optional "written and flushed to disk".

On Fri, Dec 9, 2016 at 11:54 AM, Michael Pearce 
wrote:

> Apologies on the spelling.
>
> *Hi Jay,
> 
> From: Michael Pearce 
> Sent: Friday, December 9, 2016 7:52:25 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> Messaging
>
> Hi Jey
>
> 1) I agree; these should be used to add this in a future KIP if there is
> ever enough of a case. As stated, for us I think for these systems we will keep
> our JMS solutions there.  I think maybe in the docs when this feature is
> written up, one should redirect users to alternative options such as jms
> brokers, for these use cases.
>
> 2) I think this KIP needs to be mindful and take ownership to make sure
> things are implemented in a way that makes future enhancement easy, or at
> least extensible. Having to rework things in future and correct historic
> decisions is expensive, as we are already finding.
>
> Sent using OWA for iPhone
> 
> From: Jay Kreps 
> Sent: Friday, December 9, 2016 7:19:59 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> Messaging
>
> Hey Michael,
>
> Yeah, I don't think you need to go into the details of whatever you guys
> have. I think several people in the thread said "let's do XA transactions
> too!" Obviously in a world where features were free and always worked
> perfectly we would! I've probably talked to about 100 people about their
> use of XA transactions in different systems and my observation has been (a)
> they are a bit of an operational nightmare, (b) the use cases i've
> understood don't actually require full XA transactions they actually
> require a much weaker and easier to guarantee property. The result is you
> pay a big complexity cost for a guarantee much stronger than what you
> wanted. My sense is that this opinion is broadly shared by the distributed
> systems community at large and by Kafka folks in particular.
>
> I'm a contrarian so I think it is great not to be too swayed by "common
> wisdom" though. Five years ago there was a consensus that distributed
> transactions were too hard to implement in an operationally sound way,
> which i think was not correct, so the bad reputation for cross-system
> transactions may be equally wrong!
>
> To build a compelling case this is wrong I think two things need to be
> done:
>
>1. Build a case that there are a large/important set of use cases that
>cannot be solved with two independent transactions (as i described), and
>that these use cases are things Kafka should be able to do.
>2. Come up with the concrete extensions to the KIP-98 proposal that
>would enable an operationally sound implementation for pluggable
>multi-system XA.
>
> -Jay
>
>
>
> On Fri, Dec 9, 2016 at 10:25 AM, Michael Pearce 
> wrote:
>
> > Hi Jay,
> >
> > I can't go too deep into the exact implementation due to an NDA. So apologies
> > here.
> >
> > Essentially we have multiple processes, each owning a selection of
> > accounts, so on general flows an action for an account just needs to be
> > managed local to the owning node; happy days, every change is handled as
> > a tick-tock change.
> >
> > Unfortunately when a transfer occurs we need the two processes to
> > co-ordinate their transaction, we also need to ensure both don't continue
> > other actions/changes; we do this using a data grid technology. This
> > grid technology supports a transaction manager that we couple into our
> > current JMS provider, which supports full XA transactions. As such we can
> > manage the production of the change messages out of the system
> > transactionally, as well as the in-grid state.
> >
> > The obvious argument here is should we even look to move this flow off
> > JMS then. We prob shouldn't nor will do this.
> >
> > The point is that I think saying Kafka supports transactions but then not
> > supporting it as per the traditional sense leads to developers expecting
> > similar behaviour and will cause issues in prod wh

[GitHub] kafka pull request #2237: support scala 2.12 build

2016-12-09 Thread pjfanning
GitHub user pjfanning opened a pull request:

https://github.com/apache/kafka/pull/2237

support scala 2.12 build

I have a scenario where we have a 0.8 Kafka Broker managed by a different 
org but I would like to upgrade my application to Scala 2.12.
I know the 0.8 branch is no longer maintained and all I am looking for is a 
one-off publish of a scala 2.12 compatible jar from the head of the 0.8 branch.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pjfanning/kafka 0.8.2-scala-2.12

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2237


commit ab442976ea775e1aa32e0a052f283de69922260e
Author: pj.fanning 
Date:   2016-12-09T20:12:21Z

support scala 2.12 build






Jenkins build is back to normal : kafka-trunk-jdk7 #1742

2016-12-09 Thread Apache Jenkins Server
See 



Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Michael Pearce
Apologies on the spelling.

*Hi Jay,

From: Michael Pearce 
Sent: Friday, December 9, 2016 7:52:25 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

Hi Jey

1) I agree; these should be used to add this in a future KIP if there is ever 
enough of a case. As stated, for us I think for these systems we will keep our 
JMS solutions in place. I think maybe in the docs, when this feature is written 
up, one should redirect users to alternative options such as JMS brokers for 
these use cases.

2) I think this KIP needs to be mindful and take ownership to make sure things 
are implemented in a way that makes future enhancement easy, or at least 
extensible. Having to rework things in future and correct historic decisions is 
expensive, as we are already finding.

Sent using OWA for iPhone

From: Jay Kreps 
Sent: Friday, December 9, 2016 7:19:59 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

Hey Michael,

Yeah, I don't think you need to go into the details of whatever you guys
have. I think several people in the thread said "let's do XA transactions
too!" Obviously in a world where features were free and always worked
perfectly we would! I've probably talked to about 100 people about their
use of XA transactions in different systems, and my observation has been (a)
they are a bit of an operational nightmare, and (b) the use cases I've
understood don't actually require full XA transactions; they actually
require a much weaker and easier-to-guarantee property. The result is that you
pay a big complexity cost for a guarantee much stronger than what you
wanted. My sense is that this opinion is broadly shared by the distributed
systems community at large, and by Kafka folks in particular.

I'm a contrarian, so I think it is great not to be too swayed by "common
wisdom" though. Five years ago there was a consensus that distributed
transactions were too hard to implement in an operationally sound way,
which I think was not correct, so the bad reputation for cross-system
transactions may be equally wrong!

To build a compelling case that this is wrong, I think two things need to be done:

   1. Build a case that there is a large/important set of use cases that
   cannot be solved with two independent transactions (as I described), and
   that these use cases are things Kafka should be able to do.
   2. Come up with the concrete extensions to the KIP-98 proposal that
   would enable an operationally sound implementation for pluggable
   multi-system XA.

-Jay



On Fri, Dec 9, 2016 at 10:25 AM, Michael Pearce 
wrote:

> Hi Jay,
>
> I can't go too deep into the exact implementation due to an NDA. So apologies
> here.
>
> Essentially we have multiple processes, each owning a selection of accounts,
> so in general flows an action for an account just needs to be managed locally
> by the owning node; every change is handled as a tick-tock change.
>
> Unfortunately, when a transfer occurs we need the two processes to
> co-ordinate their transaction, and we also need to ensure neither continues
> with other actions/changes; we do this using a data grid technology. This
> grid technology supports a transaction manager that we currently couple to
> our JMS provider, which supports full XA transactions, so we can manage the
> production of the change messages out of the system transactionally, as well
> as the in-grid state.
>
> The obvious argument here is whether we should even look to move this flow
> off JMS. We probably shouldn't, and won't.
>
> The point is that saying Kafka supports transactions, but not in the
> traditional sense, leads developers to expect similar behaviour, and will
> cause issues in prod when they find it doesn't work the way they're used to.
>
> As in my other response earlier: is there a better name to describe this
> feature, to avoid this confusion, if we're not implementing transactions in
> the traditionally expected sense?
>
>
> Sent using OWA for iPhone
> 
> From: Jay Kreps 
> Sent: Friday, December 9, 2016 6:08:07 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> Messaging
>
> Hey Michael,
>
> Doesn't that example have more to do with applying the update against two
> rows in a single transaction? That is, clearly the write to Kafka needs to
> be "transactional" and the write to the destination needs to be
> transactional, but it's not clear to me that you need isolation that spans
> both operations. Can you dive into the system architecture a bit more and
> explain why Kafka needs to participate in the same transaction as the
> destination system?
>
> -Jay
>
> On Thu, Dec 8, 2016 at 10:19 PM, Michael Pearce 
> wrote:
>
> > Usecase in IG:
> >
> > Fund transfer between accounts. When we debit one account and fund
> another
>

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Michael Pearce
Hi Jey

1) I agree — these could be added in a future KIP if there were ever a strong 
enough case. As stated, for us I think for these systems we will keep our JMS 
solutions there. I think maybe the docs, when this feature is written up, 
should redirect users with these use cases to alternative options such as JMS 
brokers.

2) I think this KIP needs to be mindful, and take ownership of making sure 
things are implemented in a way that makes future enhancement easy, or at 
least keeps things extensible. Having to rework things later and correct 
historic decisions is expensive, as we are already finding.

Sent using OWA for iPhone

From: Jay Kreps 
Sent: Friday, December 9, 2016 7:19:59 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

Hey Michael,

Yeah, I don't think you need to go into the details of whatever you guys
have. I think several people in the thread said "let's do XA transactions
too!" Obviously in a world where features were free and always worked
perfectly we would! I've probably talked to about 100 people about their
use of XA transactions in different systems, and my observation has been (a)
they are a bit of an operational nightmare, and (b) the use cases I've
understood don't actually require full XA transactions; they actually
require a much weaker and easier-to-guarantee property. The result is that you
pay a big complexity cost for a guarantee much stronger than what you
wanted. My sense is that this opinion is broadly shared by the distributed
systems community at large, and by Kafka folks in particular.

I'm a contrarian, so I think it is great not to be too swayed by "common
wisdom" though. Five years ago there was a consensus that distributed
transactions were too hard to implement in an operationally sound way,
which I think was not correct, so the bad reputation for cross-system
transactions may be equally wrong!

To build a compelling case that this is wrong, I think two things need to be done:

   1. Build a case that there is a large/important set of use cases that
   cannot be solved with two independent transactions (as I described), and
   that these use cases are things Kafka should be able to do.
   2. Come up with the concrete extensions to the KIP-98 proposal that
   would enable an operationally sound implementation for pluggable
   multi-system XA.

-Jay



On Fri, Dec 9, 2016 at 10:25 AM, Michael Pearce 
wrote:

> Hi Jay,
>
> I can't go too deep into the exact implementation due to an NDA. So apologies
> here.
>
> Essentially we have multiple processes, each owning a selection of accounts,
> so in general flows an action for an account just needs to be managed locally
> by the owning node; every change is handled as a tick-tock change.
>
> Unfortunately, when a transfer occurs we need the two processes to
> co-ordinate their transaction, and we also need to ensure neither continues
> with other actions/changes; we do this using a data grid technology. This
> grid technology supports a transaction manager that we currently couple to
> our JMS provider, which supports full XA transactions, so we can manage the
> production of the change messages out of the system transactionally, as well
> as the in-grid state.
>
> The obvious argument here is whether we should even look to move this flow
> off JMS. We probably shouldn't, and won't.
>
> The point is that saying Kafka supports transactions, but not in the
> traditional sense, leads developers to expect similar behaviour, and will
> cause issues in prod when they find it doesn't work the way they're used to.
>
> As in my other response earlier: is there a better name to describe this
> feature, to avoid this confusion, if we're not implementing transactions in
> the traditionally expected sense?
>
>
> Sent using OWA for iPhone
> 
> From: Jay Kreps 
> Sent: Friday, December 9, 2016 6:08:07 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> Messaging
>
> Hey Michael,
>
> Doesn't that example have more to do with applying the update against two
> rows in a single transaction? That is, clearly the write to Kafka needs to
> be "transactional" and the write to the destination needs to be
> transactional, but it's not clear to me that you need isolation that spans
> both operations. Can you dive into the system architecture a bit more and
> explain why Kafka needs to participate in the same transaction as the
> destination system?
>
> -Jay
>
> On Thu, Dec 8, 2016 at 10:19 PM, Michael Pearce 
> wrote:
>
> > Usecase in IG:
> >
> > Fund transfer between accounts. When we debit one account and fund
> another
> > we must ensure the records to both occur as an acid action, and as a
> single
> > transaction.
> >
> > Today we achieve this because we have jms, as such we can do the actions
> > needed in an xa transaction across both the accounts. To move this

[jira] [Commented] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since already deprecated in Kafka 9

2016-12-09 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736139#comment-15736139
 ] 

Ismael Juma commented on KAFKA-4517:


Since we didn't remove it in 0.10.0.0, we now have to wait until 0.11.0.0.

> Remove kafka-consumer-offset-checker.sh script since already deprecated in 
> Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>Reporter: Jeff Widman
>Priority: Minor
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4519) Delete old unused branches in git repo

2016-12-09 Thread Jeff Widman (JIRA)
Jeff Widman created KAFKA-4519:
--

 Summary: Delete old unused branches in git repo
 Key: KAFKA-4519
 URL: https://issues.apache.org/jira/browse/KAFKA-4519
 Project: Kafka
  Issue Type: Task
Reporter: Jeff Widman
Priority: Trivial


Delete these old git branches, as they're quite outdated and not relevant for 
various version branches:
* consumer_redesign
* transactional_messaging
* 0.8.0-beta1-candidate1





[jira] [Commented] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since already deprecated in Kafka 9

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736132#comment-15736132
 ] 

ASF GitHub Bot commented on KAFKA-4517:
---

GitHub user jeffwidman opened a pull request:

https://github.com/apache/kafka/pull/2236

KAFKA-4517: Remove deprecated shell script

Remove the kafka-consumer-offset-checker.sh script completely since it
was already deprecated in Kafka 9

Currently it's quite confusing to new Kafka operators that this script
exists because it seems to do exactly what they want for checking
offsets, only to later realize they should instead use
kafka-consumer-groups.sh script

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeffwidman/kafka KAFKA-4517

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2236


commit c54fa291a4efcdfedf41bfbb2ba03a0ab7338c9b
Author: Jeff Widman 
Date:   2016-12-09T19:38:33Z

KAFKA-4517: Remove deprecated shell script

Remove the kafka-consumer-offset-checker.sh script completely since it
was already deprecated in Kafka 9

Currently it's quite confusing to new Kafka operators that this script
exists because it seems to do exactly what they want for checking
offsets, only to later realize they should instead use
kafka-consumer-groups.sh script




> Remove kafka-consumer-offset-checker.sh script since already deprecated in 
> Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>Reporter: Jeff Widman
>Priority: Minor
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 





[GitHub] kafka pull request #2236: KAFKA-4517: Remove deprecated shell script

2016-12-09 Thread jeffwidman
GitHub user jeffwidman opened a pull request:

https://github.com/apache/kafka/pull/2236

KAFKA-4517: Remove deprecated shell script

Remove the kafka-consumer-offset-checker.sh script completely since it
was already deprecated in Kafka 9

Currently it's quite confusing to new Kafka operators that this script
exists because it seems to do exactly what they want for checking
offsets, only to later realize they should instead use
kafka-consumer-groups.sh script

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeffwidman/kafka KAFKA-4517

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2236.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2236


commit c54fa291a4efcdfedf41bfbb2ba03a0ab7338c9b
Author: Jeff Widman 
Date:   2016-12-09T19:38:33Z

KAFKA-4517: Remove deprecated shell script

Remove the kafka-consumer-offset-checker.sh script completely since it
was already deprecated in Kafka 9

Currently it's quite confusing to new Kafka operators that this script
exists because it seems to do exactly what they want for checking
offsets, only to later realize they should instead use
kafka-consumer-groups.sh script






[jira] [Commented] (KAFKA-4272) Kafka Connect batch scripts are missing under `bin/windows/`

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736123#comment-15736123
 ] 

ASF GitHub Bot commented on KAFKA-4272:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2146


> Kafka Connect batch scripts are missing under `bin/windows/`
> 
>
> Key: KAFKA-4272
> URL: https://issues.apache.org/jira/browse/KAFKA-4272
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> There are no {{connect-distributed.bat}} and {{connect-standalone.bat}} 
> scripts under {{bin/windows/}} folder.





[GitHub] kafka pull request #2146: KAFKA-4272: Add missing 'connect' Windows batch sc...

2016-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2146




[jira] [Updated] (KAFKA-4272) Kafka Connect batch scripts are missing under `bin/windows/`

2016-12-09 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-4272:
---
   Resolution: Fixed
Fix Version/s: 0.10.2.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2146
[https://github.com/apache/kafka/pull/2146]

> Kafka Connect batch scripts are missing under `bin/windows/`
> 
>
> Key: KAFKA-4272
> URL: https://issues.apache.org/jira/browse/KAFKA-4272
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
> Fix For: 0.10.2.0
>
>
> There are no {{connect-distributed.bat}} and {{connect-standalone.bat}} 
> scripts under {{bin/windows/}} folder.





[jira] [Created] (KAFKA-4518) Kafka-connect Not starting after DB restarts

2016-12-09 Thread Akshath Patkar (JIRA)
Akshath Patkar created KAFKA-4518:
-

 Summary: Kafka-connect Not starting after DB restarts
 Key: KAFKA-4518
 URL: https://issues.apache.org/jira/browse/KAFKA-4518
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Akshath Patkar
Assignee: Ewen Cheslack-Postava


We use the Confluent 3.0 package + Debezium for streaming MySQL data into Kafka.

However, we had some minor issues with MySQL and had to restart the MySQL 
servers. After the restart of the MySQL server, KC stops (this is expected; 
it's in the DBZ docs).

We restarted KC; kafka-connect then starts without any error, but it will not 
stream the data into Kafka.

We can see the following line in the logs, and Connect just sits there, without 
committing anything to Kafka:

INFO Source task WorkerSourceTask{id=XX} finished initialization and start 
(org.apache.kafka.connect.runtime.WorkerSourceTask:138)





[jira] [Updated] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since already deprecated in Kafka 9

2016-12-09 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-4517:
---
Summary: Remove kafka-consumer-offset-checker.sh script since already 
deprecated in Kafka 9  (was: Remove kafka-consumer-offset-checker.sh script 
since deprecated in Kafka 9)

> Remove kafka-consumer-offset-checker.sh script since already deprecated in 
> Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>Reporter: Jeff Widman
>Priority: Minor
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 





[jira] [Updated] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since deprecated in Kafka 9

2016-12-09 Thread Jeff Widman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Widman updated KAFKA-4517:
---
Summary: Remove kafka-consumer-offset-checker.sh script since deprecated in 
Kafka 9  (was: Remove shell scripts deprecated in Kafka 9)

> Remove kafka-consumer-offset-checker.sh script since deprecated in Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>Reporter: Jeff Widman
>Priority: Minor
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 





[jira] [Commented] (KAFKA-4375) Kafka consumer may swallow some interrupts meant for the calling thread

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736108#comment-15736108
 ] 

ASF GitHub Bot commented on KAFKA-4375:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2100


> Kafka consumer may swallow some interrupts meant for the calling thread
> ---
>
> Key: KAFKA-4375
> URL: https://issues.apache.org/jira/browse/KAFKA-4375
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
> Fix For: 0.10.2.0
>
>
> Apache Storm has added a new data source ("spout") based on the Kafka 0.9 
> consumer. Storm interacts with the consumer by having one thread per spout 
> instance loop calls to poll/commitSync etc. When Storm shuts down, another 
> thread indicates that the looping threads should shut down by interrupting 
> them, and joining them.
> If one of the looping threads happens to be interrupted while executing 
> certain sleeps in some consumer methods (commitSync and committed at least), 
> the interrupt can be lost because they contain a call to SystemTime.sleep, 
> which swallows the interrupt.
> Is this behavior by design, or can SystemTime be changed to reset the thread 
> interrupt flag when catching an InterruptedException? 
> I haven't checked the rest of the client code, so it's possible that this is 
> an issue in other parts of the code too.





[GitHub] kafka pull request #2100: KAFKA-4375: Reset thread interrupted state in a fe...

2016-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2100




[jira] [Updated] (KAFKA-4375) Kafka consumer may swallow some interrupts meant for the calling thread

2016-12-09 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-4375:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 2100
[https://github.com/apache/kafka/pull/2100]

> Kafka consumer may swallow some interrupts meant for the calling thread
> ---
>
> Key: KAFKA-4375
> URL: https://issues.apache.org/jira/browse/KAFKA-4375
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: Stig Rohde Døssing
>Assignee: Stig Rohde Døssing
> Fix For: 0.10.2.0
>
>
> Apache Storm has added a new data source ("spout") based on the Kafka 0.9 
> consumer. Storm interacts with the consumer by having one thread per spout 
> instance loop calls to poll/commitSync etc. When Storm shuts down, another 
> thread indicates that the looping threads should shut down by interrupting 
> them, and joining them.
> If one of the looping threads happens to be interrupted while executing 
> certain sleeps in some consumer methods (commitSync and committed at least), 
> the interrupt can be lost because they contain a call to SystemTime.sleep, 
> which swallows the interrupt.
> Is this behavior by design, or can SystemTime be changed to reset the thread 
> interrupt flag when catching an InterruptedException? 
> I haven't checked the rest of the client code, so it's possible that this is 
> an issue in other parts of the code too.





[jira] [Created] (KAFKA-4517) Remove shell scripts deprecated in Kafka 9

2016-12-09 Thread Jeff Widman (JIRA)
Jeff Widman created KAFKA-4517:
--

 Summary: Remove shell scripts deprecated in Kafka 9
 Key: KAFKA-4517
 URL: https://issues.apache.org/jira/browse/KAFKA-4517
 Project: Kafka
  Issue Type: Task
Affects Versions: 0.10.0.1, 0.10.0.0, 0.10.1.0
Reporter: Jeff Widman
Priority: Minor


Kafka 9 deprecated kafka-consumer-offset-checker.sh 
(kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
(kafka.admin.ConsumerGroupCommand). 

Since this was deprecated in 9, and the full functionality of the old script 
appears to be available in the new script, can we remove the old shell script 
in 10? 

From an Ops perspective, it's confusing when I'm trying to check consumer 
offsets that I open the bin directory, and see a script that seems to do 
exactly what I want, only to later discover that I'm not supposed to use it. 





Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Jay Kreps
Hey Michael,

Yeah, I don't think you need to go into the details of whatever you guys
have. I think several people in the thread said "let's do XA transactions
too!" Obviously in a world where features were free and always worked
perfectly we would! I've probably talked to about 100 people about their
use of XA transactions in different systems, and my observation has been (a)
they are a bit of an operational nightmare, and (b) the use cases I've
understood don't actually require full XA transactions; they actually
require a much weaker and easier-to-guarantee property. The result is that you
pay a big complexity cost for a guarantee much stronger than what you
wanted. My sense is that this opinion is broadly shared by the distributed
systems community at large, and by Kafka folks in particular.

I'm a contrarian, so I think it is great not to be too swayed by "common
wisdom" though. Five years ago there was a consensus that distributed
transactions were too hard to implement in an operationally sound way,
which I think was not correct, so the bad reputation for cross-system
transactions may be equally wrong!

To build a compelling case that this is wrong, I think two things need to be done:

   1. Build a case that there is a large/important set of use cases that
   cannot be solved with two independent transactions (as I described), and
   that these use cases are things Kafka should be able to do.
   2. Come up with the concrete extensions to the KIP-98 proposal that
   would enable an operationally sound implementation for pluggable
   multi-system XA.

-Jay



On Fri, Dec 9, 2016 at 10:25 AM, Michael Pearce 
wrote:

> Hi Jay,
>
> I can't go too deep into the exact implementation due to an NDA. So apologies
> here.
>
> Essentially we have multiple processes, each owning a selection of accounts,
> so in general flows an action for an account just needs to be managed locally
> by the owning node; every change is handled as a tick-tock change.
>
> Unfortunately, when a transfer occurs we need the two processes to
> co-ordinate their transaction, and we also need to ensure neither continues
> with other actions/changes; we do this using a data grid technology. This
> grid technology supports a transaction manager that we currently couple to
> our JMS provider, which supports full XA transactions, so we can manage the
> production of the change messages out of the system transactionally, as well
> as the in-grid state.
>
> The obvious argument here is whether we should even look to move this flow
> off JMS. We probably shouldn't, and won't.
>
> The point is that saying Kafka supports transactions, but not in the
> traditional sense, leads developers to expect similar behaviour, and will
> cause issues in prod when they find it doesn't work the way they're used to.
>
> As in my other response earlier: is there a better name to describe this
> feature, to avoid this confusion, if we're not implementing transactions in
> the traditionally expected sense?
>
>
> Sent using OWA for iPhone
> 
> From: Jay Kreps 
> Sent: Friday, December 9, 2016 6:08:07 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> Messaging
>
> Hey Michael,
>
> Doesn't that example have more to do with applying the update against two
> rows in a single transaction? That is, clearly the write to Kafka needs to
> be "transactional" and the write to the destination needs to be
> transactional, but it's not clear to me that you need isolation that spans
> both operations. Can you dive into the system architecture a bit more and
> explain why Kafka needs to participate in the same transaction as the
> destination system?
>
> -Jay
>
> On Thu, Dec 8, 2016 at 10:19 PM, Michael Pearce 
> wrote:
>
> > Usecase in IG:
> >
> > Fund transfer between accounts. When we debit one account and fund
> another
> > we must ensure the records to both occur as an acid action, and as a
> single
> > transaction.
> >
> > Today we achieve this because we have jms, as such we can do the actions
> > needed in an xa transaction across both the accounts. To move this flow
> to
> > Kafka we would need support of XA transaction.
> >
> >
> >
> > Sent using OWA for iPhone
> > 
> > From: Michael Pearce 
> > Sent: Friday, December 9, 2016 6:09:06 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> > Messaging
> >
> > Hi Jay,
> >
> > For me having an XA transaction allows for ensuring ACID across my
> > application.
> >
> > I believe it is part of the JMS api, and obviously JMS still is in
> > enterprise very widely adopted for Messaging transport , so obviously to
> > say it isn't widely used i think is ignoring a whole range of users. Like
> > wise I believe frameworks like spring etc fully support it more evidence
> of
> > its wide adoption.
> >
> > On this note personally we try to avoid 

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Guozhang Wang
@Onur

I think you are asking about this bullet point right?

https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#bookmark=id.ia8lrje8xifh

To me, "durability" means that once a transaction has been acknowledged as
committed, it will never be lost. In this sense the current design does
guarantee this property, since the coordinator will not return from the
EndTxnRequest until it has completed writing the marker to each of the
partitions. The scenario you are concerned about is more about "availability",
which KIP-98 does not try to improve on: it is purely dependent on the
replication mechanism that Kafka already has.


@Andrew

The exactly-once delivery semantics in Kafka are not designed to replace
existing transactional systems, like a DBMS, and I agree with you that they
are "different" from what ACID properties provide. I have myself worked on
some in-memory DBMS systems that try to provide "queuing operations" inside
transactions, such as:

beginTxn;

updateRow1();
updateRow2();
sendtoQueue();

endTxn;


And from my experience, I feel that if users want to leverage Kafka as the
queuing service in the above example, they need to make those Kafka topics
"purely owned" by the applications integrating with the transactional
systems instead of making them shared among different producers or
different applications.

And about the various error cases you mentioned above:

* A message is published to a topic which crashes the leader Kafka node, as
  it's replicated across the cluster, it crashes all of the other Kafka
nodes
  (we've really had this - SEGV, our fault and we've fixed it, but it
happened)
  so this is a kind of rolling node crash in a cluster
* Out of memory error in one or more Kafka nodes
* Disk fills in one or more Kafka nodes
* Uncontrolled power-off to all nodes in the cluster

I think they will all surface as aborted transactions, so atomicity is
still guaranteed.


@Michael

About the naming. As I mentioned before, it is indeed different from what a
DBMS offers for transactions (for example, we do not provide any
"serializability" isolation level, or any application-specific consistency
guarantees), but I feel it still makes sense to use the term "transaction" to
emphasize its properties in terms of atomicity and durability, and it
actually makes it easier for audiences familiar with WAL etc. to understand
this proposal.



Guozhang




Re: Reg: ACLS

2016-12-09 Thread Ashish Singh
Ismael, thanks for the correction. I assumed the question was about running
without any security enabled, but yes, even then IP-based auth is possible.




-- 

Regards,
Ashish


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Jay Kreps
With respect to the naming, I think it makes sense for two reasons.

   1. The begin/commit/rollback semantics are what people expect from
   transactions, i.e. it passes the intuitive "gut check" meaning people have.
   If you make up a new name it will likely not convey the expectation.
   2. There is a strong analogy between logs and tables and this feature
   enables ACID-like usage of the log in modifying a table.

I'll dive into the latter. What does ACID mean for a log? Well it is
semi-well-defined for a table, what would the implications be for the
equivalent log representation?

Here is the analogy:

   - *Atomic* - This is straight-forward, a set of updates are either all
   published or all not published.
   - *Consistent* - This one is not very well defined in ACID, it isn't
   always as simple as linearizability. It alternately means either (a) a
   transaction started now sees all past committed transactions, (b) the
   database checks various DB-specific things like foreign-key constraints, or
   (c) some undefined notion of application correctness without any particular
   invariant. I think the most sane interpretation is based on (a) and means
   that the consumer sees transactions in commit order. We don't try to
   guarantee this, but for a log the reader controls the order of processing
   so this is possible. We could add future features to do this reordering for
   people as a convenience.
   - *Isolated* - In ACID this means a reader doesn't see the results of
   uncommitted transactions. In a log this means you get complete transactions
   all at once rather than getting half a transaction. This is primarily up to
   you in how you use the data you consume.
   - *Durable* - This falls out of Kafka's replication.
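
(Editorial illustration: the *Isolated* bullet above can be sketched as a toy
log reader that withholds a transaction's records until it sees that
transaction's control marker, so consumers observe complete transactions or
nothing. This is a simulation of the proposed read-committed behaviour, with a
simplified, illustrative log format — not Kafka's actual implementation.)

```python
# A log interleaving two transactions; "marker" entries are control records.
log = [
    ("data", "txn-1", "debit A"),
    ("data", "txn-2", "order X"),
    ("data", "txn-1", "credit B"),
    ("marker", "txn-1", "COMMIT"),
    ("marker", "txn-2", "ABORT"),
]

def read_committed(log):
    pending, out = {}, []
    for kind, txn, payload in log:
        if kind == "data":
            # buffer records of still-open transactions
            pending.setdefault(txn, []).append(payload)
        elif payload == "COMMIT":
            # release the whole transaction at once
            out.extend(pending.pop(txn, []))
        else:  # ABORT: drop the buffered records entirely
            pending.pop(txn, None)
    return out

# txn-1 is delivered whole; aborted txn-2 is never seen.
assert read_committed(log) == ["debit A", "credit B"]
```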

I'm less worried about confusion with other messaging systems. Kafka is
genuinely different in a number of areas and it is worth people
understanding that difference.

-Jay


Re: Reg: ACLS

2016-12-09 Thread Ismael Juma
It is possible to use ACLs with IPs or other SASL mechanisms (PLAIN for
example). So Kerberos and SSL are not required (although commonly used).

Ismael
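
(Editorial illustration of the point above: with no SASL/SSL every client maps
to an anonymous principal, but the authorizer still sees the client's host, so
ACLs can be keyed on IP. The rule tuples and names below are a toy sketch, not
Kafka's Authorizer API.)

```python
# Each ACL: (principal, host, operation, resource, allow?)
ACLS = [
    ("User:ANONYMOUS", "10.0.0.5", "Read", "topic:payments", True),
]

def authorize(principal, host, operation, resource):
    # A request is allowed if any ACL matches principal, host (or
    # wildcard), operation, and resource.
    return any(
        p == principal and h in (host, "*") and op == operation
        and r == resource and allow
        for p, h, op, r, allow in ACLS
    )

# Same anonymous principal, but only the allow-listed host passes.
assert authorize("User:ANONYMOUS", "10.0.0.5", "Read", "topic:payments")
assert not authorize("User:ANONYMOUS", "10.9.9.9", "Read", "topic:payments")
```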



Re: Reg: ACLS

2016-12-09 Thread Ashish Singh
Hey,

No, it does not. Without Kerberos or SSL, all requests will appear to come
from an anonymous user, and as long as a user is not identified it is not
possible to authorize them.




-- 

Regards,
Ashish


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Onur Karaman
I had a similar comment to Andrew's in terms of the system's safety. He may
have stated this scenario in a slightly different way in his first failure
scenario. Specifically I'm wondering if the transaction can appear complete
and then later go into a bad state. A transaction can involve messages
spanning multiple partitions and therefore topics, which can have different
configurations. As part of EndTxnRequest, we send a COMMIT or ABORT marker
to each partition involved.

Does this mean that the transaction is only as durable as the weakest
partition?

For instance, a transaction may involve a partition whose RF=3 and
min.insync.replicas=2 and another partition whose RF=1 with no
min.insync.replicas specified.
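
(Editorial illustration of Onur's question: if a transaction spans partitions
with different replication settings, the number of broker failures the
committed data is guaranteed to survive is bounded by the weakest partition.
Partition names and numbers below are illustrative.)

```python
partitions = {
    "accounts-0": {"replication_factor": 3, "min_insync_replicas": 2},
    "audit-0":    {"replication_factor": 1, "min_insync_replicas": 1},
}

def tolerated_failures(cfg):
    # With acks=all, a write is on at least min.insync.replicas brokers
    # when acknowledged, so it survives min.insync.replicas - 1 losses.
    return cfg["min_insync_replicas"] - 1

# The transaction as a whole is only as durable as its weakest partition.
weakest = min(tolerated_failures(c) for c in partitions.values())
assert weakest == 0  # the RF=1 partition can lose data on one broker loss
```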


Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

2016-12-09 Thread Guozhang Wang
I will read through the KIP doc once again to provide more detailed
feedback, but let me throw in my two cents just on the above email.

There are a few motivations to have a "consistent" stop-point across tasks
belonging to different sub-topologies. One of them is for interactive
queries: say you have two state stores belonging to two sub-topologies, if
they stopped at different points, then when users query them they will
see inconsistent answers (think about the example people always use in
databases: the stores represent A and B's bank account and a record is
processed to move X dollar from A to B).

As for the implementation to support such consistent stop-points though, I
think the metadata field in the offset topic is worth exploring, because
Streams may very likely use the transactional APIs proposed in KIP-98 to
let producers send offsets in a transactional manner, not the consumers
themselves (details can be found here
https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8
).
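
(Editorial illustration: the offsets topic allows an opaque metadata string to
be committed alongside each offset, which could carry a stop-point for batch
processing as discussed above. This is a toy simulation of the data shape
only — the field names are illustrative, not the real __consumer_offsets
format.)

```python
import json

committed = {}

def commit(partition, offset, stop_offset):
    # The metadata string is opaque to the broker; here it encodes a
    # hypothetical stop-point for the current batch run.
    metadata = json.dumps({"stop_offset": stop_offset})
    committed[partition] = (offset, metadata)

commit("input-0", 42, 100)

offset, metadata = committed["input-0"]
assert offset == 42
assert json.loads(metadata)["stop_offset"] == 100
```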



Guozhang


On Tue, Dec 6, 2016 at 2:45 PM, Matthias J. Sax 
wrote:

> Thanks for the input Jay.
>
> From my understanding, your question boils down to how fuzzy the stop
> point can/should be, and what guarantees we want to provide to the user.
> This has multiple dimension:
>
>
> 1. The main issue with using a timestamp is that if an input topic has no
> data with this timestamp, the application never finishes (i.e., the stop
> timestamp is in "the future").
>
> Furthermore, it would require the user to specify the timestamp because
> if we use current (ie, on startup) system time as stop-timestamp, the
> case of a "future stop-timestamp" might be very common (think no active
> producers for a topic -- it's batch processing). Thus, the user needs to
> know the stop-timestamp, which might be hard for her to figure out -- in
> the current design it's way simpler for the user to activate "auto stop".
>
> Last but not least, assume an application with two subtopologies that
> are connected via an intermediate topic and both subtopologies are
> executed in different JVMs. The first topology could filter a lot of
> messages and thus it might happen, that it never writes a record with
> timestamp >= stop-timestamp into the intermediate topic. Thus, even if
> the first JVM terminates the second would not stop automatically as it
> never reads a record with timestamp >= stop-timestamp.
>
> There would be some workaround if we shut down in a "fuzzy way", ie,
> with no guarantees what record will actually get processed (ie, stop
> processing somewhat early of some cases). But I will argue in (3) why
> this "stop roughly about this time semantic" is not a good idea.
>
>
> 2. I was not aware of a metadata field for committed offsets and this
> sounds quite appealing. However, thinking about it in more detail, I
> have my doubts we can use it:
>
> If we want to propagate stop-offsets for intermediate topics, all
> producer instances would need to update this metadata field, thus need
> to commit (A producer that does commit? Well we could use "restore
> consumer" with manual partition assignment for this.) -- however, this
> would not only conflict with the commits of the actual consumer, but
> also in between all running producers.
>
>
> 3. This is the overall "how fuzzy we want to be" discussion. I would
> argue that we should provide somewhat strong stop consistency. Assume
> the following use case. An external application generates data in
> batches and writes files to HDFS. Those files are imported into Kafka
> via Connect. Each time a batch of data gets inserted into Kafka, this
> data should be processed with a Streams application. If we cannot
> guarantee that all data of this batch is fully processed and the result
> is complete, user experience would be quite bad.
>
> Furthermore, we want to guard a running batch job against processing "too
> much" data: Assume the same scenario as before with HDFS + Connect. For
> whatever reason, a Streams batch job takes longer than usual, and while
> it is running new data is appended to the topic as new files (of the
> next batch) are already available. We don't want to process this data
> with the current running app, but want it to be included in the next
> batch run (you could imagine, that each batch job will write the result
> into a different output topic). Thus, we want to have all data processed
> completely, ie, provide strong start-stop consistency.
>
>
> Sorry for the quite long answer. Hope it is convincing though :)
>
>
> -Matthias
>
>
> On 12/5/16 4:44 PM, Jay Kreps wrote:
> > I'd like to second the discouragement of adding a new topic per job. We
> > went down this path in Samza and I think the result was quite a mess. You
> > had to read the full topic every time a job started and so it added a lot
> > of overhead and polluted the topic space.
> >
> > What if we did the following:
> >
> >1. Use timestamp instead of offset
> >2. Store the "stopping time

Reg: ACLS

2016-12-09 Thread BigData dev
Hi All,
I have a question here: does Kafka support ACLs without Kerberos/SSL?

Any info on this would be greatly helpful.


Thanks


[VOTE] KIP-100 - Relax Type constraints in Kafka Streams API

2016-12-09 Thread Xavier Léauté
Hi everyone,

I would like to start the vote for KIP-100 unless there are any more
comments.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-100+-+Relax+Type+constraints+in+Kafka+Streams+API

corresponding PR here https://github.com/apache/kafka/pull/2205

Thanks,
Xavier


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Michael Pearce
Hi Jay,

I can't go too deep into the exact implementation due to NDA. So apologies here.

Essentially we have multiple processes, each owning a selection of accounts, so on 
general flows an action for an account just needs to be managed local to the 
owning node; happy days. Every change is handled as a tick-tock change.

Unfortunately, when a transfer occurs we need the two processes to co-ordinate 
their transaction, and we also need to ensure both don't continue other 
actions/changes; we do this using a data grid technology. The grid technology 
supports a transaction manager that we currently couple into our JMS provider, 
which supports full XA transactions, so we can manage the production of the 
change messages out of the system transactionally as well as the in-grid state.

The obvious argument here is: should we even look to move this flow off JMS 
then? We probably shouldn't, nor will we.

The point is that I think saying Kafka supports transactions, but then not 
supporting them in the traditional sense, leads to developers expecting 
similar behaviour and will cause issues in prod when they find it doesn't work 
as they're used to.

As per my other response earlier: is there a better name to describe this feature, 
if we're not implementing transactions in the traditionally expected sense, 
to avoid this confusion?


Sent using OWA for iPhone


[GitHub] kafka-site issue #36: Streams standalone docs

2016-12-09 Thread guozhangwang
Github user guozhangwang commented on the issue:

https://github.com/apache/kafka-site/pull/36
  
Thanks @derrickdoo ! Please add me to the reviewers list when you create the 
PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka-site issue #36: Streams standalone docs

2016-12-09 Thread derrickdoo
Github user derrickdoo commented on the issue:

https://github.com/apache/kafka-site/pull/36
  
@guozhangwang yes, 2 things need to happen to new copies of stuff inside 
the versioned docs folder. 

1. streams.html will need to have the full shell around it, so the includes 
at the top and bottom as seen in 0101/streams.html need to be there all the 
time now.

2. paths to images in all the new doc files (0101 and beyond) need to 
include the full path to the image directory.

I'll make a PR to the kafka repo to make sure everything is all set for the 
next release




Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Jay Kreps
Hey Michael,

Doesn't that example have more to do with applying the update against two
rows in a single transaction? That is, clearly the write to Kafka needs to
be "transactional" and the write to the destination needs to be
transactional, but it's not clear to me that you need isolation that spans
both operations. Can you dive into the system architecture a bit more and
explain why Kafka needs to participate in the same transaction as the
destination system?

-Jay

On Thu, Dec 8, 2016 at 10:19 PM, Michael Pearce 
wrote:

> Usecase in IG:
>
> Fund transfer between accounts. When we debit one account and fund another
> we must ensure the records for both occur as an ACID action, and as a single
> transaction.
>
> Today we achieve this because we have JMS; as such we can do the actions
> needed in an XA transaction across both accounts. To move this flow to
> Kafka we would need support for XA transactions.
>
>
>
> Sent using OWA for iPhone
> 
> From: Michael Pearce 
> Sent: Friday, December 9, 2016 6:09:06 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> Messaging
>
> Hi Jay,
>
> For me having an XA transaction allows for ensuring ACID across my
> application.
>
> I believe it is part of the JMS API, and JMS is still very widely adopted
> in the enterprise for messaging transport, so to say it isn't widely used
> is to ignore a whole range of users. Likewise, frameworks like Spring
> fully support it, which is further evidence of its wide adoption.
>
> On this note personally we try to avoid transactions entirely in our flows
> for performance and simplicity. but we do alas unfortunately have one or
> two places we cannot ignore it.
>
> Cheers
> Mike
>
> Sent using OWA for iPhone
> 
> From: Jay Kreps 
> Sent: Thursday, December 8, 2016 11:25:53 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> Messaging
>
> Hey Edoardo,
>
> For (3) can you outline what you think the benefit and use cases for a more
> general cross-system XA feature would be, and what changes to the proposal
> would be required to enable it? When I have asked people who wanted
> cross-system XA in the past what they wanted it for, I haven't really
> gotten any answers that made sense. Every person really wanted something
> that would be better solved by a transactional (or idempotent) write to
> Kafka followed by an independent transactional (or idempotent) consumption
> (which this proposal enables). For the use cases they described tying these
> two things together had no advantage and many disadvantages.
>
> I have one use case which would be accomplished by cross-system XA which is
> allowing the producer to block on the synchronous processing of the message
> by (all? some?) consumers. However I'm not convinced that cross-system XA
> is the best solution to this problem, and I'm also not convinced this is an
> important problem to solve. But maybe you have something in mind here.
>
> -Jay
>
>
>
> On Thu, Dec 8, 2016 at 1:15 PM, Edoardo Comar  wrote:
>
> > Hi,
> > thanks, very interesting KIP ... I haven't fully digested it yet.
> >
> > We have many users who choose not to use the Java client,  so I have
> > concerns about the added complexity in developing the clients.
> > A few questions.
> >
> > 1 - is mixing transactional and non-transactional messages on the *same
> > topic-partition* really a requirement?
> > What use case does it satisfy?
> >
> > 2 - I guess some clients may only be interested in implementing producer
> > idempotency.
> > It's not clear how that could be implemented without also having to add
> > the transaction capabilities.
> > As others on this list have said, I too would like to see idempotency as a
> > more basic feature, on top of which transactions can be built.
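The idempotency mechanism under discussion can be sketched as a broker-side deduplication keyed on a producer id and a per-producer sequence number. This is an illustrative toy model only, not the KIP-98 wire protocol or its actual data structures:

```python
# Illustrative sketch (not the actual KIP-98 protocol): a broker-side
# log that deduplicates producer retries using (producer_id, sequence).
class IdempotentLog:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence appended

    def append(self, producer_id, seq, record):
        # A retry re-sends an already-appended sequence; drop it.
        if self.last_seq.get(producer_id, -1) >= seq:
            return False  # duplicate, not appended
        self.log.append(record)
        self.last_seq[producer_id] = seq
        return True

log = IdempotentLog()
log.append("p1", 0, "a")
log.append("p1", 1, "b")
log.append("p1", 1, "b")  # network retry of seq 1: deduplicated
```

With a scheme like this, a producer retry after a lost acknowledgement cannot introduce a duplicate, which is the "more basic feature" being asked for, independent of full transactions.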
> >
> > 3 - The KIP seems focused on a use case where consumption from a topic
> and
> > subsequent production are part of the producer transaction.
> >
> > It'd be great to see a way to extend the producer transaction to include
> > additional transactional resources,
> > so that the consumption from another topic just becomes a special case of
> > a more general "distributed" txn.
> >
> > Edo
> > --
> > Edoardo Comar
> > IBM MessageHub
> > eco...@uk.ibm.com
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> > IBM United Kingdom Limited Registered in England and Wales with number
> > 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants.
> PO6
> > 3AU
> >
> >
> >
> > From:   Guozhang Wang 
> > To: "dev@kafka.apache.org" 
> > Date:   30/11/2016 22:20
> > Subject:[DISCUSS] KIP-98: Exactly Once Delivery and Transactional
> > Messaging
> >
> >
> >
> > Hi all,
> >
> > I have just created KIP-98 to enhance Kafka with exactly once delivery
> > semantics:
> >
> > *https://cwiki.apache

[GitHub] kafka pull request #2178: Minor: Fix typos in KafkaConsumer docs

2016-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2178


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4516) When a CachingStateStore is closed it should clear its associated NamedCache. Subsequent queries should throw InvalidStateStoreException

2016-12-09 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4516:
--
Status: Patch Available  (was: Open)

> When a CachingStateStore is closed it should clear its associated NamedCache. 
> Subsequent queries should throw InvalidStateStoreException
> 
>
> Key: KAFKA-4516
> URL: https://issues.apache.org/jira/browse/KAFKA-4516
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> When close is called on a CachingStateStore we don't release the memory it is 
> using in the Cache. This could result in the cache being full of data that is 
> redundant. We also still allow queries on the CachedStateStore even though it 
> has been closed.
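The intended behavior can be sketched as a store wrapper that empties its cache on close and rejects later queries. The class and exception names mirror the Streams ones, but this implementation is a hypothetical illustration, not the Kafka Streams code:

```python
class InvalidStateStoreException(Exception):
    """Raised when a closed store is queried (mirrors the Streams exception)."""

# Hypothetical sketch of the fix described above: clear the associated
# cache on close(), and validate the store is open before any read/write.
class CachingStateStore:
    def __init__(self):
        self.cache = {}
        self.open = True

    def _validate_open(self):
        if not self.open:
            raise InvalidStateStoreException("store is closed")

    def put(self, key, value):
        self._validate_open()
        self.cache[key] = value

    def get(self, key):
        self._validate_open()
        return self.cache.get(key)

    def close(self):
        # Release the cached entries so redundant data doesn't linger.
        self.cache.clear()
        self.open = False
```

Clearing on close addresses the memory leak; the open-check addresses the stale-query half of the issue.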



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.

2016-12-09 Thread Tom DeVoe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735863#comment-15735863
 ] 

Tom DeVoe edited comment on KAFKA-4477 at 12/9/16 5:39 PM:
---

File limit is set to {{Max open files  1048576  1048576  files}} and the server 
only ever got to 2K open file descriptors. Somewhat related, however: the 
instances that encountered this saw their open files start steadily increasing, 
and it seemed they would have kept increasing had I not restarted the process. 


was (Author: tdevoe):
File limit is set to {{Max open files1048576  1048576   
   files}} and the server only ever got to 2K open file descriptors. 
Though the instances that encountered this saw its open files start steadily 
increasing. It seemed that it would keep increasing had I not restarted the 
process. 

> Node reduces its ISR to itself, and doesn't recover. Other nodes do not take 
> leadership, cluster remains sick until node is restarted.
> --
>
> Key: KAFKA-4477
> URL: https://issues.apache.org/jira/browse/KAFKA-4477
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0
> Environment: RHEL7
> java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
>Reporter: Michael Andre Pearce (IG)
>Assignee: Apurva Mehta
>Priority: Critical
>  Labels: reliability
> Attachments: issue_node_1001.log, issue_node_1002.log, 
> issue_node_1003.log, kafka.jstack
>
>
> We have encountered a critical issue that has recurred in different 
> physical environments. We haven't worked out what is going on, though we 
> do have a nasty workaround to keep the service alive. 
> We have not had this issue on clusters still running 0.9.0.1.
> We have noticed a node randomly shrinking the ISRs down to itself for the 
> partitions it owns; moments later we see other nodes having disconnects, 
> followed finally by application issues, where producing to these partitions 
> is blocked.
> It seems that only restarting the Kafka instance's Java process resolves 
> the issue.
> We have had this occur multiple times, and from all network and machine 
> monitoring the machine never left the network or had any other glitches.
> Below are seen logs from the issue.
> Node 7:
> [2016-12-01 07:01:28,112] INFO Partition 
> [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking 
> ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from 
> 1,2,7 to 7 (kafka.cluster.Partition)
> All other nodes:
> [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch 
> kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 7 was disconnected before the response was 
> read
> All clients:
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> After this occurs, we then suddenly see an increasing number of 
> CLOSE_WAITs and open file descriptors on the sick machine.
> As a workaround to keep the service up we are currently putting in an 
> automated process that tails the logs and matches the regex below; where 
> the new partition set is just the node itself, we restart the node. 
> "\[(?P.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for 
> partition \[.*\] from (?P.+) to (?P.+) 
> \(kafka.cluster.Partition\)"
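The tail-and-restart workaround described above can be sketched as a small log-line check. The group names and sample broker id here are illustrative assumptions, since the original pattern's group names did not survive the archive:

```python
import re

# Illustrative version of the "Shrinking ISR" watchdog described above.
# The group names (broker, old_isr, new_isr) are assumptions; the original
# regex's group names were stripped in the mailing-list archive.
PATTERN = re.compile(
    r"INFO Partition \[.*\] on broker (?P<broker>\d+): Shrinking ISR for "
    r"partition \[.*\] from (?P<old_isr>[\d,]+) to (?P<new_isr>[\d,]+)"
)

def shrunk_to_self(log_line):
    """True when a broker shrank a partition's ISR down to only itself."""
    m = PATTERN.search(log_line)
    return bool(m) and m.group("new_isr") == m.group("broker")

line = ("[2016-12-01 07:01:28,112] INFO Partition "
        "[com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: "
        "Shrinking ISR for partition "
        "[com_ig_trade_v1_position_event--demo--compacted,10] from 1,2,7 to 7 "
        "(kafka.cluster.Partition)")
```

A wrapper script tailing `server.log` would restart the broker process the first time `shrunk_to_self` fires.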





[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.

2016-12-09 Thread Tom DeVoe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735863#comment-15735863
 ] 

Tom DeVoe commented on KAFKA-4477:
--

File limit is set to {{Max open files  1048576  1048576  files}} and the server 
only ever got to 2K open file descriptors. Though the instances that 
encountered this saw their open files start steadily increasing; it seemed 
they would have kept increasing had I not restarted the process. 






[jira] [Commented] (KAFKA-4516) When a CachingStateStore is closed it should clear its associated NamedCache. Subsequent queries should throw InvalidStateStoreException

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735855#comment-15735855
 ] 

ASF GitHub Bot commented on KAFKA-4516:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2235

KAFKA-4516: When a CachingStateStore is closed it should clear its 
associated NamedCache

Clear and remove the NamedCache from the ThreadCache when a 
CachingKeyValueStore or CachingWindowStore is closed.
Validate that the store is open when doing any queries or writes to Caching 
State Stores.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4516

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2235


commit cfb2bffe8492b2da1f7ec06796d66ae59b720175
Author: Damian Guy 
Date:   2016-12-09T17:31:19Z

clear caches when store is closed









[GitHub] kafka pull request #2235: KAFKA-4516: When a CachingStateStore is closed it ...

2016-12-09 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2235

KAFKA-4516: When a CachingStateStore is closed it should clear its 
associated NamedCache

Clear and remove the NamedCache from the ThreadCache when a 
CachingKeyValueStore or CachingWindowStore is closed.
Validate that the store is open when doing any queries or writes to Caching 
State Stores.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4516

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2235


commit cfb2bffe8492b2da1f7ec06796d66ae59b720175
Author: Damian Guy 
Date:   2016-12-09T17:31:19Z

clear caches when store is closed






[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.

2016-12-09 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735852#comment-15735852
 ] 

Jason Gustafson commented on KAFKA-4477:


One other thing to check is whether we're hitting the open file limit. There 
would be more open files in 0.10.1 with the addition of the time index.
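The effect of the time index on descriptor counts can be estimated with simple arithmetic. The per-segment file counts below are the assumption being illustrated: each log segment keeps a .log and .index file open, and 0.10.1's time-based index (KIP-33) adds a .timeindex per segment:

```python
# Back-of-envelope estimate of open file descriptors from log segments.
# Assumption for illustration: 2 files per segment (.log, .index) before
# 0.10.1, and 3 after the time index (.timeindex) was added.
def estimated_fds(partitions, segments_per_partition, files_per_segment):
    return partitions * segments_per_partition * files_per_segment

before = estimated_fds(200, 10, 2)  # pre-0.10.1
after = estimated_fds(200, 10, 3)   # 0.10.1 with .timeindex
```

So a broker hosting the same data sees roughly a 50% jump in segment-related descriptors after upgrading, which is worth checking against the configured limit.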






[jira] [Updated] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.

2016-12-09 Thread Tom DeVoe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom DeVoe updated KAFKA-4477:
-
Attachment: issue_node_1001.log
issue_node_1003.log
issue_node_1002.log

I have attached logs for the 3 servers the first time this happened. Note the 
logs are (mostly) anonymized in that all of the 
hostnames/topics/consumers/producers have been replaced with some generic ids. 
node_1002 is the server that reduced its ISRs.






[jira] [Resolved] (KAFKA-4492) java.lang.IllegalStateException: Attempting to put a clean entry for key... into NamedCache

2016-12-09 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy resolved KAFKA-4492.
---
Resolution: Fixed

> java.lang.IllegalStateException: Attempting to put a clean entry for key... 
> into NamedCache
> ---
>
> Key: KAFKA-4492
> URL: https://issues.apache.org/jira/browse/KAFKA-4492
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> This follows on from https://issues.apache.org/jira/browse/KAFKA-4311
> The exception seems to be triggered by topologies with multiple joined 
> tables. As a new record arrives in one table it triggers an eviction. The 
> eviction causes a flush, which triggers a join processor. This in turn 
> does a cache lookup, and if the value is not in the cache it is retrieved 
> from the store and put in the cache, triggering another eviction. 
> And so on.
> Exception reported on mailing list
> https://gist.github.com/mfenniak/509fb82dfcfda79a21cfc1b07dafa89c
> Further investigation into this also reveals that this same eviction process 
> can send the cache eviction into an infinite loop. It seems that the LRU is 
> broken.
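The re-entrant eviction described above can be reproduced in miniature. This toy cache is a hypothetical model, not the Streams NamedCache: its eviction callback (standing in for flush -> join processor -> lookup) writes back into the cache, which would recurse without a re-entrancy guard:

```python
from collections import OrderedDict

# Toy LRU cache modelling the failure mode described above: the eviction
# callback puts a new entry back into the cache, which can trigger
# eviction again, recursing forever unless guarded.
class LruCache:
    def __init__(self, max_size, on_evict):
        self.data = OrderedDict()
        self.max_size = max_size
        self.on_evict = on_evict
        self.evicting = False  # re-entrancy guard

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.max_size and not self.evicting:
            self.evicting = True
            try:
                evicted_key, evicted_value = self.data.popitem(last=False)
                self.on_evict(evicted_key, evicted_value)  # may call put()
            finally:
                self.evicting = False

calls = []
cache = LruCache(2, on_evict=lambda k, v: (calls.append(k),
                                           cache.put(k + "'", v)))
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)  # evicts "a"; callback re-puts "a'" without recursing
```

Without the `evicting` flag, each eviction's callback insertion would trigger the next eviction and the loop would never terminate, which is the "LRU is broken" symptom reported here.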





[jira] [Created] (KAFKA-4516) When a CachingStateStore is closed it should clear its associated NamedCache. Subsequent queries should throw InvalidStateStoreException

2016-12-09 Thread Damian Guy (JIRA)
Damian Guy created KAFKA-4516:
-

 Summary: When a CachingStateStore is closed it should clear its 
associated NamedCache. Subsequent queries should throw 
InvalidStateStoreException
 Key: KAFKA-4516
 URL: https://issues.apache.org/jira/browse/KAFKA-4516
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.1.0
Reporter: Damian Guy
Assignee: Damian Guy
 Fix For: 0.10.2.0


When close is called on a CachingStateStore we don't release the memory it is 
using in the Cache. This could result in the cache being full of data that is 
redundant. We also still allow queries on the CachedStateStore even though it 
has been closed.





Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Michael Pearce
Should the word "transaction" therefore be avoided here, and another way to 
describe the feature found?

My worry is that by naming this "transactions" there instantly arises a tie 
to, and an expectation of, traditional transaction properties.

Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Edoardo Comar
Hi Michael,
I don't think KIP-98 should be considered a half-way house. 
It would provide transactions with the level of durability that Kafka 
provides.

Which is, as Andrew suggests, different from other transactional systems.
BTW, I am convinced that Andrew has answered the 3rd of my previous 3 
questions below, thanks.

Edo
--
Edoardo Comar
IBM MessageHub
eco...@uk.ibm.com
IBM UK Ltd, Hursley Park, SO21 2JN

IBM United Kingdom Limited Registered in England and Wales with number 
741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 
3AU



From:   Michael André Pearce 
To: dev@kafka.apache.org
Date:   09/12/2016 12:40
Subject:Re: [DISCUSS] KIP-98: Exactly Once Delivery and 
Transactional Messaging



Agreed, I think it is better to either support transactions properly and 
fully, to the level of expectation normally associated with a transactional 
system, or not provide them at all. 

Having a half way house can be more dangerous.

As I said earlier the issue of message duplication can be handled entirely 
without transactions.

Sent from my iPhone

> On 9 Dec 2016, at 10:49, Andrew Schofield 
 wrote:
> 
> I've been pondering this question of coordinating other resource 
managers with
> Kafka transactions for a while and I'm not convinced it's a good idea. 
My
> reservations come down to the guarantees that it would provide in 
failure
> scenarios.
> 
> I don't think KIP-98 gives proper ACID semantics in the presence of all
> failures. For a transaction which contains a mixture of publishes and 
offset
> updates, a bunch of topics are involved and it appears to me that an
> uncontrolled shutdown could result in some but not all of the lazy 
writes
> making it to disk.
> 
> Here are some of the failures that I'm worried about:
> 
> * A message is published to a topic which crashes the leader Kafka node, 
as
>  it's replicated across the cluster, it crashes all of the other Kafka 
nodes
>  (we've really had this - SEGV, our fault and we've fixed it, but it 
happened)
>  so this is a kind of rolling node crash in a cluster
> * Out of memory error in one or more Kafka nodes
> * Disk fills in one or more Kafka nodes
> * Uncontrolled power-off to all nodes in the cluster
> 
> Does KIP-98 guarantee atomicity for transactions in all of these cases?
> Unless all of the topics involved in a transaction are recovered to the
> same point in time, you can't consider a transaction to be properly 
atomic.
> If application code is designed expecting atomicity, there are going to 
be
> tears. Perhaps only when disaster strikes, but the risk is there.
> 
> I think KIP-98 is interesting, but I wouldn't equate what it provides
> with the transactions provided by resource managers with traditional 
transaction
> logging. It's not better or worse, just different. If you tried to 
migrate
> from a previous transactional system to Kafka transactions, I think 
you'd
> better have procedures for reconciliation with the other resource 
managers.
> Better still, don't build applications that are so fragile. The 
principle
> of dumb pipes and smart endpoints is good in my view.
> 
> If you're creating a global transaction containing two or more resource
> managers and using two-phase commit, it's very important that all of the
> resource managers maintain a coherent view of the sequence of events. If 
any
> part fails due to a software or hardware failure, once the system is
> recovered, nothing must be forgotten. If you look at how presume-abort
> works, you'll see how important this is.
> 
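The two-phase commit protocol referenced above can be sketched minimally. This toy coordinator is a hypothetical model, not an XA or Kafka API; it shows the structure whose central assumption the email then questions:

```python
# Minimal two-phase-commit sketch (a toy model, not an XA implementation):
# the coordinator commits only if every resource manager votes yes in the
# prepare phase; otherwise everything is rolled back.
class ResourceManager:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: persist enough to honor a later commit, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(managers):
    if all(m.prepare() for m in managers):  # phase 1: voting
        for m in managers:                  # phase 2: decision
            m.commit()
        return "committed"
    for m in managers:
        m.rollback()
    return "aborted"
```

The argument in the email is that a "forgetful" participant, one whose prepared state can be lost in a crash because writes reach disk lazily, breaks the assumption that a yes vote survives failure, which is exactly what prepare() must guarantee.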
> Kafka doesn't really fit very nicely in this kind of environment because 
of
> the way that it writes lazily to disk. The theory is that you must avoid 
at all
> costs having an uncontrolled shutdown of an entire cluster because 
you'll lose
> a little data at the end of the logs. So, if you are coordinating Kafka 
and a
> relational database in a global transaction, it's theoretically possible 
that
> a crashed Kafka would be a little forgetful while a crashed database 
would not.
> The database would be an order of magnitude or more slower because of 
the way
> its recovery logs are handled, but it would not be forgetful in the same 
way.
> 
> You get exactly the same kind of worries when you implement some kind of
> asynchronous replication for disaster recovery, even if all of the 
resource
> managers force all of their log writes to disk eagerly. The replica at 
the DR
> site is slightly behind the primary site, so if you have to recover from 
an
> outage and switch to the DR site, it can be considered to be slightly 
forgetful
> about the last few moments before the outage. This is why a DR plan 
usually has
> some concept of compensation or reconciliation to make good any 
forgotten work.
> 
> In summary, I think Kafka would have to change in ways which would 
negate many
> of its good points in order to support XA transactions. It would be 
better to
> desig

[jira] [Commented] (KAFKA-3071) Kafka Server 0.8.2 ERROR OOME with siz

2016-12-09 Thread Anish Khanzode (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735600#comment-15735600
 ] 

Anish Khanzode commented on KAFKA-3071:
---

Is this an issue that needs attention? I see my consumer JVM sometimes dies.

> Kafka Server 0.8.2 ERROR OOME with siz
> --
>
> Key: KAFKA-3071
> URL: https://issues.apache.org/jira/browse/KAFKA-3071
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2.0
> Environment: Linux * 2.6.32-431.23.3.el6.x86_64 #1 SMP Wed 
> Jul 16 06:12:23 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: vijay bhaskar
>  Labels: build
> Fix For: 0.8.2.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> [2016-01-06 12:34:18.819-0700] INFO Truncating log hughes-order-status-73 to 
> offset 18. (kafka.log.Log)
> [2016-01-06 12:34:18.819-0700] INFO Truncating log troubleshoot-completed-125 
> to offset 3. (kafka.log.Log)
> [2016-01-06 12:34:19.064-0700] DEBUG Scheduling task highwatermark-checkpoint 
> with initial delay 0 ms and period 5000 ms. (kafka.utils.KafkaScheduler)
> [2016-01-06 12:34:19.106-0700] DEBUG Scheduling task [__consumer_offsets,0] 
> with initial delay 0 ms and period -1 ms. (kafka.utils.KafkaScheduler)
> [2016-01-06 12:34:19.106-0700] INFO Loading offsets from 
> [__consumer_offsets,0] (kafka.server.OffsetManager)
> [2016-01-06 12:34:19.108-0700] INFO Finished loading offsets from 
> [__consumer_offsets,0] in 2 milliseconds. (kafka.server.OffsetManager)
> [2016-01-06 12:34:27.023-0700] ERROR OOME with size 743364196 
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
> at 
> kafka.network.BoundedByteBufferReceive.byteBufferAllocate(BoundedByteBufferReceive.scala:80)
> at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:63)
> at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
> at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
> at kafka.network.BlockingChannel.receive(BlockingChannel.scala:108)
> at 
> kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:72)
> at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69)
> at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:113)
> at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:113)
> at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:113)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:112)
> at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:112)
> at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:112)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:111)
> at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:97)
> at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:89)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> [2016-01-06 12:34:32.003-0700] ERROR OOME with size 743364196 
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
> at 
> kafka.network.BoundedByteBufferReceive.byteBufferAllocate(BoundedByteBufferReceive.scala:80)
> at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:63)
> at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
> at 
> kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
> at kafka.network.BlockingChannel.receive(BlockingChannel.scala:108)
> at 
> kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:80)
> at 
> kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:69)
> at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:113)
> at 
> kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:113)
> at 
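The `OOME with size 743364196` pattern in the log above means the four-byte size prefix read from the socket was treated as a real request size and allocated in one shot; a common cause is a client speaking the wrong protocol (e.g. plaintext to an SSL/SASL port), so garbage bytes are interpreted as a huge length. A minimal illustrative sketch of the usual defence — validating the size field against a bound before allocating — follows; the `MAX_RECEIVE_BYTES` limit and class names are assumptions for illustration, not Kafka's actual code:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch (not Kafka's actual BoundedByteBufferReceive): guard a
// length-prefixed read so a corrupt or foreign-protocol size field fails fast
// instead of attempting a huge allocation and hitting OutOfMemoryError.
public class BoundedReceive {
    static final int MAX_RECEIVE_BYTES = 100 * 1024 * 1024; // assumed limit

    static ByteBuffer allocateForSize(int size) {
        if (size < 0 || size > MAX_RECEIVE_BYTES) {
            throw new IllegalStateException("Invalid receive size " + size
                    + "; is the client speaking the right protocol to this port?");
        }
        return ByteBuffer.allocate(size);
    }

    public static void main(String[] args) {
        System.out.println(allocateForSize(1024).capacity()); // prints 1024
        try {
            allocateForSize(743364196); // the size from the log above
        } catch (IllegalStateException e) {
            System.out.println("rejected"); // rejected instead of OOME
        }
    }
}
```

With such a bound, a protocol mismatch surfaces as a clear error rather than heap exhaustion.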

[jira] [Resolved] (KAFKA-4286) metric reporter may hit NullPointerException during shutdown

2016-12-09 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4286.

Resolution: Fixed
  Assignee: Xavier Léauté

This was fixed by 
https://github.com/apache/kafka/commit/006630fd93d8efb823e5b5f7d61584138df984a6,
 which avoids the NPE by checking for null instead of caching the value.

[~junrao], if you think that caching the value would be better, please reopen 
and we can do a follow up.

> metric reporter may hit NullPointerException during shutdown
> 
>
> Key: KAFKA-4286
> URL: https://issues.apache.org/jira/browse/KAFKA-4286
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Xavier Léauté
>
> When we shut down a broker, our metric reporter could throw the following 
> exception.
> java.lang.NullPointerException
>   at kafka.network.Processor$$anon$2.value(SocketServer.scala:392)
>   at kafka.network.Processor$$anon$2.value(SocketServer.scala:390)
> This is because we register a Yammer metric like the following, and we 
> de-register the underlying Kafka metric when shutting down the socket server.
>   newGauge("IdlePercent",
> new Gauge[Double] {
>   def value = {
> metrics.metrics().get(metrics.metricName("io-wait-ratio", 
> "socket-server-metrics", metricTags)).value()
>   }
> },
> metricTags.asScala
>   )
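For illustration, the shape of the fix described in the resolution — looking the metric up on every read and checking for null, rather than caching the value — can be sketched as follows. The names and the `Supplier`-based gauge are hypothetical stand-ins, not the actual Kafka or Yammer classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hedged sketch of the null-check fix: the gauge resolves the metric on every
// read and falls back to a default once the metric has been de-registered,
// instead of dereferencing a value that may no longer exist.
public class NullSafeGauge {
    static final Map<String, Double> metrics = new ConcurrentHashMap<>();

    static Supplier<Double> idlePercentGauge(String metricName) {
        return () -> {
            Double value = metrics.get(metricName); // may be null after shutdown
            return value == null ? 0.0 : value;     // avoid the NullPointerException
        };
    }

    public static void main(String[] args) {
        Supplier<Double> gauge = idlePercentGauge("io-wait-ratio");
        metrics.put("io-wait-ratio", 0.75);
        System.out.println(gauge.get());  // 0.75 while the metric is registered
        metrics.remove("io-wait-ratio");  // socket server shut down
        System.out.println(gauge.get());  // 0.0 instead of an NPE
    }
}
```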



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4286) metric reporter may hit NullPointerException during shutdown

2016-12-09 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4286:
---
Fix Version/s: 0.10.2.0

> metric reporter may hit NullPointerException during shutdown
> 
>
> Key: KAFKA-4286
> URL: https://issues.apache.org/jira/browse/KAFKA-4286
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>Reporter: Jun Rao
>Assignee: Xavier Léauté
> Fix For: 0.10.2.0
>
>
> When we shut down a broker, our metric reporter could throw the following 
> exception.
> java.lang.NullPointerException
>   at kafka.network.Processor$$anon$2.value(SocketServer.scala:392)
>   at kafka.network.Processor$$anon$2.value(SocketServer.scala:390)
> This is because we register a Yammer metric like the following, and we 
> de-register the underlying Kafka metric when shutting down the socket server.
>   newGauge("IdlePercent",
> new Gauge[Double] {
>   def value = {
> metrics.metrics().get(metrics.metricName("io-wait-ratio", 
> "socket-server-metrics", metricTags)).value()
>   }
> },
> metricTags.asScala
>   )





[jira] [Commented] (KAFKA-4431) HeartbeatThread should be a daemon thread

2016-12-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735393#comment-15735393
 ] 

ASF GitHub Bot commented on KAFKA-4431:
---

GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/2234

KAFKA-4431: Make consumer heartbeat thread a daemon thread



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4431

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2234


commit 08dbb26c94417872b37c10822fb21621f777dca9
Author: Rajini Sivaram 
Date:   2016-12-09T13:57:00Z

KAFKA-4431: Make consumer heartbeat thread a daemon thread




> HeartbeatThread should be a daemon thread
> -
>
> Key: KAFKA-4431
> URL: https://issues.apache.org/jira/browse/KAFKA-4431
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: David Judd
>Assignee: Rajini Sivaram
>
> We're seeing an issue where an exception inside the main processing loop of a 
> consumer doesn't cause the JVM to exit, as expected (and, in our case, 
> desired). From the thread dump, it appears that what's blocking exit is the 
> "kafka-coordinator-heartbeat-thread", which is not currently a daemon thread. 
> Per the mailing list, it sounds like this is a bug.
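The daemon-thread behaviour at issue can be shown in a few lines: the JVM exits only once every non-daemon thread has finished, so a background loop like the heartbeat thread must be marked as a daemon for the process to die when the main thread ends. A minimal stand-alone illustration (not the consumer's actual heartbeat code):

```java
// Minimal illustration of why a non-daemon background thread blocks JVM exit:
// the JVM terminates only when all non-daemon threads have finished, so a
// heartbeat-style loop must be a daemon thread for the process to exit when
// the main thread returns.
public class DaemonDemo {
    public static void main(String[] args) {
        Thread heartbeat = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(100); // simulated heartbeat interval
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "heartbeat-thread");
        heartbeat.setDaemon(true); // without this line, the JVM would never exit
        heartbeat.start();
        System.out.println("main exiting; daemon=" + heartbeat.isDaemon());
        // The JVM now exits because the only remaining thread is a daemon.
    }
}
```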





[GitHub] kafka pull request #2234: KAFKA-4431: Make consumer heartbeat thread a daemo...

2016-12-09 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/2234

KAFKA-4431: Make consumer heartbeat thread a daemon thread



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-4431

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2234


commit 08dbb26c94417872b37c10822fb21621f777dca9
Author: Rajini Sivaram 
Date:   2016-12-09T13:57:00Z

KAFKA-4431: Make consumer heartbeat thread a daemon thread




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (KAFKA-4431) HeartbeatThread should be a daemon thread

2016-12-09 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram reassigned KAFKA-4431:
-

Assignee: Rajini Sivaram

> HeartbeatThread should be a daemon thread
> -
>
> Key: KAFKA-4431
> URL: https://issues.apache.org/jira/browse/KAFKA-4431
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.1.0
>Reporter: David Judd
>Assignee: Rajini Sivaram
>
> We're seeing an issue where an exception inside the main processing loop of a 
> consumer doesn't cause the JVM to exit, as expected (and, in our case, 
> desired). From the thread dump, it appears that what's blocking exit is the 
> "kafka-coordinator-heartbeat-thread", which is not currently a daemon thread. 
> Per the mailing list, it sounds like this is a bug.





[jira] [Commented] (KAFKA-3355) GetOffsetShell command doesn't work with SASL enabled Kafka

2016-12-09 Thread Mohammed amine GARMES (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735361#comment-15735361
 ] 

Mohammed amine GARMES commented on KAFKA-3355:
--

Hello [~singhashish], [~gwenshap],

Is there any news on the GetOffsetShell problem?
I have fixed this issue for my company and would like to know whether I can 
push the fix so that it is included in the 0.10.0.1 release.

I would also like to fix the create/delete topic commands, because we have a 
security issue flagged in a PCI DSS audit. 


Best regards 

> GetOffsetShell command doesn't work with SASL enabled Kafka
> ---
>
> Key: KAFKA-3355
> URL: https://issues.apache.org/jira/browse/KAFKA-3355
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 0.9.0.1
> Environment: Kafka 0.9.0.1
>Reporter: TAO XIAO
>Assignee: Ashish K Singh
>
> I found that GetOffsetShell doesn't work with SASL enabled Kafka. I believe 
> this is due to old producer being used in GetOffsetShell.
> Kafka version 0.9.0.1
> Exception
> % bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 
> localhost:9092 --topic test --time -1
> [2016-03-04 21:43:56,597] INFO Verifying properties 
> (kafka.utils.VerifiableProperties)
> [2016-03-04 21:43:56,613] INFO Property client.id is overridden to 
> GetOffsetShell (kafka.utils.VerifiableProperties)
> [2016-03-04 21:43:56,613] INFO Property metadata.broker.list is overridden to 
> localhost:9092 (kafka.utils.VerifiableProperties)
> [2016-03-04 21:43:56,613] INFO Property request.timeout.ms is overridden to 
> 1000 (kafka.utils.VerifiableProperties)
> [2016-03-04 21:43:56,674] INFO Fetching metadata from broker 
> BrokerEndPoint(0,localhost,9092) with correlation id 0 for 1 topic(s) 
> Set(test) (kafka.client.ClientUtils$)
> [2016-03-04 21:43:56,689] INFO Connected to localhost:9092 for producing 
> (kafka.producer.SyncProducer)
> [2016-03-04 21:43:56,705] WARN Fetching topic metadata with correlation id 0 
> for topics [Set(test)] from broker [BrokerEndPoint(0,localhost,9092)] failed 
> (kafka.client.ClientUtils$)
> java.nio.BufferUnderflowException
>   at java.nio.Buffer.nextGetIndex(Buffer.java:498)
>   at java.nio.HeapByteBuffer.getShort(HeapByteBuffer.java:304)
>   at kafka.api.ApiUtils$.readShortString(ApiUtils.scala:36)
>   at kafka.cluster.BrokerEndPoint$.readFrom(BrokerEndPoint.scala:52)
>   at 
> kafka.api.TopicMetadataResponse$$anonfun$1.apply(TopicMetadataResponse.scala:28)
>   at 
> kafka.api.TopicMetadataResponse$$anonfun$1.apply(TopicMetadataResponse.scala:28)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.Range.foreach(Range.scala:166)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> kafka.api.TopicMetadataResponse$.readFrom(TopicMetadataResponse.scala:28)
>   at kafka.producer.SyncProducer.send(SyncProducer.scala:120)
>   at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59)
>   at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:94)
>   at kafka.tools.GetOffsetShell$.main(GetOffsetShell.scala:78)
>   at kafka.tools.GetOffsetShell.main(GetOffsetShell.scala)





[jira] [Commented] (KAFKA-4515) Async producer send not retrying on TimeoutException: Batch Expired

2016-12-09 Thread Rajini Sivaram (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735355#comment-15735355
 ] 

Rajini Sivaram commented on KAFKA-4515:
---

This is being addressed in KIP-91: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-91+Provide+Intuitive+User+Timeouts+in+The+Producer


> Async producer send not retrying on TimeoutException: Batch Expired
> ---
>
> Key: KAFKA-4515
> URL: https://issues.apache.org/jira/browse/KAFKA-4515
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.1
>Reporter: Di Shang
>
> We are testing out broker failure resiliency, we have a cluster of 3 brokers, 
> a topic with 5 partitions and 2 replicas. The replicas are evenly distributed 
> and there is at least a partition leader in every broker. We use this code to 
> continuously send msg and then kill one of the brokers to see if we lost any 
> msg. 
> {code:title=MyTest.java|borderStyle=solid}
> static volatile KafkaProducer producer;
> public static void send(ProducerRecord record) {
> producer.send(record, (metadata, exception) -> {
> if (exception != null) {
> // handle exception with manual retry
> System.out.println("Error, resending...");
> exception.printStackTrace();
> try {
> Thread.sleep(100);
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //send(record); // without this retry, msg would be lost
> } else if (metadata != null) {
> System.out.println("Sent " + record);
> } else {
> System.out.println("No exception and no metadata");
> }
> });
> }
> public static void main(String[] args) throws Exception {
> Properties props = new Properties();
> props.put("bootstrap.servers", "...");
> props.put("key.serializer", 
> "org.apache.kafka.common.serialization.StringSerializer");
> props.put("value.serializer", 
> "org.apache.kafka.common.serialization.StringSerializer");
> props.put("retries", "10");
> props.put("acks", "1");
> props.put("request.timeout.ms", "1000");
> producer = new KafkaProducer<>(props);
> Long i = 1L;
> while (true) {
> ProducerRecord record =
> new ProducerRecord<>("my-topic", i.toString());
> send(record);
> Thread.sleep(100);
> i++;
> }
> }
> {code}
> What we found is that when we set *request.timeout.ms* to a small value like 
> 1000, then when we kill a broker we would get a few TimeoutException: Batch 
> Expired errors in the send() callback. And if we don't handle this by 
> explicit retry like in the above code, then we would lose those msg. 
> The documentation for *request.timeout.ms* says:
> bq. The configuration controls the maximum amount of time the client will 
> wait for the response of a request. If the response is not received before 
> the timeout elapses the client will resend the request if necessary or fail 
> the request if retries are exhausted.
> This makes me think that a TimeoutException should be implicitly retried 
> using the *retries* options, which doesn't seem to work. 
> Strangely, we also noticed that if *request.timeout.ms* is set long enough, 
> like the default 30000, then we don't lose any messages when killing a broker 
> even if we set *retries* to 0. 
> So it seems to me that the *retries* option is not working in the broker-down 
> scenario. There seems to be some other internal mechanism for handling broker 
> failure and message retry, and this mechanism won't work if there is a 
> TimeoutException.
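The manual retry in the reporter's callback (the commented-out recursive `send(record)`) can at least be made bounded. A hedged stand-alone sketch of that idea — a generic retry helper with a retry limit and backoff, not the producer's internal retry mechanism:

```java
import java.util.function.Supplier;

// Hedged sketch of the manual-retry workaround from the report, with a bound
// and a backoff instead of unbounded recursion. "op" stands in for any
// operation that may fail transiently; this is not the Kafka producer API.
public class BoundedRetry {
    static <T> T withRetries(Supplier<T> op, int maxRetries, long backoffMs) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // e.g. a TimeoutException: Batch Expired
                try {
                    Thread.sleep(backoffMs); // simple fixed backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        throw last; // give up after maxRetries, surfacing the last failure
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient");
            return "sent";
        }, 10, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```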





Re: [VOTE] KIP-94: Session Windows

2016-12-09 Thread Damian Guy
Hi,

The vote for KIP-94 has now concluded.

The KIP has been accepted with:
3 binding votes (Ewen, Sriram, Guozhang) and 4 non-binding (Michael,
Matthias, Eno, Bill).

Thanks everyone.

Damian

On Wed, 7 Dec 2016 at 10:39 Michael Noll  wrote:

> +1 (non-binding)
>
> On Wed, Dec 7, 2016 at 1:21 AM, Sriram Subramanian 
> wrote:
>
> > +1 (binding)
> >
> > On Tue, Dec 6, 2016 at 3:43 PM, Ewen Cheslack-Postava  >
> > wrote:
> >
> > > +1 binding
> > >
> > > -Ewen
> > >
> > > On Tue, Dec 6, 2016 at 3:21 PM, Bill Bejeck  wrote:
> > >
> > > > +1
> > > >
> > > > On Tue, Dec 6, 2016 at 4:55 PM, Guozhang Wang 
> > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Tue, Dec 6, 2016 at 9:07 AM, Matthias J. Sax <
> > matth...@confluent.io
> > > >
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On 12/6/16 7:40 AM, Eno Thereska wrote:
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > Thanks
> > > > > > > Eno
> > > > > > >> On 6 Dec 2016, at 12:09, Damian Guy 
> > wrote:
> > > > > > >>
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> I'd like to start the vote for KIP-94:
> > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 94+Session+Windows
> > > > > > >>
> > > > > > >> There is a PR for it here: https://github.com/apache/
> > > > kafka/pull/2166
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Damian
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Ewen
> > >
> >
>


Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Michael André Pearce
Agreed. I think it is better either to support transactions properly and fully, 
to the level of expectation normally associated with a transactional system, or 
not to provide them at all.  

Having a half way house can be more dangerous.

As I said earlier the issue of message duplication can be handled entirely 
without transactions.

Sent from my iPhone

> On 9 Dec 2016, at 10:49, Andrew Schofield  
> wrote:
> 
> I've been pondering this question of coordinating other resource managers with
> Kafka transactions for a while and I'm not convinced it's a good idea. My
> reservations come down to the guarantees that it would provide in failure
> scenarios.
> 
> I don't think KIP-98 gives proper ACID semantics in the presence of all
> failures. For a transaction which contains a mixture of publishes and offset
> updates, a bunch of topics are involved and it appears to me that an
> uncontrolled shutdown could result in some but not all of the lazy writes
> making it to disk.
> 
> Here are some of the failures that I'm worried about:
> 
> * A message is published to a topic which crashes the leader Kafka node, as
>  it's replicated across the cluster, it crashes all of the other Kafka nodes
>  (we've really had this - SEGV, our fault and we've fixed it, but it happened)
>  so this is a kind of rolling node crash in a cluster
> * Out of memory error in one or more Kafka nodes
> * Disk fills in one or more Kafka nodes
> * Uncontrolled power-off to all nodes in the cluster
> 
> Does KIP-98 guarantee atomicity for transactions in all of these cases?
> Unless all of the topics involved in a transaction are recovered to the
> same point in time, you can't consider a transaction to be properly atomic.
> If application code is designed expecting atomicity, there are going to be
> tears. Perhaps only when disaster strikes, but the risk is there.
> 
> I think KIP-98 is interesting, but I wouldn't equate what it provides
> with the transactions provided by resource managers with traditional 
> transaction
> logging. It's not better or worse, just different. If you tried to migrate
> from a previous transactional system to Kafka transactions, I think you'd
> better have procedures for reconciliation with the other resource managers.
> Better still, don't build applications that are so fragile. The principle
> of dumb pipes and smart endpoints is good in my view.
> 
> If you're creating a global transaction containing two or more resource
> managers and using two-phase commit, it's very important that all of the
> resource managers maintain a coherent view of the sequence of events. If any
> part fails due to a software or hardware failure, once the system is
> recovered, nothing must be forgotten. If you look at how presumed-abort
> works, you'll see how important this is.
> 
> Kafka doesn't really fit very nicely in this kind of environment because of
> the way that it writes lazily to disk. The theory is that you must avoid at 
> all
> costs having an uncontrolled shutdown of an entire cluster because you'll lose
> a little data at the end of the logs. So, if you are coordinating Kafka and a
> relational database in a global transaction, it's theoretically possible that
> a crashed Kafka would be a little forgetful while a crashed database would 
> not.
> The database would be an order of magnitude or more slower because of the way
> its recovery logs are handled, but it would not be forgetful in the same way.
> 
> You get exactly the same kind of worries when you implement some kind of
> asynchronous replication for disaster recovery, even if all of the resource
> managers force all of their log writes to disk eagerly. The replica at the DR
> site is slightly behind the primary site, so if you have to recover from an
> outage and switch to the DR site, it can be considered to be slightly 
> forgetful
> about the last few moments before the outage. This is why a DR plan usually 
> has
> some concept of compensation or reconciliation to make good any forgotten 
> work.
> 
> In summary, I think Kafka would have to change in ways which would negate many
> of its good points in order to support XA transactions. It would be better to
> design applications to be resilient to message duplication and loss, rather
> than tightly coupling resource managers and ending up with something fragile.
> 
> Don't get me wrong. This is not an anti-Kafka rant. I just work with people
> used to traditional transactional systems, making use of Kafka for business
> applications, and it's important to understand the concepts on both sides
> and make sure your assumptions are valid.
> 
> Andrew Schofield
> IBM Watson and Cloud Platform
> 
> 
>> From: Michael Pearce 
>> Sent: 09 December 2016 06:19
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional 
>> Messaging
>> 
>> Usecase in IG:
>> 
>> Fund transfer between accounts. When we debit one account and fund another 
>> we must ensure the records 
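As the message above argues, duplication can be handled without transactions, for example by an idempotent consumer that tracks application-level message IDs. A minimal sketch of that pattern, using the fund-transfer use case quoted above; in a real system the seen-ID set would live in durable storage and be updated atomically with the side effect — a `HashSet` is used here only to illustrate:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of an "idempotent consumer": the consumer records an application-level
// message ID and skips any message it has already applied, so redelivered
// duplicates do not double-apply the side effect.
public class IdempotentConsumer {
    private final Set<String> seen = new HashSet<>();
    private long balance = 0;

    // Returns true if the credit was applied, false if it was a duplicate.
    boolean applyCredit(String messageId, long amount) {
        if (!seen.add(messageId)) {
            return false; // duplicate delivery: side effect already applied
        }
        balance += amount;
        return true;
    }

    long balance() { return balance; }

    public static void main(String[] args) {
        IdempotentConsumer account = new IdempotentConsumer();
        account.applyCredit("transfer-42", 100);
        account.applyCredit("transfer-42", 100); // redelivery of the same message
        System.out.println(account.balance());   // 100, not 200
    }
}
```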

Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-09 Thread Damian Guy
Thanks for the update Michael.

I just wanted to add one crucial piece of information that I failed to
include earlier (I apologise).

To me, the join between 2 Global Tables just produces a view on top of the
underlying tables (this is the same as it works for KTables today). So that
means there is no Physical StateStore that backs the join result, it is
just a Virtual StateStore that knows how to resolve the join when it is
required. I've deliberately taken this path so that we don't end up having
yet another copy of the data, stored on local disk, and sent to another
change-log topic. This also reduces the memory overhead from creating
RocksDBStores and reduces load on the Thread based caches we have. So it is
a resource optimization.
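The virtual-store idea above can be illustrated with a toy lookup that resolves the join on demand from the two underlying tables, storing nothing itself. This is illustrative code under assumed names, not the Kafka Streams implementation:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch of a "virtual state store" join view: the view holds no
// data of its own; get() resolves the join at lookup time from the two
// underlying tables via a key mapper, so no extra copy, store, or changelog
// topic is needed for the join result.
public class VirtualJoinView<K, V1, K2, V2> {
    private final Map<K, V1> left;
    private final Map<K2, V2> right;
    private final Function<V1, K2> keyMapper; // maps a left value to the right key

    VirtualJoinView(Map<K, V1> left, Map<K2, V2> right, Function<V1, K2> keyMapper) {
        this.left = left;
        this.right = right;
        this.keyMapper = keyMapper;
    }

    Map.Entry<V1, V2> get(K key) {
        V1 v1 = left.get(key);
        if (v1 == null) return null;            // inner-join semantics: no left row, no result
        V2 v2 = right.get(keyMapper.apply(v1));
        if (v2 == null) return null;
        return new SimpleEntry<>(v1, v2);       // computed at lookup time, never stored
    }

    public static void main(String[] args) {
        Map<Integer, Integer> person = new HashMap<>(); // person id -> device id
        person.put(1, 2);
        Map<Integer, String> device = new HashMap<>();  // device id -> type
        device.put(2, "phone");
        VirtualJoinView<Integer, Integer, Integer, String> view =
                new VirtualJoinView<>(person, device, deviceId -> deviceId);
        System.out.println(view.get(1)); // 2=phone
    }
}
```

Note the view is keyed by the left-hand table's key, which is exactly why the outer-join case discussed below is awkward: there is no single natural key for the result.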

So while it is technically possible to support outer joins, we would need
to physically materialize the StateStore (and create a changelog-topic for
it), or, we'd need to provide another interface where the user could map
from the outerJoin key to both of the other table keys. This is because the
key of the outerJoin table could be either the key of the lhs table, or the
rhs tables, or something completely different.

With this and what you have mentioned above in mind i think we should park
outerJoin support for this KIP and re-visit if and when we need it in the
future.

I'll update the KIP with this.

Thanks,
Damian

On Fri, 9 Dec 2016 at 09:53 Michael Noll  wrote:

> Damian and I briefly chatted offline (thanks, Damian!), and here's the
> summary of my thoughts and conclusion.
>
> TL;DR: Let's skip outer join support for global tables.
>
> In more detail:
>
> - We agreed that, technically, we can add OUTER JOIN support.  However,
> outer joins only work if certain preconditions are met.  The preconditions
> are IMHO similar/the same as we have for the normal, partitioned KTables
> (e.g. having matching keys and co-partitioned data for the tables), but in
> the case of global tables the user would need to meet all these
> preconditions in one big swing when specifying the params for the outer
> join call.  Even so, you'd only know at run-time whether the preconditions
> were actually met properly.
>
> - Hence it's quite likely that users will be confused about these
> preconditions and how to meet them, and -- from what we can tell -- use
> cases / user demand for outer joins have been rare.
>
> - So, long story short, even though we could add outer join support we'd
> suggest to skip it for global tables.  If we subsequently learn that is a
> lot of user interest in that functionality, we still have the option to add
> it in the future.
>
>
> Best,
> Michael
>
>
>
>
>
>
> On Thu, Dec 8, 2016 at 6:31 PM, Damian Guy  wrote:
>
> > Hi Michael,
> >
> > I don't see how that helps?
> >
> > Lets say we have tables Person(id, device_id, name, ...), Device(id,
> > person_id, type, ...), and both are keyed with same type. And we have a
> > stream, that for the sake of simplicity, has both person_id and
> device_id (
> > i know this is a bit contrived!)
> > so our join
> > person = builder.globalTable(...);
> > device = builder.globalTable(...);
> > personDevice = builder.outerJoin(device, ...);
> >
> > someStream = builder.stream(..);
> > // which id do i use to join with? person.id? device.id?
> > someStream.leftJoin(personDevice, ...)
> >
> > // Interactive Query on the view generated by the join of person and
> device
> > personDeviceStore = streams.store("personDevice",...);
> > // person.id? device.id?
> > personDeviceStore.get(someId);
> >
> > We get records
> > person id=1, device_id=2 ,...
> > device id=2, person_id=1, ...
> > stream person_id = 1, device_id = 2
> >
> > We could do the join between the GlobalTables both ways as each side
> could
> > map to the other sides key, but when i'm accessing the resulting table,
> > personDevice, what is the key? The person.id ? the device.id? it can't
> be
> > both of them.
> >
> > Thanks,
> > Damian
> >
> >
> >
> >
> > On Thu, 8 Dec 2016 at 15:51 Michael Noll  wrote:
> >
> > > The key type returned by both KeyValueMappers (in the current trunk
> > > version, that type is named `R`) would need to be the same for this to
> > > work.
> > >
> > >
> > > On Wed, Dec 7, 2016 at 4:46 PM, Damian Guy 
> wrote:
> > >
> > > > Michael,
> > > >
> > > > We can only support outerJoin if both tables are keyed the same way.
> > Lets
> > > > say for example you can map both ways, however, the key for each
> table
> > is
> > > > of a different type. So t1 is long and t2 is string - what is the key
> > > type
> > > > of the resulting GlobalKTable? So when you subsequently join to this
> > > table,
> > > > and do a lookup on it, which key are you using?
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Wed, 7 Dec 2016 at 14:31 Michael Noll 
> wrote:
> > > >
> > > > > Damian,
> > > > >
> > > > > yes, that makes sense.
> > > > >
> > > > > But I am still wondering:  In your example, there's no prior
> > knowledge
> > > > "can
> > > > > I m

[jira] [Commented] (KAFKA-4463) Setup travis-ci integration for ducktape tests

2016-12-09 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735136#comment-15735136
 ] 

Ismael Juma commented on KAFKA-4463:


When this is done, we should update the following wiki page (search for 
KAFKA-4463): 
https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes

> Setup travis-ci integration for ducktape tests
> --
>
> Key: KAFKA-4463
> URL: https://issues.apache.org/jira/browse/KAFKA-4463
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Affects Versions: 0.10.0.1
>Reporter: Raghav Kumar Gautam
>Assignee: Raghav Kumar Gautam
> Fix For: 0.10.2.0
>
>






[jira] [Updated] (KAFKA-3715) Higher granularity streams metrics

2016-12-09 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska updated KAFKA-3715:

Priority: Major  (was: Minor)

> Higher granularity streams metrics 
> ---
>
> Key: KAFKA-3715
> URL: https://issues.apache.org/jira/browse/KAFKA-3715
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Jeff Klukas
>Assignee: aarti gupta
>  Labels: api
> Fix For: 0.10.2.0
>
>
> Originally proposed by [~guozhang] in 
> https://github.com/apache/kafka/pull/1362#issuecomment-218326690
> We can consider adding metrics for process / punctuate / commit rate at the 
> granularity of each processor node in addition to the global rate mentioned 
> above. This is very helpful in debugging.
> We can consider adding rate / total cumulated metrics for context.forward 
> indicating how many records were forwarded downstream from this processor 
> node as well. This is helpful in debugging.
> We can consider adding metrics for each stream partition's timestamp. This is 
> helpful in debugging.
> Besides the latency metrics, we can also add throughput latency in terms of 
> source records consumed.





Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional Messaging

2016-12-09 Thread Andrew Schofield
I've been pondering this question of coordinating other resource managers with
Kafka transactions for a while and I'm not convinced it's a good idea. My
reservations come down to the guarantees that it would provide in failure
scenarios.

I don't think KIP-98 gives proper ACID semantics in the presence of all
failures. For a transaction which contains a mixture of publishes and offset
updates, a bunch of topics are involved and it appears to me that an
uncontrolled shutdown could result in some but not all of the lazy writes
making it to disk.

Here are some of the failures that I'm worried about:

* A message is published to a topic which crashes the leader Kafka node; as
  it's replicated across the cluster, it crashes all of the other Kafka nodes
  (we've really had this: SEGV, our fault and we've fixed it, but it happened),
  so this is a kind of rolling node crash in a cluster
* Out of memory error in one or more Kafka nodes
* Disk fills in one or more Kafka nodes
* Uncontrolled power-off to all nodes in the cluster

Does KIP-98 guarantee atomicity for transactions in all of these cases?
Unless all of the topics involved in a transaction are recovered to the
same point in time, you can't consider a transaction to be properly atomic.
If application code is designed expecting atomicity, there are going to be
tears. Perhaps only when disaster strikes, but the risk is there.

I think KIP-98 is interesting, but I wouldn't equate what it provides
with the transactions provided by resource managers with traditional transaction
logging. It's not better or worse, just different. If you tried to migrate
from a previous transactional system to Kafka transactions, I think you'd
better have procedures for reconciliation with the other resource managers.
Better still, don't build applications that are so fragile. The principle
of dumb pipes and smart endpoints is good in my view.

If you're creating a global transaction containing two or more resource
managers and using two-phase commit, it's very important that all of the
resource managers maintain a coherent view of the sequence of events. If any
part fails due to a software or hardware failure, once the system is
recovered, nothing must be forgotten. If you look at how presumed-abort
works, you'll see how important this is.
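For readers unfamiliar with the optimization mentioned here, a minimal toy sketch (plain JDK, invented names; not Kafka code and not any real coordinator) of the presumed-abort idea: only commit decisions are force-logged, so after a crash any in-doubt transaction with no surviving commit record resolves to abort.

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration of presumed-abort (illustrative only): the coordinator
// force-logs only COMMIT decisions; on recovery, any transaction without
// a commit record is presumed aborted.
public class PresumedAbort {
    private final Set<Long> commitLog = new HashSet<>(); // stands in for a forced log write

    // Logged before the commit is acknowledged to participants.
    void commit(long txId) { commitLog.add(txId); }

    // Recovery: a participant asking about an in-doubt transaction gets
    // COMMITTED only if a commit record survived the crash.
    String resolve(long txId) {
        return commitLog.contains(txId) ? "COMMITTED" : "ABORTED";
    }

    public static void main(String[] args) {
        PresumedAbort coord = new PresumedAbort();
        coord.commit(1L);                      // tx 1: commit record was logged
        // tx 2 crashed before its commit record was written
        System.out.println(coord.resolve(1L)); // COMMITTED
        System.out.println(coord.resolve(2L)); // ABORTED
    }
}
```

The point of the example is the asymmetry: forgetting a commit record changes the outcome, which is exactly why every resource manager must keep a coherent, durable view of the decision sequence.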

Kafka doesn't really fit very nicely in this kind of environment because of
the way that it writes lazily to disk. The theory is that you must avoid at all
costs having an uncontrolled shutdown of an entire cluster because you'll lose
a little data at the end of the logs. So, if you are coordinating Kafka and a
relational database in a global transaction, it's theoretically possible that
a crashed Kafka would be a little forgetful while a crashed database would not.
The database would be an order of magnitude or more slower because of the way
its recovery logs are handled, but it would not be forgetful in the same way.

You get exactly the same kind of worries when you implement some kind of
asynchronous replication for disaster recovery, even if all of the resource
managers force all of their log writes to disk eagerly. The replica at the DR
site is slightly behind the primary site, so if you have to recover from an
outage and switch to the DR site, it can be considered to be slightly forgetful
about the last few moments before the outage. This is why a DR plan usually has
some concept of compensation or reconciliation to make good any forgotten work.

In summary, I think Kafka would have to change in ways which would negate many
of its good points in order to support XA transactions. It would be better to
design applications to be resilient to message duplication and loss, rather
than tightly coupling resource managers and ending up with something fragile.

Don't get me wrong. This is not an anti-Kafka rant. I just work with people
used to traditional transactional systems, making use of Kafka for business
applications, and it's important to understand the concepts on both sides
and make sure your assumptions are valid.

Andrew Schofield
IBM Watson and Cloud Platform


> From: Michael Pearce 
> Sent: 09 December 2016 06:19
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional 
> Messaging
>     
> Usecase in IG:
> 
> Fund transfer between accounts. When we debit one account and fund another we 
> must ensure the records to both occur as an ACID action, and as a single 
> transaction.
> 
> Today we achieve this because we have JMS, as such we can do the actions 
> needed in an XA transaction across both the accounts. To move this flow to 
> Kafka we would need support of XA transactions.
> 
> 
> 
> Sent using OWA for iPhone
> 
> From: Michael Pearce 
> Sent: Friday, December 9, 2016 6:09:06 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-98: Exactly Once Delivery and Transactional 
> Messaging
> 
> Hi Jay,
> 
> For me having an XA transaction 

[jira] [Commented] (KAFKA-4505) Cannot get topic lag since kafka upgrade from 0.8.1.0 to 0.10.1.0

2016-12-09 Thread Romaric Parmentier (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734892#comment-15734892
 ] 

Romaric Parmentier commented on KAFKA-4505:
---

I'm not sure; although we are using the new 0.10.0.0 driver, we are still using 
almost the same code as we were using with the 0.8.1.0 driver. So I think we 
are using the ZK consumer.
We have two ways to consume a topic: the legacy high-level consumer and a 
home-made consumer. I verified and it doesn't seem to be related to the 
consumer we are using: the lag cannot be retrieved for either type of consumer.

Is there a way to repair the situation (maybe by creating owners manually)? I 
really need to monitor the lag.

Thank you for your answers

> Cannot get topic lag since kafka upgrade from 0.8.1.0 to 0.10.1.0
> -
>
> Key: KAFKA-4505
> URL: https://issues.apache.org/jira/browse/KAFKA-4505
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, metrics, offset manager
>Affects Versions: 0.10.1.0
>Reporter: Romaric Parmentier
>Priority: Critical
>
> We were using kafka 0.8.1.1 and we just migrated to version 0.10.1.0.
> Since we migrated we are using the new script kafka-consumer-groups.sh to 
> retrieve topic lags, but it doesn't seem to work anymore. 
> Because the application is using the 0.8 driver we have added the following 
> conf to each kafka server:
> log.message.format.version=0.8.2
> inter.broker.protocol.version=0.10.0.0
> When I'm using the option --list with kafka-consumer-groups.sh I can see 
> every consumer group I'm using, but --describe is not working:
> /usr/share/kafka$ bin/kafka-consumer-groups.sh --zookeeper ip:2181 --describe 
> --group group_name
> No topic available for consumer group provided
> GROUP  TOPIC  PARTITION  
> CURRENT-OFFSET  LOG-END-OFFSET  LAG OWNER
> When I'm looking into zookeeper I can see the offset increasing for this 
> consumer group.
> Any idea?





Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-09 Thread Michael Noll
Damian and I briefly chatted offline (thanks, Damian!), and here's the
summary of my thoughts and conclusion.

TL;DR: Let's skip outer join support for global tables.

In more detail:

- We agreed that, technically, we can add OUTER JOIN support.  However,
outer joins only work if certain preconditions are met.  The preconditions
are IMHO similar/the same as we have for the normal, partitioned KTables
(e.g. having matching keys and co-partitioned data for the tables), but in
the case of global tables the user would need to meet all these
preconditions in one big swing when specifying the params for the outer
join call.  Even so, you'd only know at run-time whether the preconditions
were actually met properly.

- Hence it's quite likely that users will be confused about these
preconditions and how to meet them, and -- from what we can tell -- use
cases / user demand for outer joins have been rare.

- So, long story short, even though we could add outer join support we'd
suggest to skip it for global tables.  If we subsequently learn that there is a
lot of user interest in that functionality, we still have the option to add
it in the future.
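The directionality point can be illustrated without the Streams API at all. A plain-JDK sketch (all names hypothetical, a Map standing in for the global table) of a left join that only works through a user-supplied key mapper from the stream side to the table's key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-JDK illustration (NOT the Streams API) of why a join against a
// global table only goes one way: it needs a mapper from the stream side's
// value to the table's key type.
public class GlobalJoinSketch {
    // "Global table" keyed by String (like t2 in the thread's example).
    static Map<String, String> deviceTable = new HashMap<>();

    // Left join: look up the table using the key produced by the mapper;
    // join the stream value with the match, or with null (left join semantics).
    static String leftJoin(String streamValue, Function<String, String> keyMapper) {
        String tableValue = deviceTable.get(keyMapper.apply(streamValue));
        return streamValue + "->" + tableValue;
    }

    public static void main(String[] args) {
        deviceTable.put("d2", "phone");
        // The stream value carries the foreign key "d2"; the mapper extracts it.
        System.out.println(leftJoin("person1:d2", v -> v.split(":")[1]));
        // prints person1:d2->phone
    }
}
```

An outer join would additionally need the reverse mapper (table key back to stream key), which is exactly the extra precondition the thread decided not to burden users with.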


Best,
Michael






On Thu, Dec 8, 2016 at 6:31 PM, Damian Guy  wrote:

> Hi Michael,
>
> I don't see how that helps?
>
> Lets say we have tables Person(id, device_id, name, ...), Device(id,
> person_id, type, ...), and both are keyed with same type. And we have a
> stream, that for the sake of simplicity, has both person_id and device_id (
> i know this is a bit contrived!)
> so our join
> person = builder.globalTable(...);
> device = builder.globalTable(...);
> personDevice = builder.outerJoin(device, ...);
>
> someStream = builder.stream(..);
> // which id do i use to join with? person.id? device.id?
> someStream.leftJoin(personDevice, ...)
>
> // Interactive Query on the view generated by the join of person and device
> personDeviceStore = streams.store("personDevice",...);
> // person.id? device.id?
> personDeviceStore.get(someId);
>
> We get records
> person id=1, device_id=2 ,...
> device id=2, person_id=1, ...
> stream person_id = 1, device_id = 2
>
> We could do the join between the GlobalTables both ways as each side could
> map to the other sides key, but when i'm accessing the resulting table,
> personDevice, what is the key? The person.id ? the device.id? it can't be
> both of them.
>
> Thanks,
> Damian
>
>
>
>
> On Thu, 8 Dec 2016 at 15:51 Michael Noll  wrote:
>
> > The key type returned by both KeyValueMappers (in the current trunk
> > version, that type is named `R`) would need to be the same for this to
> > work.
> >
> >
> > On Wed, Dec 7, 2016 at 4:46 PM, Damian Guy  wrote:
> >
> > > Michael,
> > >
> > > We can only support outerJoin if both tables are keyed the same way.
> Lets
> > > say for example you can map both ways, however, the key for each table
> is
> > > of a different type. So t1 is long and t2 is string - what is the key
> > type
> > > of the resulting GlobalKTable? So when you subsequently join to this
> > table,
> > > and do a lookup on it, which key are you using?
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Wed, 7 Dec 2016 at 14:31 Michael Noll  wrote:
> > >
> > > > Damian,
> > > >
> > > > yes, that makes sense.
> > > >
> > > > But I am still wondering:  In your example, there's no prior
> knowledge
> > > "can
> > > > I map from t1->t2" that Streams can leverage for joining t1 and t2
> > other
> > > > than blindly relying on the user to provide an appropriate
> > KeyValueMapper
> > > > for K1/V1 of t1 -> K2/V2 of t2.  In other words, if we allow the user
> > to
> > > > provide a KeyValueMapper from t1->t2 (Streams does not know at
> compile
> > > time
> > > > whether this mapping will actually work), then we can also allow the
> > user
> > > > to provide a corresponding "reverse" mapper from t2->t1.  That is, we
> > > could
> > > > say that an outer join between two global tables IS supported, but if
> > and
> > > > only if the user provides two KeyValueMappers, one for t1->t2 and one
> > for
> > > > t2->t1.
> > > >
> > > > The left join t1->t2 (which is supported in the KIP), in general,
> works
> > > > only because of the existence of the user-provided KeyValueMapper
> from
> > > > t1->t2.  The outer join, as you point out, cannot be satisfied as easily
> > > > because Streams must know two mappers, t1->t2 plus t2->t1 --
> otherwise
> > > the
> > > > outer join won't work.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Dec 7, 2016 at 3:04 PM, Damian Guy 
> > wrote:
> > > >
> > > > > Hi Michael,
> > > > >
> > > > > Sure. Say we have 2 input topics t1 & t2 below:
> > > > > t1{
> > > > >  int key;
> > > > >  string t2_id;
> > > > >  ...
> > > > > }
> > > > >
> > > > > t2 {
> > > > >   string key;
> > > > >   ..
> > > > > }
> > > > > If we create global tables out of these we'd get:
> > > > > GlobalKTable t1;
> > > > > GlobalKTable t2;
> > > > >
> > > > > So the join can only go in 1 direction, i.e, from t1 -> t2 as in
> > order
> > 

[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.

2016-12-09 Thread Jan Omar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734770#comment-15734770
 ] 

Jan Omar commented on KAFKA-4477:
-

Nothing interesting there in our case. 

Looked at 30 minutes before and after the incident. Within that timeframe the 
worst numbers reported are (3 Node Cluster, the lowest reported value):

- NetworkProcessorAvgIdlePercent 0.96 
- RequestHandlerAvgIdlePercent 0.74

> Node reduces its ISR to itself, and doesn't recover. Other nodes do not take 
> leadership, cluster remains sick until node is restarted.
> --
>
> Key: KAFKA-4477
> URL: https://issues.apache.org/jira/browse/KAFKA-4477
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0
> Environment: RHEL7
> java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
>Reporter: Michael Andre Pearce (IG)
>Assignee: Apurva Mehta
>Priority: Critical
>  Labels: reliability
> Attachments: kafka.jstack
>
>
> We have encountered a critical issue that has re-occured in different 
> physical environments. We haven't worked out what is going on. We do though 
> have a nasty work around to keep service alive. 
> We have not had this issue on clusters still running 0.9.0.1.
> We have noticed a node randomly shrinking the ISRs down to itself for the 
> partitions it owns; moments later we see other nodes having disconnects, 
> followed finally by app issues, where producing to these partitions is 
> blocked.
> It seems that only restarting the Kafka Java process resolves the 
> issues.
> We have had this occur multiple times and from all network and machine 
> monitoring the machine never left the network, or had any other glitches.
> Below are seen logs from the issue.
> Node 7:
> [2016-12-01 07:01:28,112] INFO Partition 
> [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking 
> ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from 
> 1,2,7 to 7 (kafka.cluster.Partition)
> All other nodes:
> [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch 
> kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 7 was disconnected before the response was 
> read
> All clients:
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> After this occurs, we then suddenly see on the sick machine an increasing 
> amount of close_waits and file descriptors.
> As a work-around to keep service we are currently putting in an automated 
> process that tails the logs for the regex below, and where new_partitions is 
> just the node itself we restart the node. 
> "\[(?P<date>.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for 
> partition \[.*\] from (?P<old_partitions>.+) to (?P<new_partitions>.+) 
> \(kafka.cluster.Partition\)"
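A standalone sketch of the matching step such a watcher needs, using plain java.util.regex (the group names and helper are illustrative, not the reporter's actual script):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative watcher logic: match the broker's "Shrinking ISR" log line and
// report when the new ISR has collapsed to the broker itself, the condition
// the work-around uses to trigger a node restart.
public class IsrShrinkWatch {
    static final Pattern SHRINK = Pattern.compile(
        "\\[(?<date>.+)\\] INFO Partition \\[.*\\] on broker (?<broker>\\d+): "
        + "Shrinking ISR for partition \\[.*\\] from (?<oldIsr>[\\d,]+) "
        + "to (?<newIsr>[\\d,]+) \\(kafka\\.cluster\\.Partition\\)");

    // True when the new ISR is exactly the broker that logged the line.
    static boolean shrunkToSelf(String logLine) {
        Matcher m = SHRINK.matcher(logLine);
        return m.matches() && m.group("newIsr").equals(m.group("broker"));
    }

    public static void main(String[] args) {
        String line = "[2016-12-01 07:01:28,112] INFO Partition "
            + "[com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: "
            + "Shrinking ISR for partition "
            + "[com_ig_trade_v1_position_event--demo--compacted,10] from 1,2,7 to 7 "
            + "(kafka.cluster.Partition)";
        System.out.println(shrunkToSelf(line)); // true
    }
}
```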





[jira] [Commented] (KAFKA-4515) Async producer send not retrying on TimeoutException: Batch Expired

2016-12-09 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734702#comment-15734702
 ] 

huxi commented on KAFKA-4515:
-

Did you see lots of "Got error produce response with correlation id {} on 
topic-partition {}, retrying ({} attempts left). Error" items in the producer 
log when enabling retry?

> Async producer send not retrying on TimeoutException: Batch Expired
> ---
>
> Key: KAFKA-4515
> URL: https://issues.apache.org/jira/browse/KAFKA-4515
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.1
>Reporter: Di Shang
>
> We are testing out broker failure resiliency, we have a cluster of 3 brokers, 
> a topic with 5 partitions and 2 replicas. The replicas are evenly distributed 
> and there is at least a partition leader in every broker. We use this code to 
> continuously send msg and then kill one of the brokers to see if we lost any 
> msg. 
> {code:title=MyTest.java|borderStyle=solid}
> static volatile KafkaProducer<String, String> producer;
> public static void send(ProducerRecord<String, String> record) {
> producer.send(record, (metadata, exception) -> {
> if (exception != null) {
> // handle exception with manual retry
> System.out.println("Error, resending...");
> exception.printStackTrace();
> try {
> Thread.sleep(100);
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //send(record); // without this retry, msg would be lost
> } else if (metadata != null) {
> System.out.println("Sent " + record);
> } else {
> System.out.println("No exception and no metadata");
> }
> });
> }
> public static void main(String[] args) throws Exception {
> Properties props = new Properties();
> props.put("bootstrap.servers", "...");
> props.put("key.serializer", 
> "org.apache.kafka.common.serialization.StringSerializer");
> props.put("value.serializer", 
> "org.apache.kafka.common.serialization.StringSerializer");
> props.put("retries", "10");
> props.put("acks", "1");
> props.put("request.timeout.ms", "1000");
> producer = new KafkaProducer<>(props);
> Long i = 1L;
> while (true) {
> ProducerRecord<String, String> record =
> new ProducerRecord<>("my-topic", i.toString());
> send(record);
> Thread.sleep(100);
> i++;
> }
> }
> {code}
> What we found is that when we set *request.timeout.ms* to a small value like 
> 1000, then when we kill a broker we would get a few TimeoutException: Batch 
> Expired errors in the send() callback. And if we don't handle this by 
> explicit retry like in the above code, then we would lose those msg. 
> The documentation for *request.timeout.ms* says:
> bq. The configuration controls the maximum amount of time the client will 
> wait for the response of a request. If the response is not received before 
> the timeout elapses the client will resend the request if necessary or fail 
> the request if retries are exhausted.
> This makes me think that a TimeoutException should be implicitly retried 
> using the *retries* options, which doesn't seem to work. 
> Strangely we also noticed that if *request.timeout.ms* is set long enough 
> like the default 30000, then we don't lose any msg when killing a broker even 
> if we set *retries* to 0. 
> So it seems to me that the *retries* option is not working regarding to 
> broker down scenario. There seems to be some other internal mechanism for 
> handling broker failure and msg retry, and this mechanism won't work if there 
> is TimeoutException.
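The commented-out retry in the report's callback is unbounded. A self-contained sketch (a toy stand-in for the producer, not the Kafka client; names invented) of the same manual-retry pattern with an attempt limit:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Toy stand-in for an async producer (NOT the Kafka client): the first two
// sends complete with an exception, later ones succeed. It shows the manual
// callback retry from the report, but bounded instead of recursing forever.
public class RetrySketch {
    static final int MAX_ATTEMPTS = 3;
    static AtomicInteger remainingFailures = new AtomicInteger(2);
    static String lastResult; // what the callback finally observed

    static void send(String record, BiConsumer<String, Exception> callback) {
        if (remainingFailures.getAndDecrement() > 0) {
            callback.accept(null, new RuntimeException("Batch Expired"));
        } else {
            callback.accept("ack:" + record, null);
        }
    }

    static void sendWithRetry(String record, int attempt) {
        send(record, (metadata, exception) -> {
            if (exception != null && attempt < MAX_ATTEMPTS) {
                System.out.println("Error, resending... attempt " + attempt);
                sendWithRetry(record, attempt + 1); // bounded manual retry
            } else if (metadata != null) {
                lastResult = metadata;
                System.out.println("Sent " + metadata);
            } else {
                lastResult = "gave up";            // attempts exhausted
            }
        });
    }

    public static void main(String[] args) {
        sendWithRetry("msg-1", 1); // two retries, then "Sent ack:msg-1"
    }
}
```

Note this only illustrates the application-side work-around; whether the client's own `retries` setting should cover batch expiry is the question the ticket raises.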





[jira] [Commented] (KAFKA-4497) log cleaner breaks on timeindex

2016-12-09 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734657#comment-15734657
 ] 

huxi commented on KAFKA-4497:
-

A safe but naive solution is to reject this invalid index entry and catch 
InvalidOffsetException and IllegalStateException so as not to halt the 
cleaner thread. What do you think? [~becket_qin]
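The "don't halt the thread" half of this suggestion can be sketched generically (plain JDK, illustrative only; this is not the actual LogCleaner code): catch the per-log failure inside the loop so one bad index entry skips that log instead of killing the whole cleaner.

```java
import java.util.List;

// Illustrative resilient worker loop: a failure on one log is counted and
// skipped, and the loop goes on to clean the remaining logs.
public class ResilientCleanerLoop {
    static void cleanLog(String log) {
        if (log.contains("bad")) { // stands in for a corrupt time index entry
            throw new IllegalStateException("invalid time index entry in " + log);
        }
    }

    // Returns {cleaned, skipped}.
    static int[] run(List<String> logs) {
        int cleaned = 0, skipped = 0;
        for (String log : logs) {
            try {
                cleanLog(log);
                cleaned++;
            } catch (RuntimeException e) { // e.g. the exceptions named above
                skipped++;                 // record the error and move on
            }
        }
        return new int[] { cleaned, skipped };
    }

    public static void main(String[] args) {
        int[] r = run(List.of("topicA-0", "bad-topic-1", "topicB-0"));
        System.out.println(r[0] + " cleaned, " + r[1] + " skipped");
        // prints: 2 cleaned, 1 skipped
    }
}
```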

> log cleaner breaks on timeindex
> ---
>
> Key: KAFKA-4497
> URL: https://issues.apache.org/jira/browse/KAFKA-4497
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.10.1.0
> Environment: Debian Jessie, Oracle Java 8u92, kafka_2.11-0.10.1.0
>Reporter: Robert Schumann
>
> _created from ML entry by request of [~ijuma]_
> Hi all,
> we are facing an issue with latest kafka 0.10.1 and the log cleaner thread 
> with regards to the timeindex files. From the log of the log-cleaner we see 
> after startup that it tries to cleanup a topic called xdc_listing-status-v2 
> [1]. The topic is setup with log compaction [2] and the kafka cluster 
> configuration has log.cleaner enabled [3]. Looking at the log and the newly 
> created file [4], the cleaner seems to refer to tombstones prior to 
> epoch_time=0 - maybe because he finds messages, which don’t have a timestamp 
> at all (?). All producers and consumers are using 0.10.1 and the topics have 
> been created completely new, so I’m not sure, where this issue would come 
> from. The original timeindex file [5] seems to show only valid timestamps for 
> the mentioned offsets. I would also like to mention that the issue happened 
> in two independent datacenters at the same time, so I would rather expect an 
> application/producer issue instead of random disk failures. We didn’t have 
> the problem with 0.10.0 for around half a year, it appeared shortly after the 
> upgrade to 0.10.1.
> The puzzling message from the cleaner “cleaning prior to Fri Dec 02 16:35:50 
> CET 2016, discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970” also 
> confuses me a bit. Does that mean, it does not find any log segments which 
> can be cleaned up or the last timestamp of the last log segment is somehow 
> broken/missing?
> I’m also a bit wondering, why the log cleaner thread stops completely after 
> an error with one topic. I would at least expect that it keeps on cleaning up 
> other topics, but apparently it doesn’t do that, e.g. it’s not even cleaning 
> the __consumer_offsets anymore.
> Does anybody have the same issues or can explain, what’s going on? Thanks for 
> any help or suggestions.
> Cheers
> Robert
> [1]
> {noformat}
> [2016-12-06 12:49:17,885] INFO Starting the log cleaner (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,895] INFO [kafka-log-cleaner-thread-0], Starting  
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,947] INFO Cleaner 0: Beginning cleaning of log 
> xdc_listing-status-v2-1. (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,948] INFO Cleaner 0: Building offset map for 
> xdc_listing-status-v2-1... (kafka.log.LogCleaner)
> [2016-12-06 12:49:17,989] INFO Cleaner 0: Building offset map for log 
> xdc_listing-status-v2-1 for 1 segments in offset range [0, 194991). 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,572] INFO Cleaner 0: Offset map for log 
> xdc_listing-status-v2-1 complete. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,577] INFO Cleaner 0: Cleaning log 
> xdc_listing-status-v2-1 (cleaning prior to Fri Dec 02 16:35:50 CET 2016, 
> discarding tombstones prior to Thu Jan 01 01:00:00 CET 1970)... 
> (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,580] INFO Cleaner 0: Cleaning segment 0 in log 
> xdc_listing-status-v2-1 (largest timestamp Fri Dec 02 16:35:50 CET 2016) into 
> 0, retaining deletes. (kafka.log.LogCleaner)
> [2016-12-06 12:49:24,968] ERROR [kafka-log-cleaner-thread-0], Error due to  
> (kafka.log.LogCleaner)
> kafka.common.InvalidOffsetException: Attempt to append an offset (-1) to slot 
> 9 no larger than the last offset appended (11832) to 
> /var/lib/kafka/xdc_listing-status-v2-1/.timeindex.cleaned.
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply$mcV$sp(TimeIndex.scala:117)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at 
> kafka.log.TimeIndex$$anonfun$maybeAppend$1.apply(TimeIndex.scala:107)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
> at kafka.log.TimeIndex.maybeAppend(TimeIndex.scala:107)
> at kafka.log.LogSegment.append(LogSegment.scala:106)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:518)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:404)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:400)
> at scala.collection.immutable.List.foreach(List.scala:381)