Re: [ANNOUNCE] New committer: Rajini Sivaram

2017-04-24 Thread Apurva Mehta
Congratulations Rajini!

On Mon, Apr 24, 2017 at 2:06 PM, Gwen Shapira  wrote:

> The PMC for Apache Kafka has invited Rajini Sivaram as a committer and we
> are pleased to announce that she has accepted!
>
> Rajini contributed 83 patches, 8 KIPs (all security and quota
> improvements) and a significant number of reviews. She is also on the
> conference committee for Kafka Summit, where she helped select content
> for our community event. Through her contributions she's shown good
> judgement, good coding skills, willingness to work with the community on
> finding the best
> solutions and very consistent follow through on her work.
>
> Thank you for your contributions, Rajini! Looking forward to many more :)
>
> Gwen, for the Apache Kafka PMC
>


Re: NPE on startup with a low-level API based application

2017-06-15 Thread Apurva Mehta
Hi Frank,

What is the value of `batch.size` in your producer? What is the size of
the key and value you are trying to write?

Thanks,
Apurva

On Thu, Jun 15, 2017 at 2:28 AM, Frank Lyaruu  wrote:

> Hey people, I see an error I haven't seen before. It is on a lowlevel-API
> based streams application. I've started it once, then it ran fine, then did
> a graceful shutdown and since then I always see this error on startup.
>
> I'm using yesterday's trunk.
>
> It seems that the MemoryRecordsBuilder overflows somehow, is there
> something I need to configure?
>
> java.lang.NullPointerException
>
> at org.apache.kafka.common.utils.Utils.notNull(Utils.java:243)
> at
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(
> RecordAccumulator.java:219)
> at
> org.apache.kafka.clients.producer.KafkaProducer.doSend(
> KafkaProducer.java:650)
> at
> org.apache.kafka.clients.producer.KafkaProducer.send(
> KafkaProducer.java:604)
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(
> RecordCollectorImpl.java:97)
> at
> org.apache.kafka.streams.state.internals.StoreChangeLogger.logChange(
> StoreChangeLogger.java:59)
> at
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueBytesStor
> e.put(ChangeLoggingKeyValueBytesStore.java:58)
> at
> org.apache.kafka.streams.state.internals.ChangeLoggingKeyValueStore.put(
> ChangeLoggingKeyValueStore.java:73)
> at
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore$2.run(
> MeteredKeyValueStore.java:66)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
> measureLatencyNs(StreamsMetricsImpl.java:187)
> at
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.put(
> MeteredKeyValueStore.java:149)
> at
> com.dexels.kafka.streams.remotejoin.StoreProcessor.
> process(StoreProcessor.java:47)
> at
> com.dexels.kafka.streams.remotejoin.StoreProcessor.
> process(StoreProcessor.java:1)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(
> ProcessorNode.java:47)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
> measureLatencyNs(StreamsMetricsImpl.java:187)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> ProcessorNode.java:133)
> at
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(
> ProcessorContextImpl.java:82)
> at
> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedProcessor.
> emitMessage(OneToManyGroupedProcessor.java:95)
> at
> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedProcessor.
> process(OneToManyGroupedProcessor.java:80)
> at
> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedProcessor.
> process(OneToManyGroupedProcessor.java:1)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(
> ProcessorNode.java:47)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
> measureLatencyNs(StreamsMetricsImpl.java:187)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> ProcessorNode.java:133)
> at
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(
> ProcessorContextImpl.java:82)
> at
> com.dexels.kafka.streams.remotejoin.StoreProcessor.
> process(StoreProcessor.java:48)
> at
> com.dexels.kafka.streams.remotejoin.StoreProcessor.
> process(StoreProcessor.java:1)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(
> ProcessorNode.java:47)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
> measureLatencyNs(StreamsMetricsImpl.java:187)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> ProcessorNode.java:133)
> at
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(
> ProcessorContextImpl.java:82)
> at
> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedProcessor.
> emitMessage(OneToManyGroupedProcessor.java:95)
> at
> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedProcessor.
> process(OneToManyGroupedProcessor.java:80)
> at
> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedProcessor.
> process(OneToManyGroupedProcessor.java:1)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(
> ProcessorNode.java:47)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
> measureLatencyNs(StreamsMetricsImpl.java:187)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> ProcessorNode.java:133)
> at
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(
> ProcessorContextImpl.java:82)
> at
> com.dexels.kafka.streams.remotejoin.StoreProcessor.
> process(StoreProcessor.java:48)
> at
> com.dexels.kafka.streams.remotejoin.StoreProcessor.
> process(StoreProcessor.java:1)
> at
> org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(
> ProcessorNode.java:47)
> at
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.
> measureLatencyNs(StreamsMetricsImpl.java:187)
> at
> org

Re: NPE on startup with a low-level API based application

2017-06-15 Thread Apurva Mehta
Frank: it would be even better if you could share the key and value which
was causing this problem. Maybe share it on the JIRA:
https://issues.apache.org/jira/browse/KAFKA-5456 ?

Thanks,
Apurva

On Thu, Jun 15, 2017 at 4:07 PM, Apurva Mehta  wrote:

> Hi Frank,
>
> What is the value of `batch.size` in your producer? What is the size of
> the key and value you are trying to write?
>
> Thanks,
> Apurva
>
> On Thu, Jun 15, 2017 at 2:28 AM, Frank Lyaruu  wrote:
>
>> Hey people, I see an error I haven't seen before. It is on a lowlevel-API
>> based streams application. I've started it once, then it ran fine, then
>> did
>> a graceful shutdown and since then I always see this error on startup.
>>
>> I'm using yesterday's trunk.
>>
>> It seems that the MemoryRecordsBuilder overflows somehow, is there
>> something I need to configure?
>>
>> java.lang.NullPointerException
>>
>> at org.apache.kafka.common.utils.Utils.notNull(Utils.java:243)
>> at
>> org.apache.kafka.clients.producer.internals.RecordAccumulato
>> r.append(RecordAccumulator.java:219)
>> at
>> org.apache.kafka.clients.producer.KafkaProducer.doSend(Kafka
>> Producer.java:650)
>> at
>> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaPr
>> oducer.java:604)
>> at
>> org.apache.kafka.streams.processor.internals.RecordCollector
>> Impl.send(RecordCollectorImpl.java:97)
>> at
>> org.apache.kafka.streams.state.internals.StoreChangeLogger.
>> logChange(StoreChangeLogger.java:59)
>> at
>> org.apache.kafka.streams.state.internals.ChangeLoggingKeyVal
>> ueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:58)
>> at
>> org.apache.kafka.streams.state.internals.ChangeLoggingKeyVal
>> ueStore.put(ChangeLoggingKeyValueStore.java:73)
>> at
>> org.apache.kafka.streams.state.internals.MeteredKeyValueStor
>> e$2.run(MeteredKeyValueStore.java:66)
>> at
>> org.apache.kafka.streams.processor.internals.StreamsMetricsI
>> mpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>> at
>> org.apache.kafka.streams.state.internals.MeteredKeyValueStor
>> e.put(MeteredKeyValueStore.java:149)
>> at
>> com.dexels.kafka.streams.remotejoin.StoreProcessor.process(
>> StoreProcessor.java:47)
>> at
>> com.dexels.kafka.streams.remotejoin.StoreProcessor.process(
>> StoreProcessor.java:1)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorNode$
>> 1.run(ProcessorNode.java:47)
>> at
>> org.apache.kafka.streams.processor.internals.StreamsMetricsI
>> mpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorNode.
>> process(ProcessorNode.java:133)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorContex
>> tImpl.forward(ProcessorContextImpl.java:82)
>> at
>> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedP
>> rocessor.emitMessage(OneToManyGroupedProcessor.java:95)
>> at
>> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedP
>> rocessor.process(OneToManyGroupedProcessor.java:80)
>> at
>> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedP
>> rocessor.process(OneToManyGroupedProcessor.java:1)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorNode$
>> 1.run(ProcessorNode.java:47)
>> at
>> org.apache.kafka.streams.processor.internals.StreamsMetricsI
>> mpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorNode.
>> process(ProcessorNode.java:133)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorContex
>> tImpl.forward(ProcessorContextImpl.java:82)
>> at
>> com.dexels.kafka.streams.remotejoin.StoreProcessor.process(
>> StoreProcessor.java:48)
>> at
>> com.dexels.kafka.streams.remotejoin.StoreProcessor.process(
>> StoreProcessor.java:1)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorNode$
>> 1.run(ProcessorNode.java:47)
>> at
>> org.apache.kafka.streams.processor.internals.StreamsMetricsI
>> mpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorNode.
>> process(ProcessorNode.java:133)
>> at
>> org.apache.kafka.streams.processor.internals.ProcessorContex
>> tImpl.forward(ProcessorContextImpl.java:82)
>> at
>> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedP
>> rocessor.emitMessage(OneToManyGroupedProcessor.java:95)
>> at
>> com.dexels

Re: NPE on startup with a low-level API based application

2017-06-15 Thread Apurva Mehta
Finally, was compression enabled when you hit this exception? If so, which
compression algorithm was enabled?

On Thu, Jun 15, 2017 at 5:04 PM, Apurva Mehta  wrote:

> Frank: it would be even better if you could share the key and value which
> was causing this problem. Maybe share it on the JIRA:
> https://issues.apache.org/jira/browse/KAFKA-5456 ?
>
> Thanks,
> Apurva
>
> On Thu, Jun 15, 2017 at 4:07 PM, Apurva Mehta  wrote:
>
>> Hi Frank,
>>
>> What is the value of `batch.size` in your producer? What is the size
>> of the key and value you are trying to write?
>>
>> Thanks,
>> Apurva
>>
>> On Thu, Jun 15, 2017 at 2:28 AM, Frank Lyaruu  wrote:
>>
>>> Hey people, I see an error I haven't seen before. It is on a lowlevel-API
>>> based streams application. I've started it once, then it ran fine, then
>>> did
>>> a graceful shutdown and since then I always see this error on startup.
>>>
>>> I'm using yesterday's trunk.
>>>
>>> It seems that the MemoryRecordsBuilder overflows somehow, is there
>>> something I need to configure?
>>>
>>> java.lang.NullPointerException
>>>
>>> at org.apache.kafka.common.utils.Utils.notNull(Utils.java:243)
>>> at
>>> org.apache.kafka.clients.producer.internals.RecordAccumulato
>>> r.append(RecordAccumulator.java:219)
>>> at
>>> org.apache.kafka.clients.producer.KafkaProducer.doSend(Kafka
>>> Producer.java:650)
>>> at
>>> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaPr
>>> oducer.java:604)
>>> at
>>> org.apache.kafka.streams.processor.internals.RecordCollector
>>> Impl.send(RecordCollectorImpl.java:97)
>>> at
>>> org.apache.kafka.streams.state.internals.StoreChangeLogger.l
>>> ogChange(StoreChangeLogger.java:59)
>>> at
>>> org.apache.kafka.streams.state.internals.ChangeLoggingKeyVal
>>> ueBytesStore.put(ChangeLoggingKeyValueBytesStore.java:58)
>>> at
>>> org.apache.kafka.streams.state.internals.ChangeLoggingKeyVal
>>> ueStore.put(ChangeLoggingKeyValueStore.java:73)
>>> at
>>> org.apache.kafka.streams.state.internals.MeteredKeyValueStor
>>> e$2.run(MeteredKeyValueStore.java:66)
>>> at
>>> org.apache.kafka.streams.processor.internals.StreamsMetricsI
>>> mpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>>> at
>>> org.apache.kafka.streams.state.internals.MeteredKeyValueStor
>>> e.put(MeteredKeyValueStore.java:149)
>>> at
>>> com.dexels.kafka.streams.remotejoin.StoreProcessor.process(S
>>> toreProcessor.java:47)
>>> at
>>> com.dexels.kafka.streams.remotejoin.StoreProcessor.process(S
>>> toreProcessor.java:1)
>>> at
>>> org.apache.kafka.streams.processor.internals.ProcessorNode$1
>>> .run(ProcessorNode.java:47)
>>> at
>>> org.apache.kafka.streams.processor.internals.StreamsMetricsI
>>> mpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>>> at
>>> org.apache.kafka.streams.processor.internals.ProcessorNode.p
>>> rocess(ProcessorNode.java:133)
>>> at
>>> org.apache.kafka.streams.processor.internals.ProcessorContex
>>> tImpl.forward(ProcessorContextImpl.java:82)
>>> at
>>> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedP
>>> rocessor.emitMessage(OneToManyGroupedProcessor.java:95)
>>> at
>>> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedP
>>> rocessor.process(OneToManyGroupedProcessor.java:80)
>>> at
>>> com.dexels.kafka.streams.remotejoin.ranged.OneToManyGroupedP
>>> rocessor.process(OneToManyGroupedProcessor.java:1)
>>> at
>>> org.apache.kafka.streams.processor.internals.ProcessorNode$1
>>> .run(ProcessorNode.java:47)
>>> at
>>> org.apache.kafka.streams.processor.internals.StreamsMetricsI
>>> mpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>>> at
>>> org.apache.kafka.streams.processor.internals.ProcessorNode.p
>>> rocess(ProcessorNode.java:133)
>>> at
>>> org.apache.kafka.streams.processor.internals.ProcessorContex
>>> tImpl.forward(ProcessorContextImpl.java:82)
>>> at
>>> com.dexels.kafka.streams.remotejoin.StoreProcessor.process(S
>>> toreProcessor.java:48)
>>> at
>>> com.dexels.kafka.streams.remotejoin.StoreProcessor.process(S
>>> toreProcessor.java:1)
>>> at
>>> org.apache.kafka.streams.p

Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Apurva Mehta
Hi Tom,

I actually made modifications to the producer performance tool to do real
transactions earlier this week as part of our benchmarking (results
published here: bit.ly/kafka-eos-perf). I just submitted that patch here:
https://github.com/apache/kafka/pull/3400/files

I think my version is more complete since it runs the full gamut of the new
APIs: initTransactions, beginTransaction, and commitTransaction. It is also
the version used for our published benchmarks.

I am not sure that this tool is a blocker for the release, though, since it
doesn't really affect the usability of the feature in any way.

Thanks,
Apurva

On Wed, Jun 21, 2017 at 11:12 AM, Tom Crayford  wrote:

> Hi there,
>
> I'm -1 (non-binding) on shipping this RC.
>
> Heroku has carried on performance testing with 0.11 RC1. We have updated
> our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
> transactional features enabled, we get slightly better performance than
> 0.10.2.1 with 0.10.2.1 client libraries. Without any of the
>
> However, we attempted to run a performance test today with transactions,
> idempotence and consumer read_committed enabled, but couldn't, because
> enabling transactions requires the producer to call `initTransactions`
> before starting to send messages, and the producer performance tool doesn't
> allow for that.
>
> I'm -1 (non-binding) on shipping this RC in this state, because users
> expect to be able to use the inbuilt performance testing tools, and
> preventing them from testing the impact of the new features using the
> inbuilt tools isn't great. I made a PR for this:
> https://github.com/apache/kafka/pull/3398 (the change is very small).
> Happy
> to make a jira as well, if that makes sense.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Hi Ismael,
> >
> > Thanks for running the release.
> >
> > Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> > these checkstyle errors:
> >
> > :clients:checkstyleMain
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> > [ApiExceptionBuilder, BrokerNotAvailableException,
> > ClusterAuthorizationException, ConcurrentTransactionsException,
> > ControllerMovedException, CoordinatorLoadInProgressException,
> > CoordinatorNotAvailableException, CorruptRecordException,
> > DuplicateSequenceNumberException, GroupAuthorizationException,
> > IllegalGenerationException, IllegalSaslStateException,
> > InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> > InvalidConfigurationException, InvalidFetchSizeException,
> > InvalidGroupIdException, InvalidPartitionsException,
> > InvalidPidMappingException, InvalidReplicaAssignmentException,
> > InvalidReplicationFactorException, InvalidRequestException,
> > InvalidRequiredAcksException, InvalidSessionTimeoutException,
> > InvalidTimestampException, InvalidTopicException,
> > InvalidTxnStateException, InvalidTxnTimeoutException,
> > LeaderNotAvailableException, NetworkException, NotControllerException,
> > NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> > NotEnoughReplicasException, NotLeaderForPartitionException,
> > OffsetMetadataTooLarge, OffsetOutOfRangeException,
> > OperationNotAttemptedException, OutOfOrderSequenceException,
> > PolicyViolationException, ProducerFencedException,
> > RebalanceInProgressException, RecordBatchTooLargeException,
> > RecordTooLargeException, ReplicaNotAvailableException,
> > SecurityDisabledException, TimeoutException, TopicAuthorizationException,
> > TopicExistsException, TransactionCoordinatorFencedException,
> > TransactionalIdAuthorizationException, UnknownMemberIdException,
> > UnknownServerException, UnknownTopicOrPartitionException,
> > UnsupportedForMessageFormatException, UnsupportedSaslMechanismException,
> > UnsupportedVersionException]. [ClassDataAbstractionCoupling]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > Class Fan-Out Complexity is 60 (max allowed is 40).
> > [ClassFanOutComplexity]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\requests\AbstractRequest.java:26:1:
> > Class Fan-Out Complexity is 43 (max allowed is 40).
> > [ClassFanOutComplexity]
> > [ant:checkstyle] [ERROR]
> > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > java\org\apache\kafka\common\requests\AbstractResponse.java:26:1:
> > Class Fan-Out Complexity is 42 (max allowed is 40).
> > [ClassFanOutComplexity]
> > :clients:checkstyleMain FAILED
> >
> > I wonder if there is an issue with my VM since I don't get similar errors
> > on Ubuntu or Mac.
> >
> > --Vahid
> >
> >
> >
> >
> > From:   Ismael Juma 
> > To:

Re: Consumer throughput drop

2017-07-20 Thread Apurva Mehta
Hi Ovidu,

The see-saw behavior is inevitable on Linux when you have concurrent
reads and writes. However, tuning the following two settings may help
achieve more stable performance (from Jay's link):


> *dirty_ratio*: Defines a percentage value. Writeout of dirty data begins
> (via *pdflush*) when dirty data comprises this percentage of total system
> memory. The default value is 20. Red Hat recommends a slightly lower value
> of 15 for database workloads.
>
> *dirty_background_ratio*: Defines a percentage value. Writeout of dirty
> data begins in the background (via *pdflush*) when dirty data comprises
> this percentage of total memory. The default value is 10. For database
> workloads, Red Hat recommends a lower value of 3.

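As a rough sketch (assuming a reasonably recent Linux where the vm.* sysctls
are available; the exact values are workload dependent):

  # check the current settings
  sysctl vm.dirty_ratio vm.dirty_background_ratio

  # start background writeback earlier and in smaller chunks
  sudo sysctl -w vm.dirty_background_ratio=3
  sudo sysctl -w vm.dirty_ratio=15

  # add the same keys to /etc/sysctl.conf to make the change persistent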

Thanks,
Apurva


On Thu, Jul 20, 2017 at 12:25 PM, Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr> wrote:

> Yes, I’m using Debian Jessie 2.6 installed on this hardware [1].
>
> It is also my understanding that Kafka is based on system’s cache (Linux
> in this case) which is based on Clock-Pro for page replacement policy,
> doing complex things for general workloads. I will check the tuning
> parameters, but I was hoping for some advices to avoid disk at all when
> reading, considering the system's cache is used completely by Kafka and is
> huge ~128GB - that is to tune Clock-Pro to be smarter when used for
> streaming access patterns.
>
> Thanks,
> Ovidiu
>
> [1] https://www.grid5000.fr/mediawiki/index.php/Rennes:Hardware#Dell_Poweredge_R630_.28paravance.29
>
> > On 20 Jul 2017, at 21:06, Jay Kreps  wrote:
> >
> > I suspect this is on Linux right?
> >
> > The way Linux works is it uses a percent of memory to buffer new writes,
> at a certain point it thinks it has too much buffered data and it gives
> high priority to writing that out. The good news about this is that the
> writes are very linear, well laid out, and high-throughput. The problem is
> that it leads to a bit of see-saw behavior.
> >
> > Now obviously the drop in performance isn't wrong. When your disk is
> writing data out it is doing work and obviously the read throughput will be
> higher when you are just reading and not writing then when you are doing
> both reading and writing simultaneously. So obviously you can't get the
> no-writing performance when you are also writing (unless you add I/O
> capacity).
> >
> > But still these big see-saws in performance are not ideal. You'd rather
> have more constant performance all the time rather than have linux bounce
> back and forth from writing nothing and then frantically writing full bore.
> Fortunately linux provides a set of pagecache tuning parameters that let
> you control this a bit.
> >
> > I think these docs cover some of the parameters:
> > https://access.redhat.com/documentation/en-US/Red_Hat_
> Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-tunables.html <
> https://access.redhat.com/documentation/en-US/Red_Hat_
> Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-tunables.html>
> >
> > -Jay
> >
> > On Thu, Jul 20, 2017 at 10:24 AM, Ovidiu-Cristian MARCU <
> ovidiu-cristian.ma...@inria.fr >
> wrote:
> > Hi guys,
> >
> > I’m relatively new to Kafka’s world. I have an issue I describe below,
> maybe you can help me understand this behaviour.
> >
> > I’m running a benchmark using the following setup: one producer sends
> data to a topic and concurrently one consumer pulls and writes it to
> another topic.
> > Measuring the consumer throughput, I observe values around 500K
> records/s only until the system’s cache gets filled - from this moment the
> consumer throughput drops to ~200K (2.5 times lower).
> > Looking at disk usage, I observe disk read I/O which corresponds to the
> moment the consumer throughput drops.
> > After some time, I kill the producer and immediately I observe the
> consumer throughput goes up to initial values ~ 500K records/s.
> >
> > What can I do to avoid this throughput drop?
> >
> > Attached an image showing disk I/O and CPU usage. I have about 128GB RAM
> on that server which gets filled at time ~2300.
> >
> > Thanks,
> > Ovidiu
> >
> > 
> >
>
>


Re: increased response time for OffsetCommit requests

2017-07-31 Thread Apurva Mehta
How much is the increase? Is there any increase in throughput?

On Mon, Jul 31, 2017 at 8:04 AM, Gaurav Abbi  wrote:

> Hi All,
> We recently upgraded to Kafka 0.11.0.0 from 0.10.1.1.
> Since then we have been observing increased latencies especially
> OffsetCommit requests.
> Looking at the server side metrics, it seems the culprit is the Follower
> time.
>
> We are using following
> inter.broker.protocol.version: 0.11.0.0
> log.message.format.version: 0.9.0.1
>
> Are there some possible pointers that we can explore to troubleshoot the
> root cause?
>
> Best Regards,
> Gaurav Abbi
>


Re: increased response time for OffsetCommit requests

2017-07-31 Thread Apurva Mehta
Thanks for your response. Is it 200% only for the OffsetCommitRequest, or
is it similar for all the requests?


On Mon, Jul 31, 2017 at 12:48 PM, Gaurav Abbi  wrote:

> Hi Apurva,
> 1. The increase is about 200%.
> 2. There is no increase in throughput. However, this has caused an increase
> in error rate and a decrease in the responses received per second.
>
>
> One more thing to mention, we also upgraded to 0.11.0.0 client libraries.
> We are currently using old Producer and consumer APIs.
>
>
>
> Best Regards,
> Gaurav Abbi
>
> On Mon, Jul 31, 2017 at 7:46 PM, Apurva Mehta  wrote:
>
> > How much is the increase? Is there any increase in throughput?
> >
> > On Mon, Jul 31, 2017 at 8:04 AM, Gaurav Abbi 
> > wrote:
> >
> > > Hi All,
> > > We recently upgraded to Kafka 0.11.0.0 from 0.10.1.1.
> > > Since then we have been observing increased latencies especially
> > > OffsetCommit requests.
> > > Looking at the server side metrics, it seems the culprit is the
> Follower
> > > time.
> > >
> > > We are using following
> > > inter.broker.protocol.version: 0.11.0.0
> > > log.message.format.version: 0.9.0.1
> > >
> > > Are there some possible pointers that we can explore to troubleshoot
> the
> > > root cause?
> > >
> > > Best Regards,
> > > Gaurav Abbi
> > >
> >
>


Re: increased response time for OffsetCommit requests

2017-08-01 Thread Apurva Mehta
Sorry to keep prodding you with questions, but can you quantify the
increase for the ProduceRequest? What is the workload you are testing
against: specifically the batch size, message size, and linger time settings
of the producers in question?

I ask because we benchmarked 0.11.0 against the older 0.10.0 message format
and found no difference in performance between 0.10.2 on the 0.10 message
format and 0.11.0 on the 0.10 message format. Could you create a topic with
the 0.10.0 message format and see if there is any degradation for the same
workload?

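Roughly something like this should do it (a sketch only; the topic name,
partition count, and ZooKeeper address below are placeholders for your setup):

  bin/kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic produce-test-010 --partitions 8 --replication-factor 3 \
    --config message.format.version=0.10.0
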
Thanks,
Apurva


On Tue, Aug 1, 2017 at 2:51 AM, Gaurav Abbi  wrote:

> Hi Apurva,
> There are increases in the *Produce* request also. It is not as substantial
> as compared to *OffsetCommit. *For both of these requests, the major
> contributor is Remote time.
> A couple of other metrics that show different behavior post upgrade:
>
>1. *LogStartOffset*: It has drastically decreased.
>2. *NumDelayedOperations: *It has dropped.
>
> These could be related, or maybe these are intended good changes in Kafka
> 0.11.0.0 or one of the previous versions.
>
> Best Regards,
> Gaurav Abbi
>
> On Tue, Aug 1, 2017 at 12:11 AM, Apurva Mehta  wrote:
>
> > Thanks for your response. Is it 200% only for the OffsetCommitRequest, or
> > is it similar for all the requests?
> >
> >
> > On Mon, Jul 31, 2017 at 12:48 PM, Gaurav Abbi 
> > wrote:
> >
> > > Hi Apurva,
> > > 1. The increase is about 200%.
> > > 2. There is no increase in throughput. However, this has caused an
> > > increase in error rate and a decrease in the responses received per second.
> > >
> > >
> > > One more thing to mention, we also upgraded to 0.11.0.0 client
> libraries.
> > > We are currently using old Producer and consumer APIs.
> > >
> > >
> > >
> > > Best Regards,
> > > Gaurav Abbi
> > >
> > > On Mon, Jul 31, 2017 at 7:46 PM, Apurva Mehta 
> > wrote:
> > >
> > > > How much is the increase? Is there any increase in throughput?
> > > >
> > > > On Mon, Jul 31, 2017 at 8:04 AM, Gaurav Abbi 
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > > We recently upgraded to Kafka 0.11.0.0 from 0.10.1.1.
> > > > > Since then we have been observing increased latencies especially
> > > > > OffsetCommit requests.
> > > > > Looking at the server side metrics, it seems the culprit is the
> > > Follower
> > > > > time.
> > > > >
> > > > > We are using following
> > > > > inter.broker.protocol.version: 0.11.0.0
> > > > > log.message.format.version: 0.9.0.1
> > > > >
> > > > > Are there some possible pointers that we can explore to
> troubleshoot
> > > the
> > > > > root cause?
> > > > >
> > > > > Best Regards,
> > > > > Gaurav Abbi
> > > > >
> > > >
> > >
> >
>


Re: Kafka 0.11.0 problem with transactions.

2017-08-03 Thread Apurva Mehta
Ismael raises good questions about what transactions would mean for the
console producer.

However, the kafka-producer-perf-test script does support transactions. It
lets you generate transactions of a given duration (like 50ms or 100ms): it
produces messages of the specified size and commits them transactionally at
that interval, which at least lets you have a look at the transaction log,
etc.

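Roughly, an invocation like the following exercises it (a sketch; the topic
and transactional id are placeholders, and the transaction flags may differ
slightly depending on which build of the tool you have):

  bin/kafka-producer-perf-test.sh --topic txn-perf-test \
    --num-records 1000000 --record-size 100 --throughput -1 \
    --transactional-id perf-txn-1 --transaction-duration-ms 100 \
    --producer-props bootstrap.servers=localhost:9092 acks=all
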
Thanks,
Apurva

On Thu, Aug 3, 2017 at 6:22 AM, Ismael Juma  wrote:

> Hi Marcin,
>
> The console producer hasn't been updated to invoke the appropriate methods
> if transactions are enabled. It also requires a bit of thinking on how it
> should work. Would there be a way to start and commit the transaction via
> the console or would the console producer do it periodically? What was your
> intent?
>
> Ismael
>
> On Thu, Aug 3, 2017 at 9:50 AM, Bienek, Marcin 
> wrote:
>
> > Hi,
> >
> > I’m trying to test the new exactly once transaction feature.  Doing
> simple
> > test like:
> >
> > /opt/kafka/bin/kafka-console-producer.sh --request-required-acks "all"
> > --producer-property "transactional.id=777"
> > --producer-property="enable.idempotence=true"
> > --broker-list broker1:9092 --topic bla
> >
> > Fails with:
> >
> > java.lang.IllegalStateException: Cannot perform a 'send' before
> > completing a call to initTransactions when transactions are enabled.
> > at org.apache.kafka.clients.producer.internals.
> TransactionManager.
> > failIfNotReadyForSend(TransactionManager.java:253)
> > at org.apache.kafka.clients.producer.internals.
> TransactionManager.
> > maybeAddPartitionToTransaction(TransactionManager.java:233)
> > at org.apache.kafka.clients.producer.KafkaProducer.doSend(
> > KafkaProducer.java:745)
> > at org.apache.kafka.clients.producer.KafkaProducer.send(
> > KafkaProducer.java:701)
> > at kafka.producer.NewShinyProducer.send(BaseProducer.scala:47)
> > at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:61)
> > at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
> >
> > I suspect that somehow the producer is not able to trigger the creation
> of
> > the internal transaction topic ?
> >
> >
> > BR,
> > Marcin
> >
>


Re: increased response time for OffsetCommit requests

2017-08-03 Thread Apurva Mehta
Hi Gaurav, those results are definitely inconsistent with the benchmarking
we did. Can you see if this reproduces with the 0.10.0 message format
running on the 0.11.0.0 broker?

On Wed, Aug 2, 2017 at 4:52 AM, Gaurav Abbi  wrote:

> Hi Apurva,
> For the ProduceRequest,
>
>- The increase is from 470 ms to around 1.004 s.
>- The average batch size (batch-size-avg) is around 320B.
>- The linger time is 10ms.
>
> However, the 99th percentile for OffsetCommit has increased from 1.08 to
> 2.8 seconds.
>
> Best Regards,
> Gaurav Abbi
>
> On Tue, Aug 1, 2017 at 7:32 PM, Apurva Mehta  wrote:
>
> > Sorry to keep prodding you with questions, but can you quantify the
> > increase for the ProduceRequest? What is the workload you are testing
> > against: specifically the batch size, message size, and linger time settings
> > of the producers in question?
> >
> > I ask because we benchmarked 0.11.0 against the older 0.10.0 message
> format
> > and found no difference in performance between 0.10.2 on the 0.10
> > message format and 0.11.0 on the 0.10 message format.  Could you create a
> > topic with the 0.10.0 message format and see if there is any degradation
> > for the same workload?
> >
> > Thanks,
> > Apurva
> >
> >
> > On Tue, Aug 1, 2017 at 2:51 AM, Gaurav Abbi 
> wrote:
> >
> > > Hi Apurva,
> > > There are increases in the *Produce* request also. It is not as
> > substantial
> > > as compared to *OffsetCommit. *For both of these requests, the major
> > > contributor is Remote time.
> > > A couple of other metrics that show different behavior post upgrade:
> > >
> > >1. *LogStartOffset*: It has drastically decreased.
> > >2. *NumDelayedOperations: *It has dropped.
> > >
> > > These could be related, or maybe these are intended good changes in
> Kafka
> > > 0.11.0.0 or one of the previous versions.
> > >
> > > Best Regards,
> > > Gaurav Abbi
> > >
> > > On Tue, Aug 1, 2017 at 12:11 AM, Apurva Mehta 
> > wrote:
> > >
> > > > Thanks for your response. Is it 200% only for the
> OffsetCommitRequest,
> > or
> > > > is it similar for all the requests?
> > > >
> > > >
> > > > On Mon, Jul 31, 2017 at 12:48 PM, Gaurav Abbi  >
> > > > wrote:
> > > >
> > > > > Hi Apurva,
> > > > > 1. The increase is about 200%.
> > > > > 2. There is no increase in throughput. However, this has caused an
> > > > > increase in error rate and a decrease in the responses received per
> > > > > second.
> > > > >
> > > > >
> > > > > One more thing to mention, we also upgraded to 0.11.0.0 client
> > > libraries.
> > > > > We are currently using old Producer and consumer APIs.
> > > > >
> > > > >
> > > > >
> > > > > Best Regards,
> > > > > Gaurav Abbi
> > > > >
> > > > > On Mon, Jul 31, 2017 at 7:46 PM, Apurva Mehta  >
> > > > wrote:
> > > > >
> > > > > > How much is the increase? Is there any increase in throughput?
> > > > > >
> > > > > > On Mon, Jul 31, 2017 at 8:04 AM, Gaurav Abbi <
> > abbi.gau...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > > We recently upgraded to Kafka 0.11.0.0 from 0.10.1.1.
> > > > > > > Since then we have been observing increased latencies
> especially
> > > > > > > OffsetCommit requests.
> > > > > > > Looking at the server side metrics, it seems the culprit is the
> > > > > Follower
> > > > > > > time.
> > > > > > >
> > > > > > > We are using following
> > > > > > > inter.broker.protocol.version: 0.11.0.0
> > > > > > > log.message.format.version: 0.9.0.1
> > > > > > >
> > > > > > > Are there some possible pointers that we can explore to
> > > troubleshoot
> > > > > the
> > > > > > > root cause?
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Gaurav Abbi
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: implementing kafka transactions : performance issue

2017-09-18 Thread Apurva Mehta
Hi Hugues.

How 'big' are your transactions? In particular, how many produce records
are in a single transaction? Can you share your actual producer code?

Also, did you try the `kafka-producer-perf-test.sh` tool with a
transactional id and see what the latency is for transactions with that
tool?

Thanks,
Apurva

On Mon, Sep 18, 2017 at 9:02 AM,  wrote:

> Hi,
> I am testing an app with transactions on the producer side of kafka
> (0.11.0.1) .   I  defined the producer config (see below) and added the
> necessary lines in the app (#initTransaction, #begintransaction and
> #commitTransaction) around the existing #send
> The problem I am facing is that each transaction takes up to 150ms to be
> processed, which doesn't make sense, even for a laptop!
> I have tested some batch size configs without any success (messages are
> around 100 bytes).
> I certainly made a mistake in the setup but can't figure out which one, or
> how to investigate. I checked by removing the transaction lines and the
> app works fine (in my case less than 200 ms for 100 "send"s  to kafka)
>
> My config is : 3 VMs on my laptop for the kafka cluster.  My main topic
> has 3 partitions, with 3 replicas, and min.insync.replicas is set to 2.
>
>
> the producer is defined by (remaining configs by default):
>
>     final Properties props = new Properties();
>     props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap_Servers);
>     props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
>         org.apache.kafka.common.serialization.StringSerializer.class);
>     props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
>         io.confluent.kafka.serializers.KafkaAvroSerializer.class);
>     props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
>         schema_Registry_URL);
>
>     props.put(ProducerConfig.ACKS_CONFIG, "all");
>     props.put(ProducerConfig.RETRIES_CONFIG, 5);
>
>     props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
>     props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, transactionnalId);
>     props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
>
>     confluentProducer = new KafkaProducer<>(props);
>
> Any idea what could be wrong ? have I forgotten something ?
> Thanks
> Hugues DESLANDES
>
>
>
>
>
>


Re: Replication does not start because of OutOfOrderSequenceException

2017-10-02 Thread Apurva Mehta
Hi Stas,

Thanks for reporting this. It would be helpful to have a JIRA with more of
the server logs from the leaders and followers in the time leading up to
this OutOfOrderSequenceException.

The answers to the following questions would help, when you file the JIRA:

What are the retention settings for this topic? Is it configured for
compaction? Compaction and deletion? What is the retention.ms setting? What
is the retention.bytes setting? What messages are being written to the
topic? In particular, do they have a create time explicitly set by the
application?

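For reference, the relevant settings can be pulled with something like this
(a sketch, assuming the 0.11-era tooling and a local ZooKeeper; substitute
your topic name):

  # topic-level overrides: retention.ms, retention.bytes, cleanup.policy, ...
  bin/kafka-configs.sh --zookeeper localhost:2181 --describe \
    --entity-type topics --entity-name your-topic

  # partition and replica layout for the same topic
  bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic your-topic
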
Thanks,
Apurva

On Mon, Oct 2, 2017 at 4:40 AM, Ismael Juma  wrote:

> Hi Stas,
>
> Thank you for reporting this. Can you please file an issue? Even if
> KAFKA-5793 has fixed it for 1.0.0 (which needs to be verified), we should
> consider whether a fix is needed for the 0.11.0 branch as well.
>
> Ismael
>
> On Mon, Oct 2, 2017 at 11:28 AM, Stas Chizhov  wrote:
>
> > Hi,
> >
> > We run 0.11.0.1 and there was a problem with 1 ReplicationFetcher on one
> of
> > the brokers - it experienced an out-of-order sequence problem for one
> > topic/partition and was stopped. It stayed stopped over the weekend.
> During
> > this time log cleanup was working and by now it has cleaned up all the
> data
> > in the partitions that this fetcher was responsible for - including other
> > partitions that didn't have the out-of-order sequence problem in the first
> > place.
> It
> > is not completely clear to me why this initial problem occurred, but at
> > this moment there is a broker with no data for a few partitions and
> > replication fetcher fails upon restart with
> > "org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid
> > sequence number for new epoch: 0 (request epoch), 154277489 (seq.
> > number)".  I believe this is
> > https://issues.apache.org/jira/browse/KAFKA-5793.
> > However I wonder what is the easiest way of bringing this replicas back
> > online?
> >
> > Best regards,
> > Stanislav.
> >
>


Re: Reg. Kafka transactional producer and consumer

2017-11-08 Thread Apurva Mehta
Hi,

Your log segment dump and the producer log don't correlate. The producer
log shows the producerId == 4001. But your log segment dumps don't have
this producerId. Please share data from the same run where you reproduce
this issue.

For the producerId's 0-4 (shown in the dump), there seem to be no
transaction markers (because these would have sequence number == -1). So if
your messages from producerId 4001 are behind these messages, they would
never be read in read committed mode.
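
If it helps, the dump for that run can be regenerated with something like this
(a sketch; point --files at the actual segment file for the partition, and on
Windows use the bin\windows\kafka-run-class.bat equivalent):

  bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
    --files /tmp/kafka-logs-0/topic-0/00000000000000000000.log \
    --deep-iteration --print-data-log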

Thanks,
Apurva

On Mon, Nov 6, 2017 at 9:44 PM, Abhishek Verma 
wrote:

> Hi Matthias J. Sax,
>
> Thank you for your suggestions.
>
> I tried the same in kafka 1.0.0 version also. Same issue is coming.
>
> I am attaching log segment below please let me know what might be the
> problem.
>
> Regards,
> Abhishek Verma
>
> 
>
>
>
> Dumping .index
>
> offset: 0 position: 0
>
> Dumping .log
>
> Starting offset: 0
>
> baseOffset: 0 lastOffset: 0 baseSequence: 0 lastSequence: 0 producerId: 0
> producerEpoch: 0 partitionLeaderEpoch: 0 isTransactional: true position: 0
> CreateTime: 1509605714710 isvalid: true size: 103 magic: 2 compresscodec:
> NONE crc:344974185
>
> baseOffset: 1 lastOffset: 1 baseSequence: 1 lastSequence: 1 producerId: 0
> producerEpoch: 0 partitionLeaderEpoch: 0 isTransactional: true position:
> 103 CreateTime: 1509605714863 isvalid: true size: 103 magic: 2
> compresscodec: NONE crc:102431214
>
> baseOffset: 2 lastOffset: 2 baseSequence: 0 lastSequence: 0 producerId: 1
> producerEpoch: 0 partitionLeaderEpoch: 0 isTransactional: true position:
> 206 CreateTime: 1509607351944 isvalid: true size: 103 magic: 2
> compresscodec: NONE crc:1129944557
>
> baseOffset: 3 lastOffset: 3 baseSequence: 0 lastSequence: 0 producerId: 2
> producerEpoch: 0 partitionLeaderEpoch: 0 isTransactional: true position:
> 309 CreateTime: 1509616649669 isvalid: true size: 110 magic: 2
> compresscodec: NONE crc:630443129
>
> baseOffset: 4 lastOffset: 4 baseSequence: 0 lastSequence: 0 producerId: 3
> producerEpoch: 0 partitionLeaderEpoch: 0 isTransactional: true position:
> 419 CreateTime: 1509616850564 isvalid: true size: 110 magic: 2
> compresscodec: NONE crc:3357473778
>
> baseOffset: 5 lastOffset: 5 baseSequence: 0 lastSequence: 0 producerId: 4
> producerEpoch: 0 partitionLeaderEpoch: 0 isTransactional: true position:
> 529 CreateTime: 1509624206511 isvalid: true size: 110 magic: 2
> compresscodec: NONE crc:1193735168
>
> baseOffset: 6 lastOffset: 6 baseSequence: 0 lastSequence: 0 producerId: 5
> producerEpoch: 0 partitionLeaderEpoch: 0 isTransactional: true position:
> 639 CreateTime: 1509624453377 isvalid: true size: 110 magic: 2
> compresscodec: NONE crc:3859361029
>
> Dumping .timeindex
>
> timestamp: 0 offset: 0
>
> Found timestamp mismatch in :D:\tmp\kafka-logs-0\topic-0\
> .timeindex
>
> Index timestamp: 0, log timestamp: 1509605714710
>
> Index timestamp: 0, log timestamp: 1509605714710
>
> Found out of order timestamp in :D:\tmp\kafka-logs-0\topic-0\
> .timeindex
>
> Index timestamp: 0, Previously indexed timestamp: 0
>
>
>
> 
> 
> From: Matthias J. Sax 
> Sent: Saturday, November 4, 2017 8:11:07 PM
> To: users@kafka.apache.org
> Subject: Re: Reg. Kafka transactional producer and consumer
>
> Hi,
>
> this consumer log line indicates that there is an open/pending
> transaction (ie, neither committed nor aborted) and thus, the broker
> does not deliver the data to the consumer.
>
> -> highWaterMark = 5, but lastStableOffset = 0
>
>
> On 11/2/17 5:25 AM, Abhishek Verma wrote:
> > 1871 [main] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher
> - Fetch READ_COMMITTED at offset 0 for partition topic-4-0 returned fetch
> data (error=NONE, highWaterMark=5, lastStableOffset = 0, logStartOffset =
> 0, abortedTransactions = [], recordsSizeInBytes=0)
>
>
> Thus, there must be an issue on the producer side, that the transactions
> does not get committed. Not sure why though, as producer logs indicate
> that the TX was committed successfully.
>
> Maybe you can dump the log segments to see what is in them?
>
> Btw: Kafka 1.0.0 was release recently, containing several bug fixes for
> transactions. Maybe you can try if it fixed in 1.0.0.
>
>
> -Matthias
>
>


Re: addition

2016-11-21 Thread Apurva Mehta
It's self-serve: http://kafka.apache.org/contact

On Mon, Nov 21, 2016 at 1:20 AM, marcel bichon  wrote:

> request of addition to the mailing list
>


Re: Disadvantages of Upgrading Kafka server without upgrading client libraries?

2016-11-29 Thread Apurva Mehta
I may be wrong, but since there have been message format changes between
0.8.2 and 0.10.1, there will be a performance penalty if the clients are
not also upgraded. This is because you lose the zero-copy semantics on the
server side as the messages have to be converted to the old format before
being sent out on the wire to the old clients.

On Tue, Nov 29, 2016 at 10:06 AM, Thomas Becker  wrote:

> The only obvious downside I'm aware of is not being able to benefit
> from the bugfixes in the client. We are essentially doing the same
> thing; we upgraded the broker side to 0.10.0.0 but have yet to upgrade
> our clients from 0.8.1.x.
>
> On Tue, 2016-11-29 at 09:30 -0500, Tim Visher wrote:
> > Hi Everyone,
> >
> > I have an install of Kafka 0.8.2.1 which I'm upgrading to 0.10.1.0. I
> > see
> > that Kafka 0.10.1.0 should be backwards compatible with client
> > libraries
> > written for older versions but that newer client libraries are only
> > compatible with their version and up.
> >
> > My question is what disadvantages would there be to never upgrading
> > the
> > clients? I'm mainly asking because it would be advantageous to save
> > some
> > time here with a little technical debt if the costs weren't too high.
> > If
> > there are major issues then I can take on the client upgrade as well.
> >
> > Thanks in advance!
> >
> > --
> >
> > In Christ,
> >
> > Timmy V.
> >
> > http://blog.twonegatives.com/
> > http://five.sentenc.es/ -- Spend less time on mail
> --
>
>
> Tommy Becker
>
> Senior Software Engineer
>
> O +1 919.460.4747
>
> tivo.com
>
>
> 
>
>


Re: Message order different each time stream is replayed?

2016-11-30 Thread Apurva Mehta
How many partitions do you have in that topic? Kafka only guarantees a
total ordering of messages within a partition, not across partitions of a
topic. If you want total ordering over the entire topic, you need to create
a topic with a single partition.
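
For example, something like this (a sketch; the topic name and replication
factor are placeholders):

  bin/kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic totally-ordered-events --partitions 1 --replication-factor 3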

On Wed, Nov 30, 2016 at 4:10 AM, Ali Akhtar  wrote:

> While I was connected to console-consumer.sh, I posted a few messages to a
> Kafka topic, one message at a time, across a few hours.
>
> I'd post a message, see it arrive in console-consumer, a few mins later I'd
> post the next message, and so on.
>
> They all arrived in order.
>
> However, when I now try to view console-consumer.sh --from-beginning, each
> time I do so, the messages seem to be shown in a shuffled order.
>
> Also the last 2 messages appear to be missing, although its possible I
> didn't see them in the shuffled output (the topic has over 10k messages
> total which are all dumped to console when I view).
>
> Is this just due to network lag when viewing console-consumer, or should I
> expect my actual consumers to also receive messages out of order each time
> they replay a topic?
>


Re: About stopping a leader

2016-12-01 Thread Apurva Mehta
Yes, the leader should move to K2 or K3. You can check the controller log
on all 3 machines to find out where the new leader is placed. It is not
guaranteed to move back to K1 when you restart it 2 hours later, however.
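
One quick way to watch this is to describe the topic before and after stopping
K1 (a sketch; substitute your topic name and ZooKeeper address):

  # the "Leader:" column shows which broker currently leads the partition
  bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic your-topic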

On Mon, Nov 21, 2016 at 3:38 AM, marcel bichon  wrote:

> Hello !
>
> I have a three brokers (K1, K2, K3) cluster using Kafka 0.8.2 and a
> zookeeper cluster (colocalized with kafka brokers).
> I have a topic with one partition and a replication factor of 3.
> I have a producer publishing messages in the topic every minute (1+
> message)
> I have a consumergroup consuming messages every hour.
> The offset of this consumergroup for this topic is stored in zookeeper.
> The leader for this partition for this topic is K1.
> The replicas are K2 and K3.
>
> Sometimes, the consumergroup does not find any new messages.
>
> In order to investigate and to test, I was wondering if I could just stop
> K1? Will K2 or K3 become the leader? What will happen if two hours later
> I start again K1 ?
>
> Best regards.
>
> M.
>


Re: Tracking when a batch of messages has arrived?

2016-12-02 Thread Apurva Mehta
That should work, though it sounds like you may be interested in :
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging

If you can include the 'batch_id' inside your messages, and define custom
control messages with a control topic, then you would not need one topic
per batch, and you would be very close to the essence of the above proposal.

Thanks,
Apurva

On Fri, Dec 2, 2016 at 5:02 AM, Ali Akhtar  wrote:

> Heya,
>
> I need to send a group of messages, which are all related, and then process
> those messages, only when all of them have arrived.
>
> Here is how I'm planning to do this. Is this the right way, and can any
> improvements be made to this?
>
> 1) Send a message to a topic called batch_start, with a batch id (which
> will be a UUID)
>
> 2) Post the messages to a topic called batch_msgs_. Here batch_id
> will be the batch id sent in batch_start.
>
> The number of messages sent will be recorded by the producer.
>
> 3) Send a message to batch_end with the batch id and the number of sent
> messages.
>
> 4) On the consumer side, using Kafka Streaming, I would listen to
> batch_end.
>
> 5) When the message there arrives, I will start another instance of Kafka
> Streaming, which will process the messages in batch_msgs_
>
> 6) Perhaps to be extra safe, whenever batch_end arrives, I will start a
> throwaway consumer which will just count the number of messages in
> batch_msgs_. If these don't match the # of messages specified in
> the batch_end message, then it will assume that the batch hasn't yet
> finished arriving, and it will wait for some time before retrying. Once the
> correct # of messages have arrived, THEN it will trigger step 5 above.
>
> Will the above method work, or should I make any changes to it?
>
> Is step 6 necessary?
>


Re: Suggestions

2016-12-02 Thread Apurva Mehta
>
>  then, the strange thing is that the consumer on
> the second topic stays in poll forever, *without receiving any message*.


How long is 'forever'? Did you wait more than 5 minutes?

On Fri, Dec 2, 2016 at 2:55 AM, Vincenzo D'Amore  wrote:

> Hi Kafka Gurus :)
>
> I'm creating a process between a few applications.
>
> The first application creates a producer and then writes a message into a main
> topic (A); within the message there is the name of a second topic (B). It then
> promptly creates a second producer and writes a few messages into the new topic
> (B).
>
> I write here because I don't understand why: a second application is polling
> on the main topic, and when it receives the first message (which contains the
> name of the second topic), then, the strange thing is that the consumer on
> the second topic stays in poll forever, *without receiving any message*.
>
> What is wrong in this scenario, am I missing something? Please help.
>
> producer has this default properties:
>
>   "bootstrap.servers":"localhost:9092",
>   "acks":"all",
>   "retries":0,
>   "batch.size":16384,
>   "linger.ms":1,
>   "buffer.memory":33554432,
>
> "key.serializer":"org.apache.kafka.common.serialization.StringSerializer",
>
> "value.serializer":"org.apache.kafka.common.serialization.
> StringSerializer"
>
> consumer has this default properties:
>
>   "bootstrap.servers":"localhost:9092",
>   "enable.auto.commit":"true",
>   "auto.commit.interval.ms":"1000",
>   "session.timeout.ms":"3",
>   "buffer.memory":33554432,
>   "key.deserializer":
> "org.apache.kafka.common.serialization.StringDeserializer",
>   "value.deserializer":
> "org.apache.kafka.common.serialization.StringDeserializer"
>
> usually there are 2 active groups (group_A and group_B).
>
> Best regards,
> Vincenzo
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


Re: Is CreateTopics and DeleteTopics ready for production usage?

2016-12-05 Thread Apurva Mehta
It isn't ready yet. It is part of the work related to
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations

Thanks,
Apurva

On Mon, Dec 5, 2016 at 11:10 AM, Dmitry Lazurkin  wrote:

> Hello.
>
> Are requests CreateTopics and DeleteTopics ready for production usage?
>
> Why TopicCommand doesn't use CreateTopics / DeleteTopics?
>
> Thanks.
>
>


Re: Is CreateTopics and DeleteTopics ready for production usage?

2016-12-05 Thread Apurva Mehta
I should clarify that those requests may work, but they are not used in any
active code. The integration with the rest of the system is yet to happen.

On Mon, Dec 5, 2016 at 1:45 PM, Apurva Mehta  wrote:

> It isn't ready yet. It is part of the work related to
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 4+-+Command+line+and+centralized+administrative+operations
>
> Thanks,
> Apurva
>
> On Mon, Dec 5, 2016 at 11:10 AM, Dmitry Lazurkin 
> wrote:
>
>> Hello.
>>
>> Are requests CreateTopics and DeleteTopics ready for production usage?
>>
>> Why TopicCommand doesn't use CreateTopics / DeleteTopics?
>>
>> Thanks.
>>
>>
>


Re: NotLeaderForPartitionException

2016-12-06 Thread Apurva Mehta
Hi Sven,

You will see this exception during leader election. When the leader for a
partition moves to another broker, there is a period during which the
replicas would still connect to the original leader, at which point they
will raise this exception. This should be a very short period, after which
they will connect to and replicate from the new leader correctly.

This is not a fatal error, and you will see it if you are bouncing brokers
(since all the leaders on that broker will have to move after the bounce).
You may also see it if some brokers have connectivity issues: they may be
considered dead, and their partitions would be moved elsewhere.
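
If you want a quick check of whether any partitions are affected at a given
moment, something like this works (a sketch):

  # lists partitions whose ISR is currently smaller than the replica set
  bin/kafka-topics.sh --zookeeper localhost:2181 --describe \
    --under-replicated-partitions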

Hope this helps,
Apurva

On Tue, Dec 6, 2016 at 10:06 AM, Sven Ludwig  wrote:

> Hello,
>
> in our Kafka clusters we sometimes observe a specific ERROR log-statement,
> and therefore we have doubts whether it is already running sable in our
> configuration. This occurs every now and then, like two or three times in a
> day. It is actually the predominant ERROR log-statement in our cluster.
> Example:
>
> [2016-12-06 17:14:50,909] ERROR [ReplicaFetcherThread-0-3], Error for
> partition [,] to broker
> 3:org.apache.kafka.common.errors.NotLeaderForPartitionException: This
> server is
> not the leader for that topic-partition. (kafka.server.
> ReplicaFetcherThread)
>
> We already asked Google, but we did not find sufficient answers to our
> questions, therefore I am asking on the mailing list:
>
> 1. What are the possible reasons for this particular error?
>
> 2. What are the implications of it?
>
> 3. What can be done to prevent it?
>
> Best Regards,
> Sven
>


Re: ActiveControllerCount is always will be either 0 or 1 in 3 nodes kafka cluster?

2016-12-13 Thread Apurva Mehta
Thanks Sven, I will follow up and ensure that the document is tightened up.

Apurva

On Mon, Dec 12, 2016 at 4:57 AM, Sven Ludwig  wrote:

> Hi,
>
> in JMX each Kafka broker has a value 1 or 0 for ActiveControllerCount. As
> I understood from this thread, the sum of these values across the cluster
> should never be something other than 1. The documentation at
> http://docs.confluent.io/3.1.0/kafka/monitoring.html should be improved
> to make that clear. Currently it is misleading:
>
> kafka.controller:type=KafkaController,name=ActiveControllerCount
> Number of active controllers in the cluster. Alert if value is anything
> other than 1.
>
> Suggested:
>
> kafka.controller:type=KafkaController,name=ActiveControllerCount
> Number of active controllers on a broker. Alert if the aggregated sum
> across all brokers in the cluster is anything other than 1, because in a
> cluster there should only be one broker with an active controller (cluster
> singleton).
>
> Kind Regards,
> Sven
>


Re: failed to delete kafka topic when building from source

2016-12-13 Thread Apurva Mehta
How are you trying to delete the topic? Next time this occurs, can you
check whether the process has permissions to perform that operation?
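
For reference, with delete.topic.enable=true the usual path is something like
this (a sketch; the topic name is the one from your log, and on Windows the
bin\windows\kafka-topics.bat equivalent applies):

  bin/kafka-topics.sh --zookeeper localhost:2181 --delete \
    --topic test-window-stream
  # then check that the broker process is allowed to rename directories
  # under D:\tmp\kafka-logs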

On Mon, Dec 12, 2016 at 10:55 PM, Sachin Mittal  wrote:

> Hi,
> I recently built an application from source and I get the following
> exception when trying to delete a topic
>
> kafka.common.KafkaStorageException: Failed to rename log directory from
> D:\tmp\kafka-logs\test-window-stream-0 to
> D:\tmp\kafka-logs\test-window-stream-0.0ce9f915431397d1c2dad4f535
> a3-delete
> at kafka.log.LogManager.asyncDelete(LogManager.scala:451)
> at
> kafka.cluster.Partition$$anonfun$delete$1.apply$mcV$sp(
> Partition.scala:164)
>
>
> Note that I have set delete.topic.enable=true
>
> Also this was all working fine when using kafka_2.10-0.10.0.1.
> Issue happens with kafka_2.10-0.10.2.0-SNAPSHOT
>
> Thanks
> Sachin
>


Re: kafka commands taking a long time

2016-12-13 Thread Apurva Mehta
That is certainly odd. What's the latency when using the kafka console
producers and consumers? Is it much faster? If it is, I would just strace
the kafka-topics command to see where it is spending the time.
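
Something along these lines (assuming Linux, with paths adjusted to your
install) would show whether the time goes into DNS lookups, JVM startup, or
the zookeeper round trips:

$ strace -f -tt -T -o /tmp/kafka-topics.strace \
    bin/kafka-topics.sh --list --zookeeper localhost:2181
# -f follows the child processes/threads the wrapper script spawns, while
# -tt/-T add wall-clock timestamps and per-syscall durations.
$ grep -E 'connect|poll' /tmp/kafka-topics.strace | less
# Long gaps around connect()/poll() calls usually point at name resolution or
# a slow network hop rather than zookeeper itself.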

On Thu, Dec 8, 2016 at 7:21 AM, Stephen Cresswell <
stephen.cressw...@gmail.com> wrote:

> I followed the quickstart instructions at
> https://kafka.apache.org/quickstart and everything seems to be working ok,
> except that commands take a long time to run, e.g.
>
> $ time bin/kafka-topics.sh --list --zookeeper localhost:2181
>
> real 0m11.751s
> user 0m1.540s
> sys 0m0.273s
>
> The zookeeper logging shows that the request is processed in a few
> milliseconds, so I think it's related to the kafka JVM configuration. If I
> remove com.sun.management.jmxremote it takes 6 seconds, but this is still
> much longer than I would have expected.
>
> Any suggestions on how to speed things up?
>


Re: lag for a specific partitions on newly added host

2016-12-13 Thread Apurva Mehta
How did you add the host and when did you measure the lag? If you used the
reassign-partitions script, it will move partitions to the new host, but
the data copy will take time. In that period, those partitions will lag.
However, once the reassign-partitions script finishes, the partitions on
the new replica should be caught up and no longer demonstrate any lag.
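
If you still have the JSON file you used for the move, the --verify option
will tell you whether the reassignment actually completed (a sketch assuming
the usual invocation; substitute your zookeeper address and file name):

$ bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
    --reassignment-json-file reassignment.json --verify
# Each partition should report "completed successfully"; anything still
# "in progress" means the new replica is still copying data and will lag.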

On Fri, Dec 9, 2016 at 8:58 PM, Jeremy Hansen  wrote:

> Here’s the topic description:
>
> Topic:blumfrub  PartitionCount:15   ReplicationFactor:5 Configs:
> Topic: blumfrub Partition: 0Leader: 1001Replicas:
> 1001,0,1,2,3  Isr: 0,1001,1,2,3
> Topic: blumfrub Partition: 1Leader: 0   Replicas:
> 0,1,2,3,4 Isr: 0,1,2,3,4
> Topic: blumfrub Partition: 2Leader: 1   Replicas:
> 1,2,3,4,1001  Isr: 1001,1,2,3,4
> Topic: blumfrub Partition: 3Leader: 2   Replicas:
> 2,3,4,1001,0  Isr: 0,1001,2,3,4
> Topic: blumfrub Partition: 4Leader: 0   Replicas:
> 3,4,1001,0,1  Isr: 0,1001,1,3,4
> Topic: blumfrub Partition: 5Leader: 4   Replicas:
> 4,1001,0,1,2  Isr: 0,1001,1,2,4
> Topic: blumfrub Partition: 6Leader: 1001Replicas:
> 1001,1,2,3,4  Isr: 1001,1,2,3,4
> Topic: blumfrub Partition: 7Leader: 0   Replicas:
> 0,2,3,4,1001  Isr: 0,1001,2,3,4
> Topic: blumfrub Partition: 8Leader: 1   Replicas:
> 1,3,4,1001,0  Isr: 0,1001,1,3,4
> Topic: blumfrub Partition: 9Leader: 2   Replicas:
> 2,4,1001,0,1  Isr: 0,1001,1,2,4
> Topic: blumfrub Partition: 10   Leader: 0   Replicas:
> 3,1001,0,1,2  Isr: 0,1001,1,2,3
> Topic: blumfrub Partition: 11   Leader: 4   Replicas:
> 4,0,1,2,3 Isr: 0,1,2,3,4
> Topic: blumfrub Partition: 12   Leader: 1001Replicas:
> 1001,2,3,4,0  Isr: 0,1001,2,3,4
> Topic: blumfrub Partition: 13   Leader: 0   Replicas:
> 0,3,4,1001,1  Isr: 0,1001,1,3,4
> Topic: blumfrub Partition: 14   Leader: 1   Replicas:
> 1,4,1001,0,2  Isr: 0,1001,1,2,4
>
> 1001 is the new broker.
>
> -jeremy
>
>
>
> > On Dec 9, 2016, at 8:55 PM, Jeremy Hansen  wrote:
> >
> > I added a new host to kafka.  Partitions that fall on this new host have
> a very high lag and I’m trying to understand why this would be and how to
> fix it.
> >
> > iloveconsuming  blumfrub  0
> 5434682 7416933 1982251 iloveconsuming_kf0001.host.
> com-1481225033576-47393b55-0
> > iloveconsuming  blumfrub  1
> 7416828 7416875 47  iloveconsuming_kf0001.host.
> com-1481225033769-17152bca-0
> > iloveconsuming  blumfrub  2
> 7416799 7416848 49  iloveconsuming_kf0001.host.
> com-1481225033791-77a30285-0
> > iloveconsuming  blumfrub  3
> 7416898 7416903 5   iloveconsuming_kf0001.host.
> com-1481225033822-d088f844-0
> > iloveconsuming  blumfrub  4
> 7416891 7416925 34  iloveconsuming_kf0001.host.
> com-1481225033846-78f8e5b5-0
> > iloveconsuming  blumfrub  5
> 7416843 7416883 40  iloveconsuming_kf0001.host.
> com-1481225033869-54027178-0
> > iloveconsuming  blumfrub  6
> 5434720 7416896 1982176 iloveconsuming_kf0001.host.
> com-1481225033891-cc3f6bf6-0
> > iloveconsuming  blumfrub  7
> 7416896 7416954 58  iloveconsuming_kf0001.host.
> com-1481225033915-79b49de8-0
> > iloveconsuming  blumfrub  8
> 7416849 7416898 49  iloveconsuming_kf0001.host.
> com-1481225033939-1fe784c0-0
> > iloveconsuming  blumfrub  9
> 7416898 7416917 19  iloveconsuming_kf0001.host.
> com-1481225033961-40cc3185-0
> > iloveconsuming  blumfrub  10
>  35457186 35457221 35  iloveconsuming_kf0001.host.
> com-1481225033998-a817062e-0
> > iloveconsuming  blumfrub  11
>  7416866 7416909 43  iloveconsuming_kf0001.host.
> com-1481225034020-7a15999e-0
> > iloveconsuming  blumfrub  12
>  5434739 7416907 1982168 iloveconsuming_kf0001.host.
> com-1481225034043-badde97c-0
> > iloveconsuming  blumfrub  13
>  7416818 7416865 47  iloveconsuming_kf0001.host.
> com-1481225034066-6e77e3dc-0
> > iloveconsuming  blumfrub  14
>  7416901 7416947 46  iloveconsuming_kf0002.host.
> com-1481225107317-32355c51-0
> >
> > Why do the partitions that fall on the newly added host so la

Re: kafka_2.11-0.9.0.1 crash with java coredump

2016-12-14 Thread Apurva Mehta
I would suggest creating a JIRA and describing in detail what was going on
in the cluster when this happened, and posting the associated broker /
state change / controller logs.
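
When you file it, the standard broker-side logs around the time of the crash
are the ones to attach (this assumes the default log4j layout under the Kafka
logs/ directory; adjust if you have redirected them):

$ ls logs/
# server.log, controller.log and state-change.log are the ones to attach,
# along with the hs_err_pid*.log file the JVM usually writes when it aborts
# like this.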

Thanks,
Apurva

On Wed, Dec 14, 2016 at 3:28 AM, Mazhar Shaikh 
wrote:

> Hi All,
>
> I am using kafka_2.11-0.9.0.1 with java version "1.7.0_51".
>
> On random days kafka process stops (crashes) with a java coredump file as
> below.
>
> (gdb) bt
> #0 0x7f33059f70d5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x7f33059fa83b in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x7f33049ae405 in os::abort(bool) () from
> /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #3 0x7f3304b2d347 in VMError::report_and_die() () from
> /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #4 0x7f3304b2d8de in crash_handler(int, siginfo*, void*) () from
> /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #5 
> #6 0x7f33046b92f5 in
> G1BlockOffsetArray::forward_to_block_containing_addr_slow(HeapWord*,
> HeapWord*, void const*) () from
> /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #7 0x7f33049a60f0 in os::print_location(outputStream*, long, bool) ()
> from /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #8 0x7f33049b2678 in os::print_register_info(outputStream*, void*) ()
> from /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #9 0x7f3304b2b94b in VMError::report(outputStream*) () from
> /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #10 0x7f3304b2cf4a in VMError::report_and_die() () from
> /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #11 0x7f33049b2d8f in JVM_handle_linux_signal () from
> /opt/jdk1.7.0_51/jre/lib/amd64/server/libjvm.so
> #12 
> #13 0x7f32ffbc64bf in ?? ()
> #14 0xca57b708 in ?? ()
> #15 0x7f32fae97928 in ?? ()
> #16 0xbf2f05e8 in ?? ()
> #17 0x in ?? ()
> #18 0xc3b27610 in ?? ()
> #19 0xbed92898 in ?? ()
> #20 0xe269aac8 in ?? ()
> #21 0x in ?? ()
>
>
> Can anyone suggest a solution to overcome this issue.
>
> Thank you.
>
> Mazhar Shaikh.
>


Re: Reasonable time to commit offsets?

2016-12-14 Thread Apurva Mehta
Hi Gwilym,

What is the latency for synchronously producing to this cluster? Is it also
1000 to 2000ms?
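
A quick way to get a baseline, assuming you can run the bundled perf tool
against the same cluster and a test topic (broker address below is a
placeholder), is something like:

$ bin/kafka-producer-perf-test.sh --topic latency-test --num-records 10000 \
    --record-size 100 --throughput -1 \
    --producer-props bootstrap.servers=broker1:9092 acks=all
# The summary line reports average and max latency. If acks=all produce
# latency is already in the hundreds of milliseconds, slow commits are less
# surprising, since offset commits are themselves acked writes to the
# __consumer_offsets topic.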

Thanks,
Apurva

On Wed, Dec 14, 2016 at 2:17 AM, Gwilym Evans 
wrote:

> Hi folks,
>
> New to the list and new to operating Kafka. I'm trying to find out what a
> reasonable turnaround time for committing offsets is.
>
> I'm running a 0.10.1.0 cluster of 17 brokers plus 3 dedicated zookeeper
> nodes, though the cluster has been upgraded from its starting point at
> 0.10.0.0. The offsets topic is at its default configuration. The partition
> I'm consuming and committing offsets for has 150 partitions.
>
> While I can steadily consume around 90,000 messages/sec using 1 consumer
> without (synchronously) committing offsets, if I want to do so
> synchronously it takes between 1000ms and 2000ms to complete the commit. If
> I use multiple consumers to reduce the partition count each is assigned
> (I've tested up to 10) then time-to-commit does _not_ change.
>
> Is this reasonable? Or should it be quicker than this?
>
> For now, my use case is simple topic-splitting: read messages, apply basic
> filters, discard or resend messages to one or more topics. But I don't want
> to advance the offsets until I am certain the messages have been written
> back... hence a cycle of 1) read a batch, 2) process, 3) write, 4) commit.
>
> If more information is needed, I'm happy to supply what I can.
>
> Thanks,
> Gwilym
>


Re: lag for a specific partitions on newly added host

2016-12-14 Thread Apurva Mehta
Which version of Kafka are you using? How did you run the tool exactly? Can
you share the command line?
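
For example, if it was the old consumer offset checker / consumer groups path,
the invocation would look roughly like this (a sketch; the group name is taken
from your output and the zookeeper address is a placeholder):

$ bin/kafka-consumer-groups.sh --zookeeper localhost:2181 \
    --describe --group iloveconsuming
# Knowing which tool you ran, and whether it reads offsets from zookeeper or
# from __consumer_offsets, helps rule out a stale-offset explanation for the
# apparent lag.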

On Tue, Dec 13, 2016 at 6:05 PM, Jeremy Hansen  wrote:

> The new host has been in place for over a week. Lag is still high on
> partitions that exist on that new host.  Should I attempt another reassign?
>
> Thanks
> -jeremy
>
> > On Dec 13, 2016, at 5:43 PM, Apurva Mehta  wrote:
> >
> > How did you add the host and when did you measure the lag? If you used
> the
> > reassign-partitions script, it will move partitions to the new host, but
> > the data copy will take time. In that period, those partitions will lag.
> > However, once the reassign-partitions script finishes, the partitions on
> > the new replica should be caught up and no longer demonstrate any lag.
> >
> >> On Fri, Dec 9, 2016 at 8:58 PM, Jeremy Hansen 
> wrote:
> >>
> >> Here’s the topic description:
> >>
> >> Topic:blumfrub  PartitionCount:15   ReplicationFactor:5 Configs:
> >>Topic: blumfrub Partition: 0Leader: 1001Replicas:
> >> 1001,0,1,2,3  Isr: 0,1001,1,2,3
> >>Topic: blumfrub Partition: 1Leader: 0   Replicas:
> >> 0,1,2,3,4 Isr: 0,1,2,3,4
> >>Topic: blumfrub Partition: 2Leader: 1   Replicas:
> >> 1,2,3,4,1001  Isr: 1001,1,2,3,4
> >>Topic: blumfrub Partition: 3Leader: 2   Replicas:
> >> 2,3,4,1001,0  Isr: 0,1001,2,3,4
> >>Topic: blumfrub Partition: 4Leader: 0   Replicas:
> >> 3,4,1001,0,1  Isr: 0,1001,1,3,4
> >>Topic: blumfrub Partition: 5Leader: 4   Replicas:
> >> 4,1001,0,1,2  Isr: 0,1001,1,2,4
> >>Topic: blumfrub Partition: 6Leader: 1001Replicas:
> >> 1001,1,2,3,4  Isr: 1001,1,2,3,4
> >>Topic: blumfrub Partition: 7Leader: 0   Replicas:
> >> 0,2,3,4,1001  Isr: 0,1001,2,3,4
> >>Topic: blumfrub Partition: 8Leader: 1   Replicas:
> >> 1,3,4,1001,0  Isr: 0,1001,1,3,4
> >>Topic: blumfrub Partition: 9Leader: 2   Replicas:
> >> 2,4,1001,0,1  Isr: 0,1001,1,2,4
> >>Topic: blumfrub Partition: 10   Leader: 0   Replicas:
> >> 3,1001,0,1,2  Isr: 0,1001,1,2,3
> >>Topic: blumfrub Partition: 11   Leader: 4   Replicas:
> >> 4,0,1,2,3 Isr: 0,1,2,3,4
> >>Topic: blumfrub Partition: 12   Leader: 1001Replicas:
> >> 1001,2,3,4,0  Isr: 0,1001,2,3,4
> >>Topic: blumfrub Partition: 13   Leader: 0   Replicas:
> >> 0,3,4,1001,1  Isr: 0,1001,1,3,4
> >>Topic: blumfrub Partition: 14   Leader: 1   Replicas:
> >> 1,4,1001,0,2  Isr: 0,1001,1,2,4
> >>
> >> 1001 is the new broker.
> >>
> >> -jeremy
> >>
> >>
> >>
> >>> On Dec 9, 2016, at 8:55 PM, Jeremy Hansen  wrote:
> >>>
> >>> I added a new host to kafka.  Partitions that fall on this new host
> have
> >> a very high lag and I’m trying to understand why this would be and how
> to
> >> fix it.
> >>>
> >>> iloveconsuming  blumfrub  0
> >> 5434682 7416933 1982251
>  iloveconsuming_kf0001.host.
> >> com-1481225033576-47393b55-0
> >>> iloveconsuming  blumfrub  1
> >> 7416828 7416875 47
> iloveconsuming_kf0001.host.
> >> com-1481225033769-17152bca-0
> >>> iloveconsuming  blumfrub  2
> >> 7416799 7416848 49
> iloveconsuming_kf0001.host.
> >> com-1481225033791-77a30285-0
> >>> iloveconsuming  blumfrub  3
> >> 7416898 7416903 5
>  iloveconsuming_kf0001.host.
> >> com-1481225033822-d088f844-0
> >>> iloveconsuming  blumfrub  4
> >> 7416891 7416925 34
> iloveconsuming_kf0001.host.
> >> com-1481225033846-78f8e5b5-0
> >>> iloveconsuming  blumfrub  5
> >> 7416843 7416883 40
> iloveconsuming_kf0001.host.
> >> com-1481225033869-54027178-0
> >>> iloveconsuming  blumfrub  6
> >> 5434720 7416896 1982176
>  iloveconsuming_kf0001.host.
> >> com-1481225033891-cc3f6bf6-0
> >>> iloveconsuming  blumfrub  7
> >> 7416896 7416954 58
> iloveconsuming_kf0001.host.
> 

Re: Kafka Errors and Hung Brokers

2016-12-14 Thread Apurva Mehta
Regarding 1), you can see a NotLeaderForPartition exception if the leader
for the partition has moved to another host but the client metadata has not
updated itself yet. The messages should disappear once the metadata is
updated on all clients.

Leaders may move if brokers are bounced, or if they have connectivity
issues with zookeeper. Looking at your second point, it seems like
connectivity may be a problem. Where is zookeeper running? Do your brokers
have a solid link to that machine? Do you see any zookeeper connection
errors in your broker logs?
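
As a rough sketch (hostnames below are placeholders), checking both directions
is usually enough to rule connectivity in or out:

$ echo ruok | nc zk-host 2181          # a healthy zookeeper answers "imok"
$ grep -iE 'zookeeper|Expired|disconnect' /var/log/kafka/server.log | tail -50
# Session expirations or repeated reconnects here usually line up with the
# broker being de-registered and its leaders moving elsewhere.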

On Tue, Dec 13, 2016 at 6:41 PM, Shailesh Hemdev <
shailesh.hem...@foresee.com> wrote:

> We are using a 3 node Kafka cluster and are encountering some weird issues.
>
> 1) On each node, when we tail the server.log file under /var/log/kafka we
> see continuous errors like these
>
> pic-partition. (kafka.server.ReplicaFetcherThread)
> [2016-12-14 02:39:30,747] ERROR [ReplicaFetcherThread-0-441], Error
> for
> partition [dev-core-platform-logging,15] to broker
> 441:org.apache.kafka.common.errors.NotLeaderForPartitionException:
> This
> server is not the leader for that topic-partition.
> (kafka.server.ReplicaFetcherThread)
>
> The broker is up and is showing under zookeeper. So it is not clear why
> these errors occur
>
> 2) Occasionally we will find a Kafka broker that goes down. We have
> adjusted the Ulimit to increase open files as well as added 6g to the heap.
> When the broker goes down, the process itself is up but is de-registered
> from Zookeeper
>
> Thanks,
>
> *Shailesh *
>


Re: Connectivity problem with controller breaks cluster

2016-12-27 Thread Apurva Mehta
Looks like you are hitting: https://issues.apache.org/jira/browse/KAFKA-4477

You can try upgrading to 0.10.1.1 and see if the issue recurs (a bunch of
deadlock bugs were fixed which might explain this issue). Or you can try to
provide the data described in
https://issues.apache.org/jira/browse/KAFKA-4477?focusedCommentId=15749722&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15749722
so that we can diagnose the problem.

As it stands, this seems to be a bug introduced in 0.10.1.0. We don't have
enough information to identify the root cause. If you can provide the trace
logging requested on that ticket, it would help.
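
As an illustration only (the ticket spells out exactly which loggers it needs,
so please follow that), turning a logger up to TRACE is just a log4j.properties
change followed by a broker restart; for example, for the request logger that
ships in the default config:

# in config/log4j.properties (assumed default layout)
log4j.logger.kafka.request.logger=TRACE, requestAppender
log4j.additivity.kafka.request.logger=false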

Thanks,
Apurva

On Tue, Dec 27, 2016 at 9:17 AM, Felipe Santos  wrote:

> Hi,
>
> We are using kafka 0.10.1.0.
>
> We have three brokers and three zookeeper nodes.
>
> Today brokers 1 and 2 lost connectivity with broker 3, and I saw that broker
> 3 was the controller.
> I saw a lot of messages like
> "[rw_campaign_broadcast_nextel_734fae3d46d4da63ee36d2b6fd25a77f3f7c3ef5,9]
> on broker 3: Shrinking ISR for partition
> [rw_campaign_broadcast_nextel_734fae3d46d4da63ee36d2b6fd25a77f3f7c3ef5,9]
> from 1,2,3 to 3"
>
> On the broker 2 and 1:
>
> [2016-12-27 08:10:05,501] WARN [ReplicaFetcherThread-0-3], Error in fetch
> kafka.server.ReplicaFetcherThread$FetchRequest@108fd1b0
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response
> was read
> at
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> at
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> at scala.Option.foreach(Option.scala:257)
> at
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1.apply(NetworkClientBlockingOps.scala:112)
> at
> kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1.apply(NetworkClientBlockingOps.scala:108)
> at
> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(
> NetworkClientBlockingOps.scala:137)
> at
> kafka.utils.NetworkClientBlockingOps$.kafka$utils$
> NetworkClientBlockingOps$$pollContinuously$extension(
> NetworkClientBlockingOps.scala:143)
> at
> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(
> NetworkClientBlockingOps.scala:108)
> at
> kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:
> 253)
> at
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> at
> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at
> kafka.server.AbstractFetcherThread.processFetchRequest(
> AbstractFetcherThread.scala:118)
> at
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
> All my consumers and producers went down.
> I tried to consume and produce with kafka-console-producer/consumer.sh and
> it failed.
>
> The only solution was to restart broker 3; after that, the problem went away.
>
> Any tips?
> --
> Felipe Santos
>


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-04 Thread Apurva Mehta
Looks good to me!

+1 (non-binding)

On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford  wrote:

> Hi All
>
> We’re having some problems with this thread being subsumed by the
> [Discuss] thread. Hopefully this one will appear distinct. If you see more
> than one, please use this one.
>
> KIP-101 should now be ready for a vote. As a reminder the KIP proposes a
> change to the replication protocol to remove the potential for replicas to
> diverge.
>
> The KIP can be found here:  https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-101+-+Alter+Replication+
> Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+
> rather+than+High+Watermark+for+Truncation>
>
> Please let us know your vote.
>
> B
>
>
>
>
>


Re: Leader imbalance issue

2017-01-20 Thread Apurva Mehta
Hi Meghana,

Have you tried using the 'kafka-preferred-replica-election.sh' script? It
will try to move leaders back to the preferred replicas when there is a
leader imbalance.

https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-1.PreferredReplicaLeaderElectionTool
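
Running it is a one-liner; as a sketch (the zookeeper address is a placeholder,
and without --path-to-json-file it acts on all topic partitions):

$ bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181
# Afterwards, kafka-topics.sh --describe should show each partition led by the
# first broker in its replica list (the preferred replica).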

Thanks,
Apurva


On Fri, Jan 20, 2017 at 8:29 AM, Meghana Narasimhan <
mnarasim...@bandwidth.com> wrote:

> Hi,
> Any input on this will be of great help. A rolling restart could fix the
> issue, but I'm not sure if that's the right way to do it.
>
> Thanks,
> Meghana
>
> On Wed, Jan 18, 2017 at 9:46 AM, Meghana Narasimhan <
> mnarasim...@bandwidth.com> wrote:
>
> > Hi,
> >
> > We have a 3 node cluster with 0.9.0.1 version. The controller is
> reporting
> > that one of the brokers has a leader imbalance.
> > One of the topics with 120 partitions has Brokers 0 and 2 acting as
> leaders
> > for all its partitions. None of the partitions have Broker 1 as their
> leader.
> >
> > So the controller log reports a list of  partitions  for those topics not
> > being in the preferred
> > replica Map and also leader imbalance.
> >
> > DEBUG [Controller 1]: topics not in preferred replica Map([test,48] ->
> > List(1, 2, 0), [test,54] -> List(1, 2, 0)
> ...)
> > TRACE [Controller 1]: leader imbalance ratio for broker 1 is 0.086957
> > (kafka.controller.KafkaController)
> >
> > I believe broker 1 got into this state because of an issue on that broker
> > a few weeks back
> > which crashed Kafka on that node. Later the corrupted log was identified
> > and restored.
> > But looks like since then that Broker hasn't been leader for that topic.
> >
> > What is the correct way to fix the issue of leader imbalance ?
> >
> > Thanks,
> > Meghana
> >
> >
> >
>


Re: Messages are lost

2017-01-23 Thread Apurva Mehta
What version of kafka have you deployed? Can you post a thread dump of the
hung broker?
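
For reference, a jstack dump of the stuck process is usually enough (a sketch;
substitute the real broker PID and run it as the same user that owns the
process):

$ jstack -l $(pgrep -f kafka.Kafka) > /tmp/kafka-broker-threaddump.txt
# -l includes lock/ownership information, which is what we need to spot a
# deadlock or a thread blocked inside the zookeeper client.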

On Fri, Jan 20, 2017 at 12:14 PM, Ghosh, Achintya (Contractor) <
achintya_gh...@comcast.com> wrote:

> Hi there,
>
> I see the below exception in one of my nodes' logs (cluster with 3 nodes),
> and then the node stops responding (it's in a hung state; I mean if I do
> ps -ef | grep kafka, I see the Kafka process but it is not responding) and we
> lost around 100 messages:
>
>
> 1.   What could be the reason for this exception? My broker ID is
> unique, so what is the solution for this issue?
>
> [2017-01-19 15:56:23,644] ERROR Error handling event ZkEvent[New session
> event sent to kafka.server.KafkaHealthcheck$SessionExpireListener@2d74e7af]
> (org.I0Itec.zkclient.ZkEventThread)
> java.lang.RuntimeException: A broker is already registered on the path
> /brokers/ids/2. This probably indicates that you either have configured a
> brokerid that is already in use, or else you have shutdown this broker and
> restarted it faster than the zookeeper timeout so it appears to be
> re-registering.
> at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.
> scala:305)
> at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.
> scala:291)
> at kafka.server.KafkaHealthcheck.
> register(KafkaHealthcheck.scala:70)
> at kafka.server.KafkaHealthcheck$SessionExpireListener.
> handleNewSession(KafkaHealthcheck.scala:104)
> at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
> at org.I0Itec.zkclient.ZkEventThread.run(
> ZkEventThread.java:71)
>
>
>
> 2.   We lost 100 messages for each topic and I don't see any
> exception in our application log, so how can we track the exception and
> make sure that we'll not lose any data (consumer end)?
>
> Thanks
> Achintya
>


Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Apurva Mehta
Thanks Ismael, this makes sense.

On Fri, Feb 3, 2017 at 11:50 AM, Guozhang Wang  wrote:

> LGTM too.
>
> On Fri, Feb 3, 2017 at 10:39 AM, Eno Thereska 
> wrote:
>
> > Makes sense.
> >
> > Eno
> >
> > > On 3 Feb 2017, at 10:38, Ismael Juma  wrote:
> > >
> > > Hi all,
> > >
> > > I have posted a KIP for dropping support for Java 7 in Kafka 0.11:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
> > >
> > > Most people were supportive when we last discussed the topic[1], but
> > there
> > > were a few concerns. I believe the following should mitigate the
> > concerns:
> > >
> > > 1. The new proposal suggests dropping support in the next major version
> > > instead of the next minor version.
> > > 2. KIP-97 which is part of 0.10.2 means that 0.11 clients will support
> > 0.10
> > > brokers (0.11 brokers will also support 0.10 clients as usual), so
> there
> > is
> > > even more flexibility on incremental upgrades.
> > > 3. Java 9 will be released shortly after the next Kafka release, so
> we'll
> > > be supporting the 2 most recent Java releases, which is a reasonable
> > policy.
> > > 4. 8 months have passed since the last proposal and the release after
> > > 0.10.2 won't be out for another 4 months, which should hopefully be
> > enough
> > > time for Java 8 to be even more established. We haven't decided when
> the
> > > next major release will happen, but we know that it won't happen before
> > > June 2017.
> > >
> > > Please take a look at the proposal and share your feedback.
> > >
> > > Thanks,
> > > Ismael
> > >
> > > [1] http://search-hadoop.com/m/Kafka/uyzND1oIhV61GS5Sf2
> >
> >
>
>
> --
> -- Guozhang
>