Re: Cluster startup order WRT ZooKeeper quorum for Kafka

2017-07-20 Thread Tom Crayford
You'll need a ZK quorum established before brokers boot, for sure.

On Thu, Jul 20, 2017 at 12:53 PM, M. Manna  wrote:

> Hello,
>
> This might be too obvious for some people, but just thinking out loud here.
>
> So we need the recommended 3-node cluster to tolerate a single point of
> failure. I am trying to deploy a 3-node cluster (3 ZKs and 3 brokers) on
> Linux (or even Windows, it doesn't matter here).
>
> In this situation (or any such multi-node cluster setup), should I
> always bring up the quorum of ZKs before starting any broker? Or is there a
> recommended startup order?
>
> My thinking behind this (right or wrong) is that since quorum
> availability is important, without the ZK quorum any broker startup will
> struggle and give up once its retries and backoff timeouts are
> exhausted. But if my understanding is incorrect, could someone please explain
> their analysis?
>
> Regards,
>

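Tom's point about retries and backoff can be sketched as a toy loop. This is
purely illustrative (the numbers and the `try_connect` callback are made up;
the real broker's ZK connection handling lives in the JVM client), but it
shows why a broker started without a quorum eventually gives up:

```python
import time

def connect_with_backoff(try_connect, max_retries=6, base_delay=0.5):
    """Illustrative retry loop: give up once retries are exhausted,
    which is roughly what a broker does if no ZK quorum ever appears."""
    delay = base_delay
    for attempt in range(max_retries):
        if try_connect():
            return True
        time.sleep(delay)   # back off before the next attempt
        delay *= 2          # exponential backoff
    return False  # quorum never came up: startup fails

# A fake "ZK" that only becomes reachable on the 3rd attempt.
attempts = {"n": 0}
def flaky_zk():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(connect_with_backoff(flaky_zk, base_delay=0.001))  # True
```

If the quorum comes up within the retry window the broker proceeds; if not,
startup fails and the broker must be restarted, which is why bringing the ZK
quorum up first is the safer order.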

Re: Kafka compatibility matrix needed

2017-07-18 Thread Tom Crayford
All broker versions support all older client versions.

On Tue, Jul 18, 2017 at 10:15 AM, Sachin Mittal  wrote:

> Hi,
> This gives me some information but still not the complete picture.
>
> It says:
> Starting with version 0.10.2, Java clients have acquired the ability to
> communicate with older brokers.
>
> It also says
> Version 0.11.0 brokers support 0.8.x and newer clients
>
> The question is: does a 0.10.2 broker support 0.8.x clients?
>
> This may answer my immediate doubt, but a compatibility matrix giving
> yes/no answers would be much more helpful to have.
>
> Thanks
> Sachin
>
>
>
> On Tue, Jul 18, 2017 at 2:38 PM, Michal Borowiecki <
> michal.borowie...@openbet.com> wrote:
>
>> Have you seen this: http://kafka.apache.org/documentation.html#upgrade
>>
>> Starting with version 0.10.2, Java clients (producer and consumer) have
>> acquired the ability to communicate with older brokers. Version 0.11.0
>> clients can talk to version 0.10.0 or newer brokers. However, if your
>> brokers are older than 0.10.0, you must upgrade all the brokers in the
>> Kafka cluster before upgrading your clients. Version 0.11.0 brokers support
>> 0.8.x and newer clients.
>>
>> Hope that helps.
>>
>> Cheers,
>>
>> Michał
>>
>> On 18/07/17 08:17, Sachin Mittal wrote:
>>
>> Hi,
>> I would like some help/information on which client versions are compatible
>> with which broker versions in Kafka.
>>
>> Some table like this would be good
>>
>>  server
>> client   0.80.9   0.10   0.11
>> 0.8  yes ?  ??
>> 0.9  ?yes   ??
>> 0.10?   ?yes?
>> 0.11?   ??yes
>>
>> So if question marks are filled it would be of great help.
>>
>> The reason I am asking is that many times we need to use other
>> libraries/frameworks to pull/push data from/into Kafka, and sometimes
>> these support only a particular client version.
>>
>> For example, right now I am trying to pull data from Kafka via
>> Druid/Tranquility, which has 0.8.x clients implemented, but my broker is
>> running 0.10.x.
>>
>> Also if such a table can be posted on kafka documentation page or github
>> page that would be great.
>>
>> Thanks
>> Sachin
>>
>>
>>
>> --
>>  Michal Borowiecki
>> Senior Software Engineer L4
>
>

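The compatibility rules stated in this thread (brokers accept all older
clients; Java clients from 0.10.2 onward can also talk to older brokers, down
to 0.10.0) can be encoded as a small lookup. This is a sketch of my reading of
the quoted upgrade notes, not an official matrix:

```python
def parse(v):
    """'0.10.2' -> (0, 10, 2), so versions compare as tuples."""
    return tuple(int(x) for x in v.split("."))

def compatible(client, broker):
    """Hedged reading of the quoted rules:
    - a broker accepts any client at its own version or older;
    - a 0.10.2+ Java client can also talk to older brokers, down to 0.10.0."""
    c, b = parse(client), parse(broker)
    if c <= b:
        return True  # brokers support all older clients
    return c >= (0, 10, 2) and b >= (0, 10, 0)

# Per the rule above, a 0.10.2 broker does support 0.8.x clients:
print(compatible("0.8.0", "0.10.2"))   # True
print(compatible("0.11.0", "0.9.0"))   # False: broker older than 0.10.0
```

This effectively fills in the question marks in Sachin's table: everything at
or below the diagonal is "yes", and cells above the diagonal are "yes" only
for 0.10.2+ clients against 0.10.0+ brokers.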

Re: Logs truncated at o'clock

2017-07-14 Thread Tom Crayford
Hi,

Which folder are you storing Kafka's data in? By default that's /tmp, which
might be getting wiped by your OS.

Thanks

Tom Crayford
Heroku Kafka

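A quick way to sanity-check this is to look at `log.dirs` in the broker's
`server.properties`. The sketch below parses a properties snippet (the file
contents shown are hypothetical; `/tmp/kafka-logs` is the stock default) and
flags directories under /tmp:

```python
def data_dirs(properties_text):
    """Extract log.dirs (falling back to log.dir) from server.properties text."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    dirs = props.get("log.dirs") or props.get("log.dir") or "/tmp/kafka-logs"
    return dirs.split(",")

example = """
# hypothetical server.properties
broker.id=1002
log.dirs=/tmp/kafka-logs
"""
for d in data_dirs(example):
    if d.startswith("/tmp"):
        print(f"WARNING: {d} is under /tmp and may be wiped by the OS")
```

If the data directory is under /tmp, systemd or a tmp-cleaner cron job can
delete segments out from under the broker, which looks exactly like
"truncating log ... to offset 0" on a schedule.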
On Fri, Jul 14, 2017 at 11:50 AM, mosto...@gmail.com <mosto...@gmail.com>
wrote:

> anyone?
>
>
>
> On 13/07/17 17:09, mosto...@gmail.com wrote:
>
>>
>> Hi
>>
>> With Swiss precision, our Kafka test environment seems to truncate
>> topics exactly on the hour.
>>
>> This might be confirmed with the following trace, which states
>> "Truncating log ... to offset 0"
>>
>> We are still using Kafka 0.10.2.1, but I was wondering whether this is
>> resolved in recent versions, is a known bug, or is just a misconfiguration
>> (I think the relevant config is shown below).
>>
>> Thanks
>>
>> Jul 13 16:00:00 computer kafka[28511]: [2017-07-13 16:00:00,986]
>> INFO [TopicChangeListener on Controller 1002]: New topics:
>> [Set(mytopic.2017-07-13-16)], deleted topics: [Set()], new
>> partition replica assignment [Map([mytopic.2017-07-13-16,0] ->
>> List(1001, 1002), [mytopic.2017-07-13-16,2] -> List(1003, 1001),
>> [mytopic.2017-07-13-16,1] -> List(1002, 1003))]
>> (kafka.controller.PartitionStateMachine$TopicChangeListener)
>> Jul 13 16:00:00 computer kafka[28511]: [2017-07-13 16:00:00,987]
>> INFO [Controller 1002]: New topic creation callback for
>> [mytopic.2017-07-13-16,0],[mytopic.2017-07-13-16,2],[mytopic.
>> 2017-07-13-16,1]
>> (kafka.controller.KafkaController)
>> Jul 13 16:00:00 computer kafka[28511]: [2017-07-13 16:00:00,988]
>> INFO [Controller 1002]: New partition creation callback for
>> [mytopic.2017-07-13-16,0],[mytopic.2017-07-13-16,2],[mytopic.
>> 2017-07-13-16,1]
>> (kafka.controller.KafkaController)
>> Jul 13 16:00:00 computer kafka[28511]: [2017-07-13 16:00:00,988]
>> INFO [Partition state machine on Controller 1002]: Invoking state
>> change to NewPartition for partitions
>> [mytopic.2017-07-13-16,0],[mytopic.2017-07-13-16,2],[mytopic.
>> 2017-07-13-16,1]
>> (kafka.controller.PartitionStateMachine)
>> Jul 13 16:00:00 computer kafka[28511]: [2017-07-13 16:00:00,988]
>> INFO [Replica state machine on controller 1002]: Invoking state
>> change to NewReplica for replicas
>> [Topic=mytopic.2017-07-13-16,Partition=1,Replica=1003],[
>> Topic=mytopic.2017-07-13-16,Partition=0,Replica=1001],[Topic=mytopic.
>> 2017-07-13-16,Partition=2,Replica=1001],[Topic=mytopic.2017-07-13-16,Pa
>> rtition=1,Replica=1002],[Topic=mytopic.2017-07-13-16,
>> Partition=2,Replica=1003],[Topic=mytopic.2017-07-13-16,
>> Partition=0,Replica=1002]
>> (kafka.controller.ReplicaStateMachine)
>> Jul 13 16:00:00 computer kafka[28511]: [2017-07-13 16:00:00,993]
>> INFO [Partition state machine on Controller 1002]: Invoking state
>> change to OnlinePartition for partitions
>> [mytopic.2017-07-13-16,0],[mytopic.2017-07-13-16,2],[mytopic.
>> 2017-07-13-16,1]
>> (kafka.controller.PartitionStateMachine)
>> Jul 13 16:00:01 computer kafka[28511]: [2017-07-13 16:00:01,036]
>> INFO [Replica state machine on controller 1002]: Invoking state
>> change to OnlineReplica for replicas
>> [Topic=mytopic.2017-07-13-16,Partition=1,Replica=1003],[
>> Topic=mytopic.2017-07-13-16,Partition=0,Replica=1001],[Topic=mytopic.
>> 2017-07-13-16,Partition=2,Replica=1001],[Topic=mytopic.2017-07-13-16,Pa
>> rtition=1,Replica=1002],[Topic=mytopic.2017-07-13-16,
>> Partition=2,Replica=1003],[Topic=mytopic.2017-07-13-16,
>> Partition=0,Replica=1002]
>> (kafka.controller.ReplicaStateMachine)
>> Jul 13 16:00:01 computer kafka[28511]: [2017-07-13 16:00:01,037]
>> INFO [ReplicaFetcherManager on broker 1002] Removed fetcher for
>> partitions mytopic.2017-07-13-16-1
>> (kafka.server.ReplicaFetcherManager)
>> Jul 13 16:00:01 computer kafka[28511]: [2017-07-13 16:00:01,039]
>> INFO Completed load of log mytopic.2017-07-13-16-1 with 1 log
>> segments and log end offset 0 in 0 ms (kafka.log.Log)
>> Jul 13 16:00:01 computer kafka[28511]: [2017-07-13 16:00:01,040]
>> INFO Created log for partition [mytopic.2017-07-13-16,1] in
>> /data/kafka-1 with properties {compression.type -> producer,
>> message.format.version -> 0.10.2-IV0, file.delete.delay.ms ->
>> 6, max.message.bytes -> 112, min.compaction.lag.ms -> 0,
>> message.timestamp.type -> CreateTime, min.insync.replicas -> 1,
>> segment.jitter.ms -> 0, preallocate -> false,
>>

Re: [VOTE] 0.11.0.0 RC2

2017-06-23 Thread Tom Crayford
Hi,

As previously, Heroku has been doing performance testing of 0.11.0.0 RC2.
Now that the ProducerPerformance.java tool supports it, we've even been
doing testing with the new transactional/idempotence features in KIP-98.

We've tested with idempotency and read_committed consumers and see no
notable performance impact there from our perspective. That isn't to say
that there is no impact, just that our testing can't see enough of one to
be obvious.

The new tooling supports `transaction-duration-ms` as a flag, which gives
the number of milliseconds between transaction commits in the producer.
This setting has vast (and expected) impacts on performance from our
perspective.

Tuned down to 1ms, we see throughput around 23x lower than without
transactions enabled (but with idempotency and read_committed consumers).

Tuned to 10ms, we see throughput at about 70% of the performance without
transactions enabled.

At 100ms and 1000ms transaction duration, there is no notable impact to
throughput with usual operating conditions as far as we can tell. That
isn't to say that there is no impact, just that our testing can't see
enough of one to be obvious.

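One way to reason about why very short commit intervals hurt so much: if each
transaction commit adds a roughly fixed stall of c milliseconds on top of d
milliseconds of useful producing, relative throughput scales as d / (d + c).
This is a crude illustrative model with an assumed overhead value — the
measured numbers above suggest the real overhead is not a simple constant —
but it captures the direction of the effect:

```python
def relative_throughput(duration_ms, commit_overhead_ms):
    """Fraction of non-transactional throughput, assuming each commit
    costs a fixed commit_overhead_ms stall (assumed, not measured)."""
    return duration_ms / (duration_ms + commit_overhead_ms)

c = 20.0  # hypothetical per-commit overhead in milliseconds
for d in (1, 10, 100, 1000):
    print(f"transaction-duration-ms={d:>5} -> "
          f"{relative_throughput(d, c):.0%} of baseline")
```

Under any fixed-overhead assumption, 1ms commits are dominated by commit cost
while 100ms+ commits amortize it away, which matches the shape (if not the
exact magnitude) of the results above.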
We have also performed some failure testing with 0.11 transactions enabled,
and notice nothing out of the ordinary there. Note that we have not tested
the correctness of the feature (as in, does it deliver what is promised),
just if the producers or consumers or brokers do anything outwardly strange
through the logs etc.

All in all, this is shaping up to be a great release. We're going to
continue some further testing, but right now are heading towards a +1.

Thanks

Tom Crayford
Heroku Kafka

On Fri, Jun 23, 2017 at 2:36 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> A quick note on notable changes since rc1:
>
> 1. A significant performance improvement if transactions are enabled:
> https://github.com/apache/kafka/commit/f239f1f839f8bcbd80cce2a4a8643e
> 15d340be8e
> 2. Fixed a controller regression if many brokers are started
> simultaneously:
> https://github.com/apache/kafka/commit/c0033b0e0b9e56242752c82f15c638
> 8d041914a1
> 3. Fixed a couple of Connect regressions:
> https://github.com/apache/kafka/commit/c029960bf4ae2cd79b22886f4ee519
> c4af0bcc8b
> and
> https://github.com/apache/kafka/commit/1d65f15f2b656b7817eeaf6ee1d36e
> b3e2cf063f
> 4. Fixed an important log cleaner issue:
> https://github.com/apache/kafka/commit/186e3d5efc79ed803f0915d472ace7
> 7cbec88694
>
> Full diff:
> https://github.com/apache/kafka/compare/5b351216621f52a471c21826d0dec3
> ce3187e697...0.11.0.0-rc2
>
> Ismael
>
> On Fri, Jun 23, 2017 at 2:16 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the third candidate for release of Apache Kafka 0.11.0.0.
> >
> > This is a major version release of Apache Kafka. It includes 32 new KIPs.
> > See the release notes and release plan (https://cwiki.apache.org/
> > confluence/display/KAFKA/Release+Plan+0.11.0.0) for more details. A few
> > feature highlights:
> >
> > * Exactly-once delivery and transactional messaging
> > * Streams exactly-once semantics
> > * Admin client with support for topic, ACLs and config management
> > * Record headers
> > * Request rate quotas
> > * Improved resiliency: replication protocol improvement and
> > single-threaded controller
> > * Richer and more efficient message format
> >
> > Release notes for the 0.11.0.0 release:
> > http://home.apache.org/~ijuma/kafka-0.11.0.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Tuesday, June 27, 6pm PT
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~ijuma/kafka-0.11.0.0-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > http://home.apache.org/~ijuma/kafka-0.11.0.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.0 tag:
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> > 8698fa1f41102f1664b05baa4d6953fc9564d91e
> >
> > * Documentation:
> > http://kafka.apache.org/0110/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/0110/protocol.html
> >
> > * Successful Jenkins builds for the 0.11.0 branch:
> > Unit/integration tests: https://builds.apache.org/job/
> > kafka-0.11.0-jdk7/187/
> > System tests: pending (will send an update tomorrow)
> >
> > /**
> >
> > Thanks,
> > Ismael
> >
>


Re: [VOTE] 0.11.0.0 RC1

2017-06-22 Thread Tom Crayford
That's fair, and nice find with the transaction performance improvement!

Once the RC is out, we'll do a final round of performance testing with the
new ProducerPerformance changes enabled.

I think it's fair that this shouldn't delay the release. Is there an
official stance on what should and shouldn't delay a release documented
somewhere?

Thanks

Tom Crayford
Heroku Kafka

On Thu, Jun 22, 2017 at 4:45 PM, Ismael Juma <isma...@gmail.com> wrote:

> Hi Tom,
>
> We are going to do another RC to include Apurva's significant performance
> improvement when transactions are enabled:
>
> https://github.com/apache/kafka/commit/f239f1f839f8bcbd80cce2a4a8643e
> 15d340be8e
>
> Given that, we can also include the ProducerPerformance changes that
> Apurva did to find and fix the performance issue.
>
> In my opinion, the ProducerPerformance change alone would not be enough
> reason for another RC as users can run the tool from trunk to test older
> releases. In any case, this is hypothetical at this point. :)
>
> And thanks for continuing your testing, it's very much appreciated!
>
> Ismael
>
> On Wed, Jun 21, 2017 at 8:03 PM, Tom Crayford <tcrayf...@heroku.com>
> wrote:
>
>> That looks better than mine, nice! I think the tooling matters a lot to
>> the usability of the product we're shipping, being able to test out Kafka's
>> features on your own hardware/setup is very important to knowing if it can
>> work.
>>
>> On Wed, Jun 21, 2017 at 8:01 PM, Apurva Mehta <apu...@confluent.io>
>> wrote:
>>
>>> Hi Tom,
>>>
>>> I actually made modifications to the producer performance tool to do real
>>> transactions earlier this week as part of our benchmarking (results
>>> published here: bit.ly/kafka-eos-perf). I just submitted that patch
>>> here:
>>> https://github.com/apache/kafka/pull/3400/files
>>>
>>> I think my version is more complete since it runs the full gamut of APIs:
>>> initTransactions, beginTransaction, commitTransaction. Also, it is the
>>> version used for our published benchmarks.
>>>
>>> I am not sure that this tool is a blocker for the release though, since
>>> it
>>> doesn't really affect the usability of the feature in any way.
>>>
>>> Thanks,
>>> Apurva
>>>
>>> On Wed, Jun 21, 2017 at 11:12 AM, Tom Crayford <tcrayf...@heroku.com>
>>> wrote:
>>>
>>> > Hi there,
>>> >
>>> > I'm -1 (non-binding) on shipping this RC.
>>> >
>>> > Heroku has carried on performance testing with 0.11 RC1. We have
>>> updated
>>> > our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
>>> > transactional features enabled, we get slightly better performance than
>>> > 0.10.2.1 with 10.2.1 client libraries.
>>> >
>>> > However, we attempted to run a performance test today with
>>> transactions,
>>> > idempotence and consumer read_committed enabled, but couldn't, because
>>> > enabling transactions requires the producer to call `initTransactions`
>>> > before starting to send messages, and the producer performance tool
>>> doesn't
>>> > allow for that.
>>> >
>>> > I'm -1 (non-binding) on shipping this RC in this state, because users
>>> > expect to be able to use the inbuilt performance testing tools, and
>>> > preventing them from testing the impact of the new features using the
>>> > inbuilt tools isn't great. I made a PR for this:
>>> > https://github.com/apache/kafka/pull/3398 (the change is very small).
>>> > Happy
>>> > to make a jira as well, if that makes sense.
>>> >
>>> > Thanks
>>> >
>>> > Tom Crayford
>>> > Heroku Kafka
>>> >
>>> > On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
>>> > vahidhashem...@us.ibm.com> wrote:
>>> >
>>> > > Hi Ismael,
>>> > >
>>> > > Thanks for running the release.
>>> > >
>>> > > Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
>>> > > these checkstyle errors:
>>> > >
>>> > > :clients:checkstyleMain
>>> > > [ant:checkstyle] [ERROR]
>>> > > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
>>> > > java\org\apache\kafka\common\protocol\Errors.java:89:1:
>>> > > Class Data Abstraction Coupling is 57 (max allowed is 20) classes
>>> >

Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Tom Crayford
That looks better than mine, nice! I think the tooling matters a lot to the
usability of the product we're shipping, being able to test out Kafka's
features on your own hardware/setup is very important to knowing if it can
work.

On Wed, Jun 21, 2017 at 8:01 PM, Apurva Mehta <apu...@confluent.io> wrote:

> Hi Tom,
>
> I actually made modifications to the producer performance tool to do real
> transactions earlier this week as part of our benchmarking (results
> published here: bit.ly/kafka-eos-perf). I just submitted that patch here:
> https://github.com/apache/kafka/pull/3400/files
>
> I think my version is more complete since it runs the full gamut of APIs:
> initTransactions, beginTransaction, commitTransaction. Also, it is the
> version used for our published benchmarks.
>
> I am not sure that this tool is a blocker for the release though, since it
> doesn't really affect the usability of the feature in any way.
>
> Thanks,
> Apurva
>
> On Wed, Jun 21, 2017 at 11:12 AM, Tom Crayford <tcrayf...@heroku.com>
> wrote:
>
> > Hi there,
> >
> > I'm -1 (non-binding) on shipping this RC.
> >
> > Heroku has carried on performance testing with 0.11 RC1. We have updated
> > our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
> > transactional features enabled, we get slightly better performance than
> > 0.10.2.1 with 10.2.1 client libraries.
> >
> > However, we attempted to run a performance test today with transactions,
> > idempotence and consumer read_committed enabled, but couldn't, because
> > enabling transactions requires the producer to call `initTransactions`
> > before starting to send messages, and the producer performance tool
> doesn't
> > allow for that.
> >
> > I'm -1 (non-binding) on shipping this RC in this state, because users
> > expect to be able to use the inbuilt performance testing tools, and
> > preventing them from testing the impact of the new features using the
> > inbuilt tools isn't great. I made a PR for this:
> > https://github.com/apache/kafka/pull/3398 (the change is very small).
> > Happy
> > to make a jira as well, if that makes sense.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> >
> > > Hi Ismael,
> > >
> > > Thanks for running the release.
> > >
> > > Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> > > these checkstyle errors:
> > >
> > > :clients:checkstyleMain
> > > [ant:checkstyle] [ERROR]
> > > C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> > > java\org\apache\kafka\common\protocol\Errors.java:89:1:
> > > Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> > > [ApiExceptionBuilder, BrokerNotAvailableException,
> > > ClusterAuthorizationException, ConcurrentTransactionsException,
> > > ControllerMovedException, CoordinatorLoadInProgressException,
> > > CoordinatorNotAvailableException, CorruptRecordException,
> > > DuplicateSequenceNumberException, GroupAuthorizationException,
> > > IllegalGenerationException, IllegalSaslStateException,
> > > InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> > > InvalidConfigurationException, InvalidFetchSizeException,
> > > InvalidGroupIdException, InvalidPartitionsException,
> > > InvalidPidMappingException, InvalidReplicaAssignmentException,
> > > InvalidReplicationFactorException, InvalidRequestException,
> > > InvalidRequiredAcksException, InvalidSessionTimeoutException,
> > > InvalidTimestampException, InvalidTopicException,
> > > InvalidTxnStateException, InvalidTxnTimeoutException,
> > > LeaderNotAvailableException, NetworkException, NotControllerException,
> > > NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> > > NotEnoughReplicasException, NotLeaderForPartitionException,
> > > OffsetMetadataTooLarge, OffsetOutOfRangeException,
> > > OperationNotAttemptedException, OutOfOrderSequenceException,
> > > PolicyViolationException, ProducerFencedException,
> > > RebalanceInProgressException, RecordBatchTooLargeException,
> > > RecordTooLargeException, ReplicaNotAvailableException,
> > > SecurityDisabledException, TimeoutException,
> TopicAuthorizationException,
> > > TopicExistsException, TransactionCoordinatorFencedException,
> > > TransactionalIdAuthorizationException, UnknownMemberIdException,
> > > UnknownSer

Re: [VOTE] 0.11.0.0 RC1

2017-06-21 Thread Tom Crayford
Hi there,

I'm -1 (non-binding) on shipping this RC.

Heroku has carried on performance testing with 0.11 RC1. We have updated
our test setup to use 0.11.0.0 RC1 client libraries. Without any of the
transactional features enabled, we get slightly better performance than
0.10.2.1 with 10.2.1 client libraries.

However, we attempted to run a performance test today with transactions,
idempotence and consumer read_committed enabled, but couldn't, because
enabling transactions requires the producer to call `initTransactions`
before starting to send messages, and the producer performance tool doesn't
allow for that.

I'm -1 (non-binding) on shipping this RC in this state, because users
expect to be able to use the inbuilt performance testing tools, and
preventing them from testing the impact of the new features using the
inbuilt tools isn't great. I made a PR for this:
https://github.com/apache/kafka/pull/3398 (the change is very small). Happy
to make a jira as well, if that makes sense.

Thanks

Tom Crayford
Heroku Kafka

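The constraint described above can be shown with a toy producer that mimics
the call-order contract of Kafka's transactional producer API. This is not
the real Java client — just an illustration of why the old perf tool, which
never called initTransactions, couldn't run with transactions enabled:

```python
class ToyTransactionalProducer:
    """Mimics the transactional producer's contract: when a
    transactional.id is configured, initTransactions() must be
    called before any send()."""
    def __init__(self, transactional_id=None):
        self.transactional_id = transactional_id
        self.initialized = False
        self.sent = []

    def init_transactions(self):
        self.initialized = True

    def send(self, record):
        if self.transactional_id and not self.initialized:
            raise RuntimeError("initTransactions() must be called before send()")
        self.sent.append(record)

p = ToyTransactionalProducer(transactional_id="perf-test")
try:
    p.send("msg")          # what the unpatched perf tool effectively did
except RuntimeError as e:
    print("error:", e)

p.init_transactions()      # the fix: initialize transactions first
p.send("msg")
print(len(p.sent))         # 1
```

A producer without a transactional.id has no such requirement, which is why
the tool worked fine until the transactional flags were added.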
On Tue, Jun 20, 2017 at 8:32 PM, Vahid S Hashemian <
vahidhashem...@us.ibm.com> wrote:

> Hi Ismael,
>
> Thanks for running the release.
>
> Running tests ('gradlew.bat test') on my Windows 64-bit VM results in
> these checkstyle errors:
>
> :clients:checkstyleMain
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\protocol\Errors.java:89:1:
> Class Data Abstraction Coupling is 57 (max allowed is 20) classes
> [ApiExceptionBuilder, BrokerNotAvailableException,
> ClusterAuthorizationException, ConcurrentTransactionsException,
> ControllerMovedException, CoordinatorLoadInProgressException,
> CoordinatorNotAvailableException, CorruptRecordException,
> DuplicateSequenceNumberException, GroupAuthorizationException,
> IllegalGenerationException, IllegalSaslStateException,
> InconsistentGroupProtocolException, InvalidCommitOffsetSizeException,
> InvalidConfigurationException, InvalidFetchSizeException,
> InvalidGroupIdException, InvalidPartitionsException,
> InvalidPidMappingException, InvalidReplicaAssignmentException,
> InvalidReplicationFactorException, InvalidRequestException,
> InvalidRequiredAcksException, InvalidSessionTimeoutException,
> InvalidTimestampException, InvalidTopicException,
> InvalidTxnStateException, InvalidTxnTimeoutException,
> LeaderNotAvailableException, NetworkException, NotControllerException,
> NotCoordinatorException, NotEnoughReplicasAfterAppendException,
> NotEnoughReplicasException, NotLeaderForPartitionException,
> OffsetMetadataTooLarge, OffsetOutOfRangeException,
> OperationNotAttemptedException, OutOfOrderSequenceException,
> PolicyViolationException, ProducerFencedException,
> RebalanceInProgressException, RecordBatchTooLargeException,
> RecordTooLargeException, ReplicaNotAvailableException,
> SecurityDisabledException, TimeoutException, TopicAuthorizationException,
> TopicExistsException, TransactionCoordinatorFencedException,
> TransactionalIdAuthorizationException, UnknownMemberIdException,
> UnknownServerException, UnknownTopicOrPartitionException,
> UnsupportedForMessageFormatException, UnsupportedSaslMechanismException,
> UnsupportedVersionException]. [ClassDataAbstractionCoupling]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\protocol\Errors.java:89:1:
> Class Fan-Out Complexity is 60 (max allowed is 40).
> [ClassFanOutComplexity]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\requests\AbstractRequest.java:26:1:
> Class Fan-Out Complexity is 43 (max allowed is 40).
> [ClassFanOutComplexity]
> [ant:checkstyle] [ERROR]
> C:\Users\User\Downloads\kafka-0.11.0.0-src\clients\src\main\
> java\org\apache\kafka\common\requests\AbstractResponse.java:26:1:
> Class Fan-Out Complexity is 42 (max allowed is 40).
> [ClassFanOutComplexity]
> :clients:checkstyleMain FAILED
>
> I wonder if there is an issue with my VM since I don't get similar errors
> on Ubuntu or Mac.
>
> --Vahid
>
>
>
>
> From:   Ismael Juma <ism...@juma.me.uk>
> To: d...@kafka.apache.org, Kafka Users <users@kafka.apache.org>,
> kafka-clients <kafka-clie...@googlegroups.com>
> Date:   06/18/2017 03:32 PM
> Subject:[VOTE] 0.11.0.0 RC1
> Sent by:isma...@gmail.com
>
>
>
> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 0.11.0.0.
>
> This is a major version release of Apache Kafka. It includes 32 new KIPs.
> See
> the release notes and release plan (https://cwiki.apache.org/conf
> luence/display/KAFKA/Release+Plan+0.11.0.0) for more details

Re: [VOTE] 0.11.0.0 RC0

2017-06-19 Thread Tom Crayford
Hello,

Heroku has been testing 0.11.0.0 RC0, mostly focused on backwards
compatibility and performance. So far, we note a slight performance
increase over older versions when using non-current clients.

Testing a 0.9 client against 0.10.2.1 vs 0.11.0.0 rc0: 0.11.0.0 rc0 has
slightly higher throughput for both consumers and producers. We expect this
because the message format improvements should lead to greater efficiency.

We have not yet tested a 0.11.0.0 rc0 client against a 0.11.0.0 rc0
cluster, because our test setup needs updating for that.

We've tested simple demo apps against 0.11.0.0rc0 (that we've run against
older clusters):
http://github.com/heroku/heroku-kafka-demo-ruby
https://github.com/heroku/heroku-kafka-demo-node
https://github.com/heroku/heroku-kafka-demo-java
https://github.com/heroku/heroku-kafka-demo-go

This comprises a range of community supported clients: ruby-kafka,
no-kafka, the main JVM client and sarama.

We didn't see any notable issues there, but it's worth noting that all of
these demo apps do little more than produce and consume messages.

We have also tested failure handling in 0.11.0.0 rc0, by terminating nodes.
Note that this does *not* test any of the new exactly-once features, just
"can I terminate a broker whilst producing to/consuming from the cluster?".
We see the same behaviour as 0.10.2.1 there, just a round of errors from
the client, like this:

org.apache.kafka.common.errors.NetworkException: The server disconnected
before a response was received.

but that's expected.

We have tested creating and deleting topics heavily, including deleting a
topic in the middle of a broker failure (the controller waits for the broker
to come back before completing the deletion, as expected).

We have also tested upgrading a 0.10.2.1 cluster to 0.11.0.0 rc0 without
issue.

We have also tested partition preferred leader election (manual, with the
admin script), and partition rebalancing to grow and shrink clusters.

We have not yet tested the exactly once features, because various core
committers said that they didn't expect this feature to be perfect in this
release. We expect to test this this week though.

Given that the blockers fixed between RC0 and RC1 haven't changed much in
the areas we tested, I think the positive results here still apply.

Thanks

Tom Crayford
Heroku Kafka

On Thu, Jun 8, 2017 at 2:55 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for release of Apache Kafka 0.11.0.0. It's
> worth noting that there are a small number of unresolved issues (including
> documentation and system tests) related to the new AdminClient and
> Exactly-once functionality[1] that we hope to resolve in the next few days.
> To encourage early testing, we are releasing the first release candidate
> now, but there will be at least one more release candidate.
>
> Any and all testing is welcome, but the following areas are worth
> highlighting:
>
> 1. Client developers should verify that their clients can produce/consume
> to/from 0.11.0 brokers (ideally with compressed and uncompressed data).
> Even though we have compatibility tests for older Java clients and we have
> verified that librdkafka works fine, the only way to be sure is to test
> every client.
> 2. Performance and stress testing. Heroku and LinkedIn have helped with
> this in the past (and issues have been found and fixed).
> 3. End users can verify that their apps work correctly with the new
> release.
>
> This is a major version release of Apache Kafka. It includes 32 new KIPs.
> See
> the release notes and release plan (https://cwiki.apache.org/
> confluence/display/KAFKA/Release+Plan+0.11.0.0) for more details. A few
> feature highlights:
>
> * Exactly-once delivery and transactional messaging
> * Streams exactly-once semantics
> * Admin client with support for topic, ACLs and config management
> * Record headers
> * Request rate quotas
> * Improved resiliency: replication protocol improvement and single-threaded
> controller
> * Richer and more efficient message format
>
> Release notes for the 0.11.0.0 release:
> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc0/RELEASE_NOTES.html
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc0/javadoc/
>
> * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> d5ee02b187fafe08b63deb52e6b07c8d1d12f18d
>
> * Documentation:
&g

Re: [VOTE] 0.11.0.0 RC1

2017-06-19 Thread Tom Crayford
Is there a summary of which blockers were fixed in RC0 somewhere?

On Mon, Jun 19, 2017 at 1:41 PM, Eno Thereska 
wrote:

> +1 (non-binding) passes Kafka Streams tests.
>
> Thanks,
> Eno
> > On 19 Jun 2017, at 06:49, Magnus Edenhill  wrote:
> >
> > +1 (non-binding)
> >
> > Passes librdkafka integration tests (v0.9.5 and master)
> >
> >
> > 2017-06-19 0:32 GMT+02:00 Ismael Juma :
> >
> >> Hello Kafka users, developers and client-developers,
> >>
> >> This is the second candidate for release of Apache Kafka 0.11.0.0.
> >>
> >> This is a major version release of Apache Kafka. It includes 32 new
> KIPs.
> >> See
> >> the release notes and release plan (https://cwiki.apache.org/conf
> >> luence/display/KAFKA/Release+Plan+0.11.0.0) for more details. A few
> >> feature
> >> highlights:
> >>
> >> * Exactly-once delivery and transactional messaging
> >> * Streams exactly-once semantics
> >> * Admin client with support for topic, ACLs and config management
> >> * Record headers
> >> * Request rate quotas
> >> * Improved resiliency: replication protocol improvement and
> single-threaded
> >> controller
> >> * Richer and more efficient message format
> >>
> >> A number of issues have been resolved since RC0 and there are no known
> >> blockers remaining.
> >>
> >> Release notes for the 0.11.0.0 release:
> >> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/RELEASE_NOTES.html
> >>
> >> *** Please download, test and vote by Thursday, June 22, 9am PT
> >>
> >> Kafka's KEYS file containing PGP keys we use to sign the release:
> >> http://kafka.apache.org/KEYS
> >>
> >> * Release artifacts to be voted upon (source and binary):
> >> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/
> >>
> >> * Maven artifacts to be voted upon:
> >> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >>
> >> * Javadoc:
> >> http://home.apache.org/~ijuma/kafka-0.11.0.0-rc1/javadoc/
> >>
> >> * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.0 tag:
> >> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> >> 4818d4e1cbef1a8e9c027100fef317077fb3fb99
> >>
> >> * Documentation:
> >> http://kafka.apache.org/0110/documentation.html
> >>
> >> * Protocol:
> >> http://kafka.apache.org/0110/protocol.html
> >>
> >> * Successful Jenkins builds for the 0.11.0 branch:
> >> Unit/integration tests: https://builds.apache.org/job/
> >> kafka-0.11.0-jdk7/167/
> >> System tests: https://jenkins.confluent.io/
> job/system-test-kafka-0.11.0/
> >> 16/
> >> (all 274 tests passed, the reported failure was not related to the
> tests)
> >>
> >> /**
> >>
> >> Thanks,
> >> Ismael
> >>
>
>


Re: Zookeeper on same server as Kafka

2017-06-04 Thread Tom Crayford
Hi,

I would not recommend running this kind of set up in production. Busy Kafka
brokers use up a lot of disk and network bandwidth, which zookeeper does
not deal well with. This means that a burst of traffic to 1 node carries
the risk of disrupting the ZK ensemble.

Secondly, this will cause problems down the line, because you will want to
scale Kafka independently from ZK. ZK gets slower as you add nodes, but
Kafka can scale out for quite a while. For production clusters, I'd
recommend always having 5 ZK nodes, but for Kafka, you can scale well past
that, or keep it small while you are starting out.

Thanks,

Tom Crayford
Heroku Kafka
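(For readers wondering why 5 and not 4 or 6: a quorum only survives a
minority of failures, so even sizes buy nothing. A quick sketch of the
sizing arithmetic, plain math rather than ZooKeeper code:)

```python
# Majority-quorum arithmetic behind ensemble sizing: an ensemble of n
# nodes needs floor(n/2) + 1 of them up to keep a quorum, so it can
# tolerate floor((n-1)/2) failures. Note that 4 nodes tolerate no more
# failures than 3, which is why odd sizes (3, 5) are the usual choice.

def quorum_size(n: int) -> int:
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    return (n - 1) // 2

for n in (3, 4, 5):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

So 5 nodes keep the ensemble writable through 2 simultaneous failures,
which covers one planned maintenance plus one surprise.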

On Sat, Jun 3, 2017 at 8:20 AM, Michal Borowiecki <
michal.borowie...@openbet.com> wrote:

> I'm not an expert but I prefer keeping zookeepers on the same hosts as
> kafka brokers and mimic each other's topology. The reason is to minimize the
> chance of e.g. kafka brokers being able to talk to one another but
> zookeepers not, or vice-versa. So, I'd say I *do* want my kafka broker
> and the co-located zookeeper to go down together - for simplicity - prefer
> that to some asymmetric failures to debug. This comes from past experience,
> albeit using other technologies, when relying on 2 different clustering
> mechanisms made failures in one but not the other very difficult to debug.
>
> Also, I think I read this advice somewhere a long time ago (don't recall
> where) and it made sense to me (given the prior experience) and we've never
> tried a different arrangement.
>
> As to the overheads, I believe it's mostly disk IO and can hopefully be
> addressed by separate disks for each but it's never been a bottleneck for
> us, so can't really say.
>
> Thanks,
>
> Michał
>
> On 02/06/17 21:47, Mohammed Manna wrote:
>
> Usually, the overhead comes when you have kafka and zookeeper doing the
> housekeeping (i.e. Disk IO) on the same server. ZK even suggests that you
> should keep their logs on totally different physical machine for better
> performance. Furthermore, if a mechanical failure occurs, you might not
> want both zookeeper and broker going down together.
>
> Can anyone else chime in for some better points?
>
>
> On 2 Jun 2017 7:57 pm, "Meghana Narasimhan" <mnarasim...@bandwidth.com> 
> wrote:
>
> Hi,
> What are the pros and cons of setting up Zookeeper on the same server as
> the Kafka broker ? Earlier offsets were being written to zookeeper which
> was a major overhead but with offsets being written to Kafka now, what
> other requirements should be taken into consideration for setting up
> Zookeeper on the same server as Kafka vs having a separate zookeeper
> cluster ?
>
> Thanks,
> Meghana
>
>
>
> --
> <http://www.openbet.com/> Michal Borowiecki
> Senior Software Engineer L4
> T: +44 208 742 1600 <+44%2020%208742%201600>
>
>
> +44 203 249 8448 <+44%2020%203249%208448>
>
>
>
> E: michal.borowie...@openbet.com
> W: www.openbet.com
> OpenBet Ltd
>
> Chiswick Park Building 9
>
> 566 Chiswick High Rd
>
> London
>
> W4 5XT
>
> UK
>


Re: Partitions as mechanism to keep multitenant segregated data

2017-05-23 Thread Tom Crayford
That might be ok. If that's the case, you can probably just "precreate" all
the partitions for them upfront and avoid any worry about having to futz
with consumers.

On Tue, May 23, 2017 at 4:33 PM, David Espinosa <espi...@gmail.com> wrote:

> Thanks for the answer Tom,
> Indeed I will not have more than 10 or 20 customer per cluster, so that's
> also the maximum number of partitions possible per topic.
> Still a bad idea?
>
> 2017-05-23 16:48 GMT+02:00 Tom Crayford <tcrayf...@heroku.com>:
>
> > Hi there,
> >
> > I don't know about the consumer, but I'd *strongly* recommend not
> designing
> > your application around this. Kafka has severe and notable stability
> > concerns with large numbers of partitions, and requiring "one partition
> per
> > customer" is going to be limiting, unless you only ever expect to have
> > *very* small customer numbers (hundreds at most, ever). Instead, use a
> hash
> > function and a key, as recommended to land customers on the same
> partition.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, May 23, 2017 at 9:46 AM, David Espinosa <espi...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > In order to keep separated (physically) the data from different
> customers
> > > in our application, we are using a custom partitioner to drive messages
> > to
> > > a concrete partition of a topic. We know that we are losing
> parallelism
> > > per topic this way, but our requirements regarding multitenancy are
> > higher
> > > than our throughput requirements.
> > >
> > > So, in order to increase the number of customers working on a cluster,
> we
> > > are increasing the number of partitions dynamically per topic as the
> new
> > > customer arrives using kafka AdminUtilities.
> > > Our problem arrives when using the new kafka consumer and a new
> partition
> > > is added into the topic, as this consumer doesn't get updated with the
> > "new
> > > partition" and therefore messages driven into that new partition never
> > > arrive at this consumer unless we reload the consumer itself. What was
> > > surprising was to check that using the old consumer (configured to deal
> > > with Zookeeper), a consumer does get messages from a new added
> partition.
> > >
> > > Is there a way to emulate the old consumer behaviour when new
> partitions
> > > are added in the new consumer?
> > >
> > > Thanks in advance,
> > > David
> > >
> >
>


Re: Partitions as mechanism to keep multitenant segregated data

2017-05-23 Thread Tom Crayford
Hi there,

I don't know about the consumer, but I'd *strongly* recommend not designing
your application around this. Kafka has severe and notable stability
concerns with large numbers of partitions, and requiring "one partition per
customer" is going to be limiting, unless you only ever expect to have
*very* small customer numbers (hundreds at most, ever). Instead, use a hash
function and a key, as recommended to land customers on the same partition.

Thanks

Tom Crayford
Heroku Kafka
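(The hash-function-and-key approach is easy to sketch. Kafka's default
Java partitioner hashes the serialized key bytes, murmur2, modulo the
partition count; the FNV-1a hash below is only an illustrative stand-in,
not Kafka's actual function:)

```python
# Sketch of hash-based partitioning: all messages with the same key land
# on the same partition, so per-customer ordering holds without needing
# one partition per customer. FNV-1a here stands in for Kafka's murmur2.

def fnv1a(data: bytes) -> int:
    h = 0xcbf29ce484222325
    for b in data:
        h = ((h ^ b) * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

def partition_for(key: str, num_partitions: int) -> int:
    return fnv1a(key.encode("utf-8")) % num_partitions

# The same key always maps to the same partition:
assert partition_for("customer-42", 12) == partition_for("customer-42", 12)
```

The flip side is the same pitfall discussed in this thread: changing
num_partitions remaps existing keys, so the partition count should be
chosen up front.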

On Tue, May 23, 2017 at 9:46 AM, David Espinosa <espi...@gmail.com> wrote:

> Hi,
>
> In order to keep separated (physically) the data from different customers
> in our application, we are using a custom partitioner to drive messages to
> a concrete partition of a topic. We know that we are losing parallelism
> per topic this way, but our requirements regarding multitenancy are higher
> than our throughput requirements.
>
> So, in order to increase the number of customers working on a cluster, we
> are increasing the number of partitions dynamically per topic as the new
> customer arrives using kafka AdminUtilities.
> Our problem arrives when using the new kafka consumer and a new partition
> is added into the topic, as this consumer doesn't get updated with the "new
> partition" and therefore messages driven into that new partition never
> arrive at this consumer unless we reload the consumer itself. What was
> surprising was to check that using the old consumer (configured to deal
> with Zookeeper), a consumer does get messages from a new added partition.
>
> Is there a way to emulate the old consumer behaviour when new partitions
> are added in the new consumer?
>
> Thanks in advance,
> David
>


Re: Data loss after a Kafka broker restart scenario.

2017-05-17 Thread Tom Crayford
Fathima,

In 0.11 there will be such a mechanism (see KIP-98), but in current
versions, you have to eat the duplicates if you want to not lose messages.

On Wed, May 17, 2017 at 5:31 AM, Fathima Amara  wrote:

> Hi Mathieu,
>
> Thanks for replying. I've already tried by setting "retries" to higher
> values (maximum up to 3). Since this introduces duplicates which I do not
> require, I brought the "retries" value back to 0. I would like to know
> whether there is a way to achieve an "exactly-once" guarantee having increased
> the retries value?
>
> Fathima
>


Re: Log compaction failed because offset map doesn't have enough space

2017-05-17 Thread Tom Crayford
Hi,

You should upgrade Kafka versions, this was a bug fixed in KAFKA-3894:
https://issues.apache.org/jira/browse/KAFKA-3894

Generally it's a very good idea to keep on top of Kafka version upgrades;
there are numerous bugs fixed with every release, and its stability goes
up each time.

On Tue, May 16, 2017 at 11:20 PM, Jun Ma  wrote:

> Hi team,
>
> We are having a issue with compacting __consumer_offsets topic in our
> cluster. We’re seeing logs in log-cleaner.log saying:
>
> [2017-05-16 11:56:28,993] INFO Cleaner 0: Building offset map for log
> __consumer_offsets-15 for 349 segments in offset range [0, 619265471).
> (kafka.log.LogCleaner)
> [2017-05-16 11:56:29,014] ERROR [kafka-log-cleaner-thread-0], Error due to
>  (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: 306088059 messages
> in segment __consumer_offsets-15/.log but offset map
> can fit only 7499. You can increase log.cleaner.dedupe.buffer.size or
> decrease log.cleaner.threads
> at scala.Predef$.require(Predef.scala:219)
> at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:584)
> at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:580)
> at
> scala.collection.immutable.Stream$StreamWithFilter.
> foreach(Stream.scala:570)
> at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:580)
> at kafka.log.Cleaner.clean(LogCleaner.scala:322)
> at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:230)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:208)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-05-16 11:56:29,016] INFO [kafka-log-cleaner-thread-0], Stopped
>  (kafka.log.LogCleaner)
>
> We have log.cleaner.dedupe.buffer.size=20, which is slightly less
> than 2G, but still, it can only fit 74,999,999 messages. The segment has
> 306,088,059 messages which is 4 times larger than the buffer can hold. We
> tried to set log.cleaner.dedupe.buffer.size even larger, but we see the log
> saying that
> [2017-05-16 11:52:16,238] WARN [kafka-log-cleaner-thread-0], Cannot use
> more than 2G of cleaner buffer space per cleaner thread, ignoring excess
> buffer space... (kafka.log.LogCleaner)
>
> The size of .log segment is 100MB, and
> log.cleaner.threads=1. We’re running Kafka 0.9.0.1.
> How can we get through this?
>
> Thanks,
> Jun
>
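(The "can fit only ..." figure is straightforward to sanity-check.
Assuming the 0.9-era cleaner internals, where each offset-map entry costs
24 bytes, a 16-byte MD5 of the key plus an 8-byte offset, and only
log.cleaner.io.buffer.load.factor, default 0.9, of the buffer is usable,
the capacity works out as:)

```python
# Back-of-the-envelope check of the cleaner's offset-map capacity.
# Assumed constants: 24 bytes per entry (16-byte key hash + 8-byte
# offset) and a 0.9 load factor, split across the cleaner threads.

ENTRY_BYTES = 24
LOAD_FACTOR = 0.9

def offset_map_capacity(dedupe_buffer_bytes: int, threads: int = 1) -> int:
    per_thread = dedupe_buffer_bytes // threads
    return int(per_thread * LOAD_FACTOR) // ENTRY_BYTES

print(offset_map_capacity(2_000_000_000))  # 75000000, within one of the 74,999,999 in the error
```

A ~2 GB per-thread buffer therefore tops out around 75M unique keys,
while the segment above holds 306M messages, hence the failure; and since
the 2G-per-thread cap cannot be raised, the practical fix on 0.9.0.1 is
the upgrade Tom points at.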


Re: [kafka-clients] [VOTE] 0.10.2.0 RC2

2017-02-16 Thread Tom Crayford
+1 (non-binding)

I didn't explicitly state my voting status in my previous comment, sorry.

On Thu, Feb 16, 2017 at 1:59 PM, Rajini Sivaram 
wrote:

> +1 (non-binding)
>
> Ran quick start and some security tests on binary, checked source build and
> tests.
>
> Thank you,
>
> Rajini
>
> On Thu, Feb 16, 2017 at 2:04 AM, Jun Rao  wrote:
>
> > Hi, Ewen,
> >
> > Thanks for running the release. +1. Verified quickstart on 2.10 binary.
> >
> > Jun
> >
> > On Tue, Feb 14, 2017 at 10:39 AM, Ewen Cheslack-Postava <
> e...@confluent.io
> > >
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the third candidate for release of Apache Kafka 0.10.2.0.
> > >
> > > This is a minor version release of Apache Kafka. It includes 19 new
> KIPs.
> > > See the release notes and release plan (https://cwiki.apache.org/conf
> > > luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> > > feature highlights: SASL-SCRAM support, improved client compatibility
> to
> > > allow use of clients newer than the broker, session windows and global
> > > tables in the Kafka Streams API, single message transforms in the Kafka
> > > Connect framework.
> > >
> > > Important note: in addition to the artifacts generated using JDK7 for
> > > Scala 2.10 and 2.11, this release also includes experimental artifacts
> > > built using JDK8 for Scala 2.12.
> > >
> > > Important code changes since RC1 (non-docs, non system tests):
> > >
> > > KAFKA-4756; The auto-generated broker id should be passed to
> > > MetricReporter.configure
> > > KAFKA-4761; Fix producer regression handling small or zero batch size
> > >
> > > Release notes for the 0.10.2.0 release:
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by February 17th 5pm ***
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
> > >
> > > * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> > > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> > > 5712b489038b71ed8d5a679856d1dfaa925eadc1
> > >
> > >
> > > * Documentation:
> > > http://kafka.apache.org/0102/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/0102/protocol.html
> > >
> > > * Successful Jenkins builds for the 0.10.2 branch:
> > > Unit/integration tests: https://builds.apache.org/job/
> > > kafka-0.10.2-jdk7/77/
> > > System tests: https://jenkins.confluent.io/
> job/system-test-kafka-0.10.2/
> > > 29/
> > >
> > > /**
> > >
> > > Thanks,
> > > Ewen
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> Groups
> > > "kafka-clients" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> an
> > > email to kafka-clients+unsubscr...@googlegroups.com.
> > > To post to this group, send email to kafka-clie...@googlegroups.com.
> > > Visit this group at https://groups.google.com/group/kafka-clients.
> > > To view this discussion on the web visit https://groups.google.com/d/
> > > msgid/kafka-clients/CAE1jLMORScgr1RekNgY0fLykSPh_%
> > > 2BgkRYN7vok3fz1ou%3DuW3kw%40mail.gmail.com
> > > .
> > > For more options, visit https://groups.google.com/d/optout.
> > >
> >
>


Re: [VOTE] 0.10.2.0 RC2

2017-02-15 Thread Tom Crayford
Heroku tested this with our usual round of performance benchmarks, and
there seem to be no notable regressions in this RC that we can see (for a
sample on earlier regressions we found using these benchmarks during the
0.10.0.0 release,
https://engineering.heroku.com/blogs/2016-05-27-apache-kafka-010-evaluating-performance-in-distributed-systems/
is a decent writeup)

On Tue, Feb 14, 2017 at 6:39 PM, Ewen Cheslack-Postava 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 0.10.2.0.
>
> This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> See the release notes and release plan (https://cwiki.apache.org/conf
> luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> feature
> highlights: SASL-SCRAM support, improved client compatibility to allow use
> of clients newer than the broker, session windows and global tables in the
> Kafka Streams API, single message transforms in the Kafka Connect
> framework.
>
> Important note: in addition to the artifacts generated using JDK7 for Scala
> 2.10 and 2.11, this release also includes experimental artifacts built
> using JDK8 for Scala 2.12.
>
> Important code changes since RC1 (non-docs, non system tests):
>
> KAFKA-4756; The auto-generated broker id should be passed to
> MetricReporter.configure
> KAFKA-4761; Fix producer regression handling small or zero batch size
>
> Release notes for the 0.10.2.0 release:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by February 17th 5pm ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 5712b489038b71ed8d5a679856d1dfaa925eadc1
>
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> * Successful Jenkins builds for the 0.10.2 branch:
> Unit/integration tests: https://builds.apache.org/job/
> kafka-0.10.2-jdk7/77/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.2/
> 29/
>
> /**
>
> Thanks,
> Ewen
>


Re: [VOTE] Vote for KIP-101 - Leader Epochs

2017-01-04 Thread Tom Crayford
+1

On Wed, Jan 4, 2017 at 5:28 PM, Gwen Shapira  wrote:

> +1 - thanks for tackling those old and painful bugs!
>
> On Wed, Jan 4, 2017 at 9:24 AM, Ben Stopford  wrote:
> > Hi All
> >
> > We’re having some problems with this thread being subsumed by the
> [Discuss] thread. Hopefully this one will appear distinct. If you see more
> than one, please use this one.
> >
> > KIP-101 should now be ready for a vote. As a reminder the KIP proposes a
> change to the replication protocol to remove the potential for replicas to
> diverge.
> >
> > The KIP can be found here:  https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-101+-+Alter+Replication+
> Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+
> rather+than+High+Watermark+for+Truncation>
> >
> > Please let us know your vote.
> >
> > B
> >
> >
> >
> >
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Re: ActiveControllerCount is always will be either 0 or 1 in 3 nodes kafka cluster?

2016-12-12 Thread Tom Crayford
This is Confluent documentation, not Apache documentation. I'd recommend
talking to Confluent about that.

On Mon, Dec 12, 2016 at 4:57 AM, Sven Ludwig  wrote:

> Hi,
>
> in JMX each Kafka broker has a value 1 or 0 for ActiveControllerCount. As
> I understood from this thread, the sum of these values across the cluster
> should never be something other than 1. The documentation at
> http://docs.confluent.io/3.1.0/kafka/monitoring.html should be improved
> to make that clear. Currently it is misleading:
>
> kafka.controller:type=KafkaController,name=ActiveControllerCount
> Number of active controllers in the cluster. Alert if value is anything
> other than 1.
>
> Suggested:
>
> kafka.controller:type=KafkaController,name=ActiveControllerCount
> Number of active controllers on a broker. Alert if the aggregated sum
> across all brokers in the cluster is anything other than 1, because in a
> cluster there should only be one broker with an active controller (cluster
> singleton).
>
> Kind Regards,
> Sven
>


Re: offset topics growing huge

2016-10-07 Thread Tom Crayford
On Mon, Oct 3, 2016 at 5:38 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

> Newbie question, but what exactly does log.cleaner.enable=true do, and how
> do I know if I need to set it to be true?
>

If you're using any compacted topics (including __consumer_offsets), it
needs to be on.


>
> Also, if config changes like that need to be made once a cluster is up and
> running, what's the recommended way to do that? Do you killall -12 kafka
> and then make the change, and then start kafka again, one broker at a time?
>

Roughly yes. You should probably orchestrate it and run e.g. health checks
and such before restarting the next node.

Thanks

Tom Crayford
Heroku Kafka


>
> On Mon, Oct 3, 2016 at 9:27 PM, Tom Crayford <tcrayf...@heroku.com> wrote:
>
> > Yes, offset topic compaction is just the normal compaction.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Monday, 3 October 2016, Tobias Adamson <tob...@stargazer.com.sg>
> wrote:
> >
> > > Hi
> > > We are using Kafka 0.10.1 with offsets commits being stored inside of
> > Kafka
> > > After a while these topics become extremely large and we are wondering
> if
> > > we
> > > need to enable log.cleaner.enable=true (currently false) to make sure
> the
> > > internal
> > > offset topics get compacted and keep their size down?
> > >
> > > Regards
> > > T
> >
>
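(The restart-one-broker-at-a-time procedure can be sketched as a small
orchestration loop. restart_broker() and is_healthy() below are
hypothetical placeholders, not a real Kafka API; a real health check
might assert zero under-replicated partitions before moving on:)

```python
# Sketch of a rolling config change: apply the config, restart one
# broker, wait for it to report healthy again, then do the next one.

import time

def rolling_restart(brokers, restart_broker, is_healthy,
                    timeout_s=300, poll_s=5.0):
    for broker in brokers:
        restart_broker(broker)
        deadline = time.monotonic() + timeout_s
        while not is_healthy(broker):
            if time.monotonic() > deadline:
                raise TimeoutError(f"{broker} did not become healthy")
            time.sleep(poll_s)

# Example with stubs standing in for real restart/health-check calls:
restarted = []
rolling_restart(["b1", "b2", "b3"],
                restart_broker=restarted.append,
                is_healthy=lambda b: True)
print(restarted)  # ['b1', 'b2', 'b3']
```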


Re: Delayed Queue usecase

2016-10-07 Thread Tom Crayford
Kafka doesn't support time delays at all, no.

On Thu, Oct 6, 2016 at 12:14 AM, Akshay Joglekar <
akshay.jogle...@danalinc.com> wrote:

> Hi,
>
> I have a use case where I need to process certain messages only after a
> certain amount time has elapsed. Does Kafka have any support for time
> delays?
> Currently I am putting messages in different queues based on when the
> message should get processed and at any given time the consumers only poll
> the queue whose time is current. However this does not scale very well and
> it's hard to provide finer second-level or millisecond level granularity
> since the number of queues required becomes huge at that point. So was
> wondering if Kafka provides any built-in mechanism for time delays that can
> be used.
>
> Thanks,
> Akshay
>
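(Since the broker has no delay support, the usual workaround is
consumer-side: buffer not-yet-due messages locally in a min-heap ordered
by due time, and only act on them once that time passes. A minimal
sketch of just the buffering piece; offset-commit handling, which must
not run ahead of still-buffered messages, is deliberately omitted:)

```python
# In-memory delay buffer: messages are held until their due time,
# giving second- or millisecond-level granularity from a single topic
# instead of one topic per delay bucket.

import heapq

class DelayBuffer:
    def __init__(self):
        self._heap = []  # entries are (due_time, message)

    def add(self, due_time: float, message) -> None:
        heapq.heappush(self._heap, (due_time, message))

    def ready(self, now: float):
        """Pop and return every message whose due time has passed."""
        out = []
        while self._heap and self._heap[0][0] <= now:
            out.append(heapq.heappop(self._heap)[1])
        return out

buf = DelayBuffer()
buf.add(105.0, "b")
buf.add(100.0, "a")
print(buf.ready(now=101.0))  # ['a']; "b" stays buffered until t >= 105
```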


Re: offset topics growing huge

2016-10-03 Thread Tom Crayford
Yes, offset topic compaction is just the normal compaction.

Thanks

Tom Crayford
Heroku Kafka

On Monday, 3 October 2016, Tobias Adamson <tob...@stargazer.com.sg> wrote:

> Hi
> We are using Kafka 0.10.1 with offsets commits being stored inside of Kafka
> After a while these topics become extremely large and we are wondering if
> we
> need to enable log.cleaner.enable=true (currently false) to make sure the
> internal
> offset topics get compacted and keep their size down?
>
> Regards
> T


Re: Zookeeper hostname/ip change

2016-09-26 Thread Tom Crayford
You'll need to do a rolling restart of your kafka nodes after changing the
zookeeper ensemble. There's no real way around that right now.

On Sun, Sep 25, 2016 at 6:41 PM, Ali Akhtar  wrote:

> Perhaps if you add 1 node, take down existing node, etc?
>
> On Sun, Sep 25, 2016 at 10:37 PM, brenfield111 
> wrote:
>
> > I need to change the hostnames and ips for the Zookeeper ensemble
> > serving my Kafka cluster.
> >
> > Will Kafka carry on as usual, along with its existing ZK nodes, after
> > making the config changes?
> >
> > Thanks
> >
>


Re: Using kafka as a "message bus" for an event store

2016-09-06 Thread Tom Crayford
inline

On Mon, Sep 5, 2016 at 11:58 PM, F21 <f21.gro...@gmail.com> wrote:

> Hi Tom,
>
> Thank you so much for your response. I had a feeling that approach would
> run into scalability problems, so thank you for confirming that.
>
> Another approach would be to have each service request a subscription from
> the event store. The event store then creates a unique kafka topic for each
> service. If multiple instances of a service requests a subscription, the
> event store should only create the topic once and return the name of the
> topic to the service.
>

How many topics would you expect to have in this approach? Kafka has
similar limits on number of topics as it does on partitions (in fact
partitions drives the topics, since all topics must have at least one
partition).


>
> A reader/writer would read from HBase and push new messages into each
> topic.
>
> In this case, I would set my topics to retain message for, say, 5 days in
> the event that a service goes down and we need to bring it back up.
>
> The event store would also query kafka to see which topics have not been
> read from for say 30 days and delete them. This would be for cases where a
> service is decommissioned. Does kafka provide a way to check when the topic
> was last read from?
>

Not last read from, no. You can track the last produced message by looking
at the latest message and it's timestamp (timestamps were added in 0.10).
I'd recommend tracking reads somewhere else, but it may be somewhat
difficult. You could also potentially use consumer offsets for this - if
your consumer is storing offsets in Kafka anyway.

Thanks

Tom Crayford
Heroku Kafka


>
> Does this sound like a saner way?
>
> Cheers,
> Francis
>
>
> On 5/09/2016 11:00 PM, Tom Crayford wrote:
>
>> inline
>>
>> On Mon, Sep 5, 2016 at 12:00 AM, F21 <f21.gro...@gmail.com> wrote:
>>
>> Hi all,
>>>
>>> I am currently looking at using Kafka as a "message bus" for an event
>>> store. I plan to have all my events written into HBase for permanent
>>> storage and then have a reader/writer that reads from HBase to push them
>>> into kafka.
>>>
>>
>> In terms of kafka, I plan to set it to keep all messages indefinitely.
>>> That way, if any consumers need to rebuild their views or if new
>>> consumers
>>> are created, they can just read from the stream to rebuild the views.
>>>
>>> Kafka isn't designed at all for permanent message storage, except for
>> compacted topics. I suggest you rethink this, unless compacted topics work
>> for you (Kafka is not designed to keep unbounded amounts of data for
>> unbounded amounts of time, simply to provide messaging and replay over
>> short, bounded windows).
>>
>>
>> I plan to use domain-driven design and will use the concept of aggregates
>>> in the system. An example of an aggregate might be a customer. All events
>>> for a given aggregate needs to be delivered in order. In the case of
>>> kafka,
>>> I would need to over partition the system by a lot, as any changes in the
>>> number of partitions could result in messages that were bound for a given
>>> partition being pushed into a newly created partition. Are there any
>>> issues
>>> if I create a new partition every time an aggregate is created? In a
>>> system
>>> with a large amount of aggregates, this will result in millions or
>>> hundreds
>>> of millions of partitions. Will this cause performance issues?
>>>
>>> Yes.
>>
>> Kafka is designed to support hundreds to thousands of partitions per
>> machine, not millions (and there is an upper bound per cluster which is
>> well below one million). I suggest you rethink this and likely use a
>> standard "hash based partitioning" scheme.
>>
>>
>> Cheers,
>>>
>>> Francis
>>>
>>>
>>>
>


Re: Producer/Consumer config for length Kafka broker maintenance

2016-09-06 Thread Tom Crayford
Hi there,

If you're only shutting down a single broker of many and you have a
replication factor more than 1, those consumer and producer configs should
handle it. However, if you have only a single broker, I'd recommend getting
some replication in before doing any maintenance work - Kafka is really
designed to operate with replication.

Thanks

Tom Crayford
Heroku Kafka

On Tue, Sep 6, 2016 at 5:30 AM, Harald Kirsch <harald.kir...@raytion.com>
wrote:

> Hi all,
>
> there are so many timeouts to tweak mentioned in the documentation that I
> wonder what the correct configuration for producer and consumer is to
> survive a, say, 1 hour, broker shutdown.
>
> With "survive" I mean that the processes are idle or blocked and keep
> trying to send their data, and just pick up work shortly after the broker
> appears again.
>
> I have the following suspects to tweak:
>
> PRODUCER:
> connections.max.idle.ms: (9 minutes) Does a lost connection count as idle?
>
> max.block.ms: (default 1 minute) Seems a definite candidate to raise to 1
> hour.
>
> request.timeout.ms: (default 1/2 minute)
> metadata.fetch.timeout.ms: (default 1 minute)
> retry.backoff.ms: (default 100ms)
>
> CONSUMER:
> connections.max.idle.ms: (same as above I guess)
> request.timeout.ms
>
> What would be a good combination of settings?
>
> Harald.
>


Re: Kafka bootup exception while recovering log file

2016-09-06 Thread Tom Crayford
This sounds like Kafka not being entirely robust to disk corruption, which
seems entirely possible and normal. I'd simply delete that log file and let
a replica replay catch it up at broker bootup.

Trying to guard against all possible disk corruption bugs sounds very
difficult to me; it seems better to let the operator handle corruption on a
case-by-case basis.

On Tue, Sep 6, 2016 at 7:14 AM, Jaikiran Pai 
wrote:

> I'm not from the Kafka dev team so I won't be able to comment whether this
> is an expected way to fail or if this needs to be handled in a more
> cleaner/robust manner (at the very least, probably a better exception
> message). Since you have put in efforts to write a test case and narrow it
> down to this specific flow, maybe you can send a mail to their dev mailing
> list and/or maybe create a JIRA to report this.
>
> -Jaikiran
>
>
> On Tuesday 30 August 2016 12:07 PM, Gaurav Agarwal wrote:
>
>> Kafka version: 0.10.0
>>
>> Exception Trace
>> 
>> java.util.NoSuchElementException
>> at kafka.utils.IteratorTemplate.next(IteratorTemplate.scala:37)
>> at kafka.log.LogSegment.recover(LogSegment.scala:189)
>> at kafka.log.Log.recoverLog(Log.scala:268)
>> at kafka.log.Log.loadSegments(Log.scala:243)
>> at kafka.log.Log.<init>(Log.scala:101)
>> at kafka.log.LogTest.testCorruptLog(LogTest.scala:830)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>> at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
>> at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
>> at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:262)
>> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:84)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:497)
>> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
>>
>> Test Code (the same exception trace is seen in broker logs as well on prod
>> machines with exactly the same log files as given in this mini test)
>> -
>>
>> val logProps = new Properties()
>> logProps.put(LogConfig.MaxMessageBytesProp, 15 * 1024 * 1024:
>> java.lang.Integer)
>> val config = LogConfig(logProps)
>> val cp = new File("/Users/gaurav/Downloads/corrupt/gaurav/kafka-logs/Topic3-12")
>> var log = new Log(cp, config, 0, time.scheduler, time
>>
>>
>> On Tue, Aug 30, 2016 at 11:37 AM, Jaikiran Pai 
>> wrote:
>>
>> Can you paste the entire exception stacktrace please?
>>>
>>> -Jaikiran
>>>
>>> On Tuesday 30 August 2016 11:23 AM, Gaurav Agarwal wrote:
>>>
>>> Hi there, just wanted to bump up the thread one more time to check if
 someone can point us in the right direction... This one was quite a
 serious
 failure that took down many of our kafka brokers..

 On Sat, Aug 27, 2016 at 2:11 PM, Gaurav Agarwal <
 gauravagarw...@gmail.com
 wrote:

 Hi All,

> We are facing a weird problem where Kafka broker fails to start due to
> an
> unhandled exception while 'recovering' a log segment. I have been able
> to
> isolate the 

Re: Authorization with Topic Wildcards

2016-09-05 Thread Tom Crayford
if you're running that at a bash or similar shell, you need to quote the
"*" so that bash doesn't expand it as a glob:

./kafka-acls.sh --authorizer-properties zookeeper.connect=
--add --allow-principal User:"user01"   --topic 'com.domain.xyz.*' --group
group01 --operation read

It may be instructive to look at what data is in zookeeper for the acls to
debug this.

On Mon, Sep 5, 2016 at 7:38 PM, Derar Alassi  wrote:

> Hi all,
>
> Although the documentation mentions that one can use wildcards with topic
> ACLs, I couldn't get that to work. Essentially, I want to set an Allow
> Read/Write ACL on topics com.domain.xyz.* to a certain user. This would
> give this user Read/Write access to topics com.domain.xyz.abc and
> com.domain.xyz.def .
>
> I set an ACL using this command:
> ./kafka-acls.sh --authorizer-properties zookeeper.connect=
> --add --allow-principal User:"user01"   --topic com.domain.xyz.* --group
> group01 --operation read
>
> When I try to consume from the topic com.domain.xyz.abc  using the same
> user ID and group, I get NOT_AUTHORIZED error.
>
> Anything I am missing?
>
> Thanks,
> Derar
>


Re: Using kafka as a "message bus" for an event store

2016-09-05 Thread Tom Crayford
inline

On Mon, Sep 5, 2016 at 12:00 AM, F21  wrote:

> Hi all,
>
> I am currently looking at using Kafka as a "message bus" for an event
> store. I plan to have all my events written into HBase for permanent
> storage and then have a reader/writer that reads from HBase to push them
> into kafka.


> In terms of kafka, I plan to set it to keep all messages indefinitely.
> That way, if any consumers need to rebuild their views or if new consumers
> are created, they can just read from the stream to rebuild the views.
>

Kafka isn't designed at all for permanent message storage, except for
compacted topics. I suggest you rethink this, unless compacted topics work
for you (Kafka is not designed to keep unbounded amounts of data for
unbounded amounts of time, simply to provide messaging and replay over
short, bounded windows).
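As a toy model of what a compacted topic retains (plain Python with illustrative keys, not Kafka internals): cleaning keeps only the most recent value per key, which is why compaction alone can serve as indefinite keyed storage without unbounded growth.

```python
# Naive model of log compaction: after cleaning, only the latest value
# for each key survives, so the topic stores "current state per key"
# rather than the full unbounded history.
def compact(records):
    latest = {}
    for key, value in records:
        latest[key] = value  # later writes shadow earlier ones
    return latest

log = [("cust-1", "created"), ("cust-2", "created"), ("cust-1", "renamed")]
assert compact(log) == {"cust-1": "renamed", "cust-2": "created"}
```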


>
> I plan to use domain-driven design and will use the concept of aggregates
> in the system. An example of an aggregate might be a customer. All events
> for a given aggregate needs to be delivered in order. In the case of kafka,
> I would need to over partition the system by a lot, as any changes in the
> number of partitions could result in messages that were bound for a given
> partition being pushed into a newly created partition. Are there any issues
> if I create a new partition every time an aggregate is created? In a system
> with a large amount of aggregates, this will result in millions or hundreds
> of millions of partitions. Will this cause performance issues?
>

Yes.

Kafka is designed to support hundreds to thousands of partitions per
machine, not millions (and there is an upper bound per cluster which is
well below one million). I suggest you rethink this and likely use a
standard "hash based partitioning" scheme.
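A minimal sketch of what hash-based partitioning buys here (illustrative only: the Java client's default partitioner actually uses murmur2 on the key bytes; md5 is used below purely for the demo). Every event for one aggregate deterministically maps to the same partition, so per-aggregate ordering holds with a fixed, modest partition count instead of one partition per aggregate.

```python
# Hash the aggregate id to pick a partition: same id -> same partition,
# so ordering per aggregate is preserved without millions of partitions.
import hashlib

def partition_for(aggregate_id, num_partitions):
    digest = hashlib.md5(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

assert partition_for("customer-42", 64) == partition_for("customer-42", 64)
```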


>
> Cheers,
>
> Francis
>
>


Re: Retention on compacted topics

2016-09-01 Thread Tom Crayford
It is not applied. An upcoming release will have the ability to combine the
two, but right now they are mutually exclusive.
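For readers finding this thread later: the combined behaviour Tom mentions did ship (to my knowledge as KIP-71, in 0.10.1), after which a topic can opt into both policies with a topic-level override such as:

```properties
# Later-release feature (KIP-71); the retention value is illustrative
cleanup.policy=compact,delete
retention.ms=604800000
```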

On Thu, Sep 1, 2016 at 6:18 PM, David Yu  wrote:

> Hi,
>
> Does Kafka "log.retention.bytes" or "log.retention.ms" apply to compaction
> enabled topic? I'll be surprised it does not, since this means that a
> compacted topic will potentially grow unbounded if deletion is not
> happening quickly enough (if we do deletion at all).
>
> Thanks,
> David
>


Re: kafka broker is dropping the messages after acknowledging librdkafka

2016-08-11 Thread Tom Crayford
Are you running with unclean leader election on? Are you setting
min.insync.replicas at all?

Can you attach controller and any other logs from the brokers you have?
They would be crucial in debugging this kind of issue.

Thanks

Tom Crayford
Heroku Kafka

On Thursday, 11 August 2016, Mazhar Shaikh <mazhar.shaikh...@gmail.com>
wrote:

> Hi Kafka Team,
>
> I'm using kafka (kafka_2.11-0.9.0.1) with librdkafka (0.8.1) API for
> producer
> During a run of 2hrs, I noticed that the total number of messages ack'd by
> the librdkafka delivery report is greater than the max offset of a partition
> on the kafka broker.
> I'm running kafka broker with replication factor of 2.
>
> Here, messages have been lost between librdkafka and the kafka broker.
>
> As librdkafka is providing success delivery report for all the messages.
>
> Looks like kafka broker is dropping the messages after acknowledging
> librdkafka.
>
> Requesting you help in solving this issue.
>
> Thank you.
>
>
> Regards
> Mazhar Shaikh
>


Re: Deleting by writing null payload not working in compacted logs

2016-08-10 Thread Tom Crayford
I'd possibly check the log segments in question, by using the
DumpLogSegments tool.

Note that Kafka keeps a deleted tombstone for 24h by default after a
deletion. Are you checking for the key and the value being present during
this testing?

Sample code for the producer messages would be useful as well.
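In the meantime, a toy model of what "retaining deletes" means (plain Python, not Kafka code; a None value stands in for a null payload): within the delete.retention.ms window the compacted log still contains the tombstone, and only after the window does the key vanish entirely.

```python
# Within delete.retention.ms the cleaner keeps the tombstone (key -> None)
# so consumers that are catching up can observe the deletion; afterwards
# the key disappears from the compacted log altogether.
def compacted_view(records, tombstone_window_elapsed):
    latest = {}
    for key, value in records:
        latest[key] = value
    if tombstone_window_elapsed:
        latest = {k: v for k, v in latest.items() if v is not None}
    return latest

log = [("9c9bde71", "some-value"), ("9c9bde71", None)]  # None = null payload
assert compacted_view(log, tombstone_window_elapsed=False) == {"9c9bde71": None}
assert compacted_view(log, tombstone_window_elapsed=True) == {}
```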

Thanks

Tom Crayford
Heroku Kafka

On Wednesday, 10 August 2016, Christiane Lemke <christiane.le...@gmail.com>
wrote:

> Hi all,
>
> I am trying to set up a minimal example understanding log compaction
> behaviour using kafka-clients-0.10.0.0.jar. I got the compaction behaviour
> working fine, however, when trying to delete a message explicitly by
> writing a null value, the message seems to not be deleted.
>
> These are my settings for my topic for compaction and deletion kicking in
> as soon as possible: Configs:min.cleanable.dirty.ratio=0.01,
> delete.retention.ms=100,retention.ms=100,segment.ms
> =100,cleanup.policy=compact
>
> The offending consumer record looks like this:
> [ConsumerRecord(topic = compation-test, partition = 0, offset = 33,
> CreateTime = 1470812816735, checksum = 3859648886, serialized key size =
> 16, serialized value size = -1, key = 9c9bde71-29ec-4687-ab24-9459f5fc0d34,
> value = null)]
>
> I can see the cleaner threads running fine, producing output like this:
> [2016-08-10 08:24:51,601] INFO Cleaner 0: Cleaning segment 37 in log
> tns_ticket-0 (last modified Wed Aug 10 07:52:26 UTC 2016) into 0, retaining
> deletes. (kafka.log.LogCleaner)
> (retaining deletes?)
>
> I am running out of ideas of settings to try - are there any ideas about
> what I might have missed or misunderstood?
>
> Any hint greatly appreciated :)
>
> Best, Christiane
>


Re: [VOTE] 0.10.0.1 RC2

2016-08-05 Thread Tom Crayford
Heroku has tested this using the same performance testing setup we used to
evaluate the impact of 0.9 -> 0.10 (see
https://engineering.heroku.com/blogs/2016-05-27-apache-kafka-010-evaluating-performance-in-distributed-systems/).

We see no issues at all with them, so +1 (non-binding) from here.

On Fri, Aug 5, 2016 at 12:58 PM, Jim Jagielski  wrote:

> Looks good here: +1
>
> > On Aug 4, 2016, at 9:54 AM, Ismael Juma  wrote:
> >
> > Hello Kafka users, developers and client-developers,
> >
> > This is the third candidate for the release of Apache Kafka 0.10.0.1.
> This
> > is a bug fix release and it includes fixes and improvements from 53 JIRAs
> > (including a few critical bugs). See the release notes for more details:
> >
> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/RELEASE_NOTES.html
> >
> > When compared to RC1, RC2 contains a fix for a regression where an older
> > version of slf4j-log4j12 was also being included in the libs folder of
> the
> > binary tarball (KAFKA-4008). Thanks to Manikumar Reddy for reporting the
> > issue.
> >
> > *** Please download, test and vote by Monday, 8 August, 8am PT ***
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > http://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging
> >
> > * Javadoc:
> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/javadoc/
> >
> > * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc2 tag:
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=f8f56751744ba8e55f90f5c4f3aed8c3459447b2
> >
> > * Documentation:
> > http://kafka.apache.org/0100/documentation.html
> >
> > * Protocol:
> > http://kafka.apache.org/0100/protocol.html
> >
> > * Successful Jenkins builds for the 0.10.0 branch:
> > Unit/integration tests: https://builds.apache.org/job/kafka-0.10.0-jdk7/182/
> > System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.0/138/
> >
> > Thanks,
> > Ismael
>
>


Re: Leader not available when using ACLs on Kafka 0.10

2016-08-05 Thread Tom Crayford
Hi,

I'd recommend turning up broker logs to DEBUG and looking at the
controller's logs. The controller talks to nodes over the network and if it
can't reach them because of ACLs, then you won't get a leader.

The only other note is to check if your brokers are talking to each other
over TLS or plaintext. If they're going over plaintext you'll need to
authenticate those hosts. If they're going over TLS, you'll need to ensure
they're using the right client certs.

Thanks

Tom Crayford
Heroku Kafka

On Friday, 5 August 2016, Wannes De Smet <wannes...@gmail.com> wrote:

> Hi all
>
> We are getting a 'Leader not available' exception when using ACLs with TLS
> on a three node Kafka cluster, configured as [1]. The error occurs both
> when trying to produce and consume from a topic, to which the producer
> principal and all hosts have been granted access for testing, using the
> following:
>
> ./kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer
> --authorizer-properties zookeeper.connect=localhost:2181 --add
> --allow-principal User:* --producer --topic topicName
>
> The same issue appears in another thread on this mailing list [2], though
> no information is present on how to resolve this issue. We also tried using
> 0.10.0.1 RC2, unfortunately to no effect. When the ACLs are not active,
> everything works as expected.
>
> Another attempt to explicitly allow access to all Kafka cluster hosts with
> the 'All' principal did not have any effect.
>
> Please advise how we might debug and resolve this issue.
>
> Thanks
> Wannes
>
> [1] listeners=PLAINTEXT://:9092,SSL://:9093 ; inter-broker communication
> is
> using the PLAINTEXT default
> [2]
> http://mail-archives.apache.org/mod_mbox/kafka-users/201608.mbox/%3CCANZ-JHHmL_E5xhcEdHeW0ZYME+M8iZsaz-D59UKL8HeWh3=PSw@mail.gmail.com%3E
>


Re: Compress before or after serialization?

2016-08-05 Thread Tom Crayford
Kafka compresses batches in the producer before sending them to the broker.
You'll get notably better compression from this than you will from per
message compression. I'd recommend checking your producer config and maybe
looking at the log segments on the broker with DumpLogSegments.

If you have sample producer configs and code, maybe this would help in
diagnosing the issue.
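A quick demonstration of why batch compression wins (zlib stands in for snappy here; the effect is analogous): redundancy shared across messages can only be exploited when they are compressed together, which is why per-message compression of small records often yields a ratio near 1.

```python
# Compress 200 similar JSON messages individually vs. as one batch.
# The batch compresses far better because the compressor sees the
# repeated field names and structure across messages.
import json
import zlib

messages = [json.dumps({"user_id": i, "event": "page_view"}).encode() for i in range(200)]
per_message = sum(len(zlib.compress(m)) for m in messages)
whole_batch = len(zlib.compress(b"".join(messages)))
assert whole_batch < per_message
```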

Thanks

Tom Crayford
Heroku Kafka

On Friday, 5 August 2016, David Yu <guans...@gmail.com> wrote:

> We are using Avro as our format when writing to a Kafka topic. After we
> enabled snappy compression on the producer, we don't see a change in the
> compression ratio (still 1). I was wondering if we should compress the
> message before serialization.
>
> Thanks,
> David
>


Re: Kafka compaction that can aggregate/count messages by key?

2016-08-05 Thread Tom Crayford
Kafka can't by itself do aggregation (nor does it really make sense for it
to). You can build such a feature on top of log compaction relatively
easily (by sending the new count as a message under an individual key), or
you can use the KTable and aggregation features of Kafka Streams.
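A sketch of the first option (illustrative Python, not Kafka code): maintain a running count per key and produce the new total as a record under that key; on a topic with cleanup.policy=compact, the log then converges to the latest count for each key (e.g. keys containing minute/hour/day).

```python
# For each incoming event key, bump the count and emit (key, new_count).
# Produced to a compacted topic, compaction discards stale totals and
# keeps only the latest count per key.
def updated_counts(events):
    counts = {}
    emitted = []  # (key, count) records you would produce
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        emitted.append((key, counts[key]))
    return counts, emitted

counts, emitted = updated_counts(["2016-08-05T10:01", "2016-08-05T10:01", "2016-08-05T10:02"])
assert counts == {"2016-08-05T10:01": 2, "2016-08-05T10:02": 1}
```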

Thanks

Tom Crayford
Heroku Kafka

On Friday, 5 August 2016, R Krishna <krishna...@gmail.com> wrote:

> Is it possible to use Kafka to track counts instead of deletion on
> compaction? I know we can aggregate ourselves and add it to a different
> topic, but that won't make sense if the time window is more than a few seconds.
>
> Say, I can then, use it to count based on a key containing minute, hour,
> day.
> https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction
>
> --
> Krishna
>


Re: Using automatic brokerId generation

2016-08-03 Thread Tom Crayford
You have to run kafka-reassign-partitions.sh script to move partitions to a
new replica id.

On Wed, Aug 3, 2016 at 3:14 AM, Digumarthi, Prabhakar Venkata Surya <
prabhakarvenkatasurya.digumar...@capitalone.com> wrote:

> Hi ,
>
>
> I am right now using kafka version 0.9.1.0
>
> If I choose to enable automatic brokerId generation, and let's say one of
> my brokers dies and a new broker gets started with a different brokerId.
> Is there a way I can get the new broker Id part of the replica set of a
> partition automatically?  Or is that I need to run the
> kafka-reassign-partitions.sh script for that to happen?
>
> Thanks,
> PD
>
> 
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>


Re: Error: Leader Not Available

2016-08-01 Thread Tom Crayford
Hi there,

What version of Kafka are you using? Can you share your config files and
any sample code?
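While gathering those, one common cause of LEADER_NOT_AVAILABLE worth checking is an advertised address that clients cannot resolve or reach. A hedged server.properties sketch (the hostname is illustrative; on 0.8.x the equivalent setting is advertised.host.name):

```properties
listeners=PLAINTEXT://0.0.0.0:9092
# must be resolvable and reachable from every client
advertised.listeners=PLAINTEXT://broker1.example.com:9092
```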

Thanks

Tom Crayford
Heroku Kafka

On Monday, 1 August 2016, Benny Ho <blazebenn...@yahoo.com.invalid> wrote:

> Hello,
> I'm receiving an error while publishing messages to a kafka topic. The
> steps I took were:
> 1. Starting zookeeper server
> 2. Starting kafka server
> 3. Sending messages to kafka topic with a Kafka Producer
>
> 8105 [kafka-producer-network-thread | producer-2] DEBUG
> org.apache.kafka.clients.NetworkClient - Sending metadata request
> {topics=[default-topic]} to node -1
> 8108 [kafka-producer-network-thread | producer-1] DEBUG
> org.apache.kafka.clients.NetworkClient - Sending metadata request
> {topics=[default-topic-raw]} to node -1
> 8109 [kafka-producer-network-thread | producer-2] WARN
> org.apache.kafka.clients.NetworkClient - Error while fetching metadata
> with correlation id 9 : {default-topic=LEADER_NOT_AVAILABLE}
> 8112 [kafka-producer-network-thread | producer-1] WARN
> org.apache.kafka.clients.NetworkClient - Error while fetching metadata
> with correlation id 10 : {default-topic-raw=LEADER_NOT_AVAILABLE}
> 8213 [kafka-producer-network-thread | producer-2] DEBUG
> org.apache.kafka.clients.NetworkClient - Sending metadata request
> {topics=[default-topic]} to node -1
> 8216 [kafka-producer-network-thread | producer-1] DEBUG
> org.apache.kafka.clients.NetworkClient - Sending metadata request
> {topics=[default-topic-raw]} to node -1
>
> The two topics I attempted to write to were default-topic and
> default-topic-raw.
> I've heard solutions of changing advertised.listeners in the
> server.properties file to local or my local IP, but it didn't solve the
> problem.
> Benny Ho
>


Re: __consumer_offsets rebalance

2016-07-21 Thread Tom Crayford
Hi there,

It's enabled with the config log.cleaner.enable
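A minimal server.properties fragment (a broker-level setting, so a restart is required; to my knowledge it defaults to false on 0.8.x and to true from 0.9.0.1 onward):

```properties
log.cleaner.enable=true
```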

Thanks

On Wed, Jul 20, 2016 at 5:29 PM, Anderson Goulart <
anderson.goul...@boxever.com> wrote:

> Hi,
>
> How can I see if log compaction is enabled? And how can I enable it? I
> didn't find it on kafka docs.
>
>
> Thanks, Anderson
>
>
>
> On 14/07/2016 13:37, Todd Palino wrote:
>
>> It's safe to move the partitions of the offsets topic around. You'll move
>> the consumer coordinators as you do, however, so the one thing you want to
>> make sure of, especially running an older version, is that log compaction
>> is working on your brokers and those partitions have been compacted. The
>> coordinator needs to bootstrap the topic, and if log compaction is broken
>> that can take a very long time. During that time, it will return errors to
>> consumers for offset operations, and that can cause offset resets.
>>
>> -Todd
>>
>> On Thursday, July 14, 2016, Anderson Goulart <
>> anderson.goul...@boxever.com>
>> wrote:
>>
>> Hi,
>>>
>>> I am running kafka 0.8.2.1 under aws instances with multiple availability
>>> zones. As we want a rack aware partition replication, we have our own
>>> partition layout distribution, to make sure all partitions are well
>>> balanced between nodes, leaders and availability zones.
>>>
>>> The problem arises with __consumer_offsets internal topic. In our current
>>> environment it has 50 partitions and are all under the same AZ, with an
>>> unbalanced leader (all wrong!)
>>>
>>> The question is: should I manually change its partition layout
>>> distribution as I do for the other topics? Is it safe to reassign the new
>>> layout for this internal topic, using kafka-reassign-partitions.sh?
>>>
>>>
>>> Thanks, Anderson
>>>
>>>
>>
>


Re: Consumer Offsets and Open FDs

2016-07-19 Thread Tom Crayford
Manikumar,

How will that help? Increasing the number of log cleaner threads will lead
to *less* memory for the buffer per thread, as it's divided up among
available threads.
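A rough capacity model for the cleaner's offset map, inferred from the error messages in this thread (assumption: roughly 24 bytes per unique key, a 16-byte hash plus an 8-byte offset, scaled by the io buffer load factor, whose default is 0.9):

```python
# More cleaner threads => smaller share of the dedupe buffer per thread
# => fewer unique keys each cleaning pass can handle.
ENTRY_BYTES = 24  # assumed: 16-byte key hash + 8-byte offset

def offset_map_capacity(dedupe_buffer_bytes, num_threads=1, load_factor=0.9):
    per_thread = dedupe_buffer_bytes // num_threads
    return int(per_thread * load_factor) // ENTRY_BYTES

# a 2 GiB buffer with one thread fits on the order of 80M unique keys
assert offset_map_capacity(2 * 1024**3, num_threads=4) < offset_map_capacity(2 * 1024**3)
```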

Lawrence, I'm reasonably sure you're hitting KAFKA-3587 here, and should
upgrade to 0.10 ASAP. As far as I'm aware Kafka doesn't have any
backporting or stable versions policy, so the only ways to get that patch
are a) upgrade b) backport the patch yourself. b) seems extremely risky to
me

Thanks

Tom

On Tue, Jul 19, 2016 at 5:49 AM, Manikumar Reddy 
wrote:

> Try increasing log cleaner threads.
>
> On Tue, Jul 19, 2016 at 1:40 AM, Lawrence Weikum 
> wrote:
>
> > It seems that the log-cleaner is still failing no matter what settings I
> > give it.
> >
> > Here is the full output from one of our brokers:
> > [2016-07-18 13:00:40,726] ERROR [kafka-log-cleaner-thread-0], Error due
> > to  (kafka.log.LogCleaner)
> > java.lang.IllegalArgumentException: requirement failed: 192053210
> messages
> > in segment __consumer_offsets-15/.log but offset map
> > can fit only 7499. You can increase log.cleaner.dedupe.buffer.size or
> > decrease log.cleaner.threads
> > at scala.Predef$.require(Predef.scala:219)
> > at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:584)
> > at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:580)
> > at scala.collection.immutable.Stream$StreamWithFilter.foreach(Stream.scala:570)
> > at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:580)
> > at kafka.log.Cleaner.clean(LogCleaner.scala:322)
> > at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:230)
> > at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:208)
> > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> > [2016-07-18 13:00:40,732] INFO [kafka-log-cleaner-thread-0], Stopped
> > (kafka.log.LogCleaner)
> >
> > Currently, I have heap allocation up to 64GB, only one log-cleaning
> thread
> > is set to run, and log.cleaner.dedupe.buffer.size is 2GB.  I get this
> error
> > if I try to increase it anymore:
> >
> > WARN [kafka-log-cleaner-thread-0], Cannot use more than 2G of cleaner
> > buffer space per cleaner thread, ignoring excess buffer space...
> > (kafka.log.LogCleaner)
> >
> > Is there something else I can do to help the broker compact the
> > __consumer_offset topics?
> >
> > Thank you again for your help!
> >
> > Lawrence Weikum
> >
> > On 7/13/16, 1:06 PM, "Rakesh Vidyadharan" 
> > wrote:
> >
> > We ran into this as well, and I ended up with the following that works
> for
> > us.
> >
> > log.cleaner.dedupe.buffer.size=536870912
> > log.cleaner.io.buffer.size=2000
> >
> >
> >
> >
> >
> > On 13/07/2016 14:01, "Lawrence Weikum"  wrote:
> >
> > >Apologies. Here is the full trace from a broker:
> > >
> > >[2016-06-24 09:57:39,881] ERROR [kafka-log-cleaner-thread-0], Error due
> > to  (kafka.log.LogCleaner)
> > >java.lang.IllegalArgumentException: requirement failed: 9730197928
> > messages in segment __consumer_offsets-36/.log but
> > offset map can fit only 5033164. You can increase
> > log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
> > >at scala.Predef$.require(Predef.scala:219)
> > >at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:584)
> > >at kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:580)
> > >at scala.collection.immutable.Stream$StreamWithFilter.foreach(Stream.scala:570)
> > >at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:580)
> > >at kafka.log.Cleaner.clean(LogCleaner.scala:322)
> > >at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:230)
> > >at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:208)
> > >at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> > >[2016-06-24 09:57:39,881] INFO [kafka-log-cleaner-thread-0], Stopped
> > (kafka.log.LogCleaner)
> > >
> > >
> > >Is log.cleaner.dedupe.buffer.size a broker setting?  What is a good
> > number to set it to?
> > >
> > >
> > >
> > >Lawrence Weikum
> > >
> > >
> > >On 7/13/16, 11:18 AM, "Manikumar Reddy" 
> > wrote:
> > >
> > >Can you post the complete error stack trace?   Yes, you need to
> > >restart the affected
> > >brokers.
> > >You can tweak log.cleaner.dedupe.buffer.size, log.cleaner.io.buffer.size
> > >configs.
> > >
> > >Some related JIRAs:
> > >
> > >https://issues.apache.org/jira/browse/KAFKA-3587
> > >https://issues.apache.org/jira/browse/KAFKA-3894
> > >https://issues.apache.org/jira/browse/KAFKA-3915
> > >
> > >On Wed, Jul 13, 2016 at 10:36 PM, Lawrence Weikum 
> > >wrote:
> > >
> > >> Oh interesting. I didn’t know about 

Re: Monitor ISR value

2016-07-19 Thread Tom Crayford
Anderson,

The metric `UnderReplicatedPartitions` gives you the number of partitions
that are out of the ISR, but doesn't expose that per topic-partition. I'd
welcome the addition of that metric to Kafka (it shouldn't be that much
work in the source code I don't think), and I think others would as well.
For now though, you can get this by crawling zookeeper to figure out the
ISR.
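A sketch of the zookeeper-crawling approach (the znode layout is an assumption matching 0.8/0.9-era brokers: the partition state path /brokers/topics/&lt;topic&gt;/partitions/&lt;n&gt;/state holds JSON with an "isr" list; fetching the znode itself is left out, this just parses its payload):

```python
# Parse a partition state znode payload and count out-of-sync replicas
# relative to the replication factor.
import json

def out_of_sync(state_json, replication_factor):
    isr = json.loads(state_json)["isr"]
    return replication_factor - len(isr)

state = '{"controller_epoch":1,"leader":1,"version":1,"leader_epoch":0,"isr":[1]}'
assert out_of_sync(state, 3) == 2  # i.e. "2/3 out of sync replicas"
```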

Thanks

Tom

On Tue, Jul 19, 2016 at 11:46 AM, Anderson Goulart <
anderson.goul...@boxever.com> wrote:

> Hi,
>
> I am trying to monitor under replicated partitions to create an alert
> based on this metric. I want to get, per topic and partition, how many
> replicas are out of sync compared to the replication factor, instead of a
> boolean value.
> So someone can get an alert like:
>
> Topic A, partition 0, has 2/3 out of sync replicas
>
> 3 = replication factor
>
>
> Is there any way to get this info?
>
>
> thanks, anderson
>
>
>


Re: Kafka Fault Tolerance Test

2016-07-18 Thread Tom Crayford
Also which version of Kafka are you using?

On Mon, Jul 18, 2016 at 7:16 PM, Guozhang Wang  wrote:

> This is un-expected. Any error logs / exceptions did you see from the
> clients when they can no longer send / fetch from brokers?
>
> Guozhang
>
> On Mon, Jul 18, 2016 at 8:59 AM, Luo, Chao  wrote:
>
> > Dear Kafka fans,
> >
> > I have a concern about testing Kafka fault tolerance, or maybe I did not
> > configure it right.
> >
> > I have two kafka servers and one zookeeper, which are running on three
> > different AWS EC2 instances. I created a topic with one partition and two
> > replicas. At first, both kafka servers were running and everything was
> > perfect. After I shut down the leader for my topic, the producer and the
> > consumer just stopped working. Why did the Kafka system stop working? I did
> > not see any fault tolerance here. However, when I restarted the leader (the
> > one I had just shut down), the producer and consumer started to work again.
> > Here is my topic description:
> >
> > Topic:fast-messages  PartitionCount:1  ReplicationFactor:2  Configs:
> > Topic: fast-messages  Partition: 0  Leader: 1  Replicas: 1,0  Isr: 1,0
> >
> >
> >
> > Any comments or suggestions are highly appreciated!
> >
> > Thanks!
> > Chao
> >
>
>
>
> --
> -- Guozhang
>


Re: Using Kafka without persisting message to disk

2016-07-14 Thread Tom Crayford
Hi Jack,

No, kafka doesn't support not writing to disk. If you're really 100% sure
of yourself you could use a ramdisk and mount Kafka on it, but that's not
supported. I'd recommend "just" writing to disk, it's plenty fast enough
for nearly all use cases.

Thanks

Tom Crayford
Heroku Kafka

On Thu, Jul 14, 2016 at 7:33 PM, Jack Huang <jackhu...@mz.com> wrote:

> Hi all,
>
> Is there a way to make a topic to be stored in memory only and not writing
> to disk? If not, what's the best way to minimize writing to disk? For this
> application we only need the notion of partitions and a short retention
> time (1hr or so) from Kafka. We want to use Kafka because we want to keep
> the flexibility to add persistence back if we need to.
>
> Thanks,
> Jack
>


Re: __consumer_offsets rebalance

2016-07-14 Thread Tom Crayford
Also note that there were a number of bugs fixed in the log cleaner thread
between 0.8 and the latest release. I wouldn't be comfortable relying on
kafka committed offsets on a version under 0.10 for a new production
system, and would carefully consider an upgrade all the way to the latest
release.

On Thu, Jul 14, 2016 at 1:37 PM, Todd Palino  wrote:

> It's safe to move the partitions of the offsets topic around. You'll move
> the consumer coordinators as you do, however, so the one thing you want to
> make sure of, especially running an older version, is that log compaction
> is working on your brokers and those partitions have been compacted. The
> coordinator needs to bootstrap the topic, and if log compaction is broken
> that can take a very long time. During that time, it will return errors to
> consumers for offset operations, and that can cause offset resets.
>
> -Todd
>
> On Thursday, July 14, 2016, Anderson Goulart  >
> wrote:
>
> > Hi,
> >
> > I am running kafka 0.8.2.1 under aws instances with multiple availability
> > zones. As we want a rack aware partition replication, we have our own
> > partition layout distribution, to make sure all partitions are well
> > balanced between nodes, leaders and availability zones.
> >
> > The problem arises with __consumer_offsets internal topic. In our current
> > environment it has 50 partitions and are all under the same AZ, with an
> > unbalanced leader (all wrong!)
> >
> > The question is: should I manually change its partition layout
> > distribution as I do for the other topics? Is it safe to reassign the new
> > layout for this internal topic, using kafka-reassign-partitions.sh?
> >
> >
> > Thanks, Anderson
> >
>
>
> --
> *Todd Palino*
> Staff Site Reliability Engineer
> Data Infrastructure Streaming
>
>
>
> linkedin.com/in/toddpalino
>


Re: Consumer Offsets and Open FDs

2016-07-13 Thread Tom Crayford
Hi,

You're running into the issue in
https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-3894 and
possibly
https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-3587
(which is fixed in 0.10). Sadly right now there's no way to know how high a
dedupe buffer size you need - it depends on the write throughput and number
of unique keys going to that topic. For now I'd recommend:

a) Upgrade to 0.10 as KAFKA-3587 is fixed there. Kafka doesn't backport
patches (as far as I'm aware), so you need to upgrade.
b) monitor and alert on the log cleaner thread dying. This can be done by
getting a thread dump from jmx, loading the thread names and ensuring one
with "log-cleaner" is always running. Alternatively monitoring the number
of log segments for compacted topics, or the number of file descriptors
will serve as an ok proxy. When this thread does crash, you have to
remediate by increasing the dedupe buffer size

We're exploring solutions in KAFKA-3894, and would love your feedback there
if you have any thoughts.

Thanks

Tom Crayford
Heroku Kafka
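As a rough back-of-the-envelope sketch of the sizing problem: the cleaner's offset map appears to use a fixed-size entry per unique key, filled to a load factor. The 24-byte entry size and 0.9 load factor below are assumptions derived from the wording of the cleaner's "offset map can fit only N" error, not an official formula.

```python
import math

# Assumed constants: 16-byte key hash + 8-byte offset per entry,
# filled to a 0.9 load factor, buffer split across cleaner threads.
ENTRY_BYTES = 24
LOAD_FACTOR = 0.9

def offset_map_capacity(dedupe_buffer_bytes, cleaner_threads=1):
    """Approximate max unique keys the offset map can hold per thread."""
    per_thread = dedupe_buffer_bytes // cleaner_threads
    return int(per_thread * LOAD_FACTOR / ENTRY_BYTES)

def required_buffer_bytes(unique_keys, cleaner_threads=1):
    """Approximate log.cleaner.dedupe.buffer.size needed for a key count."""
    return math.ceil(unique_keys * ENTRY_BYTES / LOAD_FACTOR) * cleaner_threads

# The default 128 MiB buffer works out to ~5 million keys, matching the
# "can fit only 5033164" figure in the crash messages quoted below.
print(offset_map_capacity(128 * 1024 * 1024))  # 5033164
```

Under these assumptions, fitting the ~9.7 billion keys from the error below would need a far larger buffer than any JVM heap allows, which is why catching up via replication (rather than re-cleaning in place) is the suggested remediation.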

On Wednesday, 13 July 2016, Rakesh Vidyadharan <rvidyadha...@gracenote.com>
wrote:

> We ran into this as well, and I ended up with the following that works for
> us.
>
> log.cleaner.dedupe.buffer.size=536870912
> log.cleaner.io.buffer.size=2000
>
>
>
>
>
> On 13/07/2016 14:01, "Lawrence Weikum" <lwei...@pandora.com> wrote:
>
> >Apologies. Here is the full trace from a broker:
> >
> >[2016-06-24 09:57:39,881] ERROR [kafka-log-cleaner-thread-0], Error due
> to  (kafka.log.LogCleaner)
> >java.lang.IllegalArgumentException: requirement failed: 9730197928
> messages in segment __consumer_offsets-36/.log but
> offset map can fit only 5033164. You can increase
> log.cleaner.dedupe.buffer.size or decrease log.cleaner.threads
> >at scala.Predef$.require(Predef.scala:219)
> >at
> kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:584)
> >at
> kafka.log.Cleaner$$anonfun$buildOffsetMap$4.apply(LogCleaner.scala:580)
> >at
> scala.collection.immutable.Stream$StreamWithFilter.foreach(Stream.scala:570)
> >at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:580)
> >at kafka.log.Cleaner.clean(LogCleaner.scala:322)
> >at
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:230)
> >at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:208)
> >at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> >[2016-06-24 09:57:39,881] INFO [kafka-log-cleaner-thread-0], Stopped
> (kafka.log.LogCleaner)
> >
> >
> >Is log.cleaner.dedupe.buffer.size a broker setting?  What is a good
> number to set it to?
> >
> >
> >
> >Lawrence Weikum
> >
> >
> >On 7/13/16, 11:18 AM, "Manikumar Reddy" <manikumar.re...@gmail.com> wrote:
> >
> >Can you post the complete error stack trace?   Yes, you need to
> >restart the affected
> >brokers.
> >You can tweak log.cleaner.dedupe.buffer.size, log.cleaner.io.buffer.size
> >configs.
> >
> >Some related JIRAs:
> >
> >https://issues.apache.org/jira/browse/KAFKA-3587
> >https://issues.apache.org/jira/browse/KAFKA-3894
> >https://issues.apache.org/jira/browse/KAFKA-3915
> >
> >On Wed, Jul 13, 2016 at 10:36 PM, Lawrence Weikum <lwei...@pandora.com>
> >wrote:
> >
> >> Oh interesting. I didn’t know about that log file until now.
> >>
> >> The only error that has been populated among all brokers showing this
> >> behavior is:
> >>
> >> ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> >>
> >> Then we see many messages like this:
> >>
> >> INFO Compaction for partition [__consumer_offsets,30] is resumed
> >> (kafka.log.LogCleaner)
> >> INFO The cleaning for partition [__consumer_offsets,30] is aborted
> >> (kafka.log.LogCleaner)
> >>
> >> Using Visual VM, I do not see any log-cleaner threads in those
> brokers.  I
> >> do see it in the brokers not showing this behavior though.
> >>
> >> Any idea why the LogCleaner failed?
> >>
> >> As a temporary fix, should we restart the affected brokers?
> >>
> >> Thanks again!
> >>
> >>
> >> Lawrence Weikum
> >>
> >> On 7/13/16, 10:34 AM, "Manikumar Reddy" <manikumar.re...@gmail.com> wrote:
> >>
> >> Hi,
> >>
&g

Re: NotAssignedReplicaException

2016-07-12 Thread Tom Crayford
Ah, that's around leader rebalancing. Do you have any scripts that run
kafka-assign-partitions or similar?

I will recheck but this doesn't sound like a thing that auto rebalance
would impact

On Tuesday, 12 July 2016, Gokul <gokulakanna...@gmail.com> wrote:

> Thanks. Auto rebalance is set to true, so rebalancing may be happening at
> that time. Is there any issue tracker that I can refer to?
> On 12 Jul 2016 21:48, "Tom Crayford" <tcrayf...@heroku.com> wrote:
>
>> Hi,
>>
>> Were you rebalancing that topic or partition at that time? If there are
>> rebalancing bugs this might point at that.
>>
>> Thanks
>>
>> Tom
>>
>> On Tue, Jul 12, 2016 at 6:47 AM, Gokul <slu...@gmail.com> wrote:
>>
>> > We had an issue last week when kafka cluster reported under replicated
>> > partitions for quite a while but there were no brokers down. All the
>> > brokers were reporting unknownException on Broker 1. When checked
>> broker 1
>> > logs, it just reported below errors(NotAssignedReplicaException)
>> > continuously. Issue got resolved after bouncing broker 1. Think this
>> > exception comes when Controller issues StopReplicaRequest to broker 1
>> and
>> > it is in the process of leader election. But what is spooky is that this
>> > exception was reported more than 20 minutes by the broker and it was
>> > choking entire ingestion (totally 10 brokers). Unfortunately I don't
>> have
>> > the controller logs to debug. Any pointers here? We are using 0.8.2.1
>> >
>> > ERROR [2016-07-07 11:45:09,248] [kafka-request-handler-1][]
>> > kafka.server.KafkaApis - [KafkaApi-1] error when handling request Name:
>> > FetchRequest; Version: 0; CorrelationId: 1890972; ClientId:
>> > ReplicaFetcherThread-2-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1
>> bytes;
>> > RequestInfo: 
>> > kafka.common.NotAssignedReplicaException: Leader 1 failed to record
>> > follower 2's position 20240372 since the replica is not recognized to be
>> > one of the assigned replicas 1,5,6 for partition [,1]
>> > at
>> >
>> >
>> kafka.server.ReplicaManager.updateReplicaLEOAndPartitionHW(ReplicaManager.scala:574)
>> > at
>> >
>> >
>> kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:388)
>> > at
>> >
>> >
>> kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:386)
>> > at
>> >
>> >
>> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>> > at
>> >
>> >
>> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>> > at
>> >
>> >
>> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
>> > at
>> > scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
>> > at
>> >
>> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
>> > at
>> >
>> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
>> > at
>> >
>> >
>> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
>> > at
>> scala.collection.MapLike$MappedValues.foreach(MapLike.scala:245)
>> > at
>> > kafka.server.KafkaApis.recordFollowerLogEndOffsets(KafkaApis.scala:386)
>> > at
>> kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:351)
>> > at kafka.server.KafkaApis.handle(KafkaApis.scala:60)
>> > at
>> > kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
>> > at java.lang.Thread.run(Thread.java:745)
>> >
>> > --
>> > Thanks and Regards,
>> > Gokul
>> >
>>
>


Re: NotAssignedReplicaException

2016-07-12 Thread Tom Crayford
Hi,

Were you rebalancing that topic or partition at that time? If there are
rebalancing bugs this might point at that.

Thanks

Tom

On Tue, Jul 12, 2016 at 6:47 AM, Gokul  wrote:

> We had an issue last week when kafka cluster reported under replicated
> partitions for quite a while but there were no brokers down. All the
> brokers were reporting unknownException on Broker 1. When checked broker 1
> logs, it just reported below errors(NotAssignedReplicaException)
> continuously. Issue got resolved after bouncing broker 1. Think this
> exception comes when Controller issues StopReplicaRequest to broker 1 and
> it is in the process of leader election. But what is spooky is that this
> exception was reported more than 20 minutes by the broker and it was
> choking entire ingestion (totally 10 brokers). Unfortunately I don't have
> the controller logs to debug. Any pointers here? We are using 0.8.2.1
>
> ERROR [2016-07-07 11:45:09,248] [kafka-request-handler-1][]
> kafka.server.KafkaApis - [KafkaApi-1] error when handling request Name:
> FetchRequest; Version: 0; CorrelationId: 1890972; ClientId:
> ReplicaFetcherThread-2-1; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes;
> RequestInfo: 
> kafka.common.NotAssignedReplicaException: Leader 1 failed to record
> follower 2's position 20240372 since the replica is not recognized to be
> one of the assigned replicas 1,5,6 for partition [,1]
> at
>
> kafka.server.ReplicaManager.updateReplicaLEOAndPartitionHW(ReplicaManager.scala:574)
> at
>
> kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:388)
> at
>
> kafka.server.KafkaApis$$anonfun$recordFollowerLogEndOffsets$2.apply(KafkaApis.scala:386)
> at
>
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at
>
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
> at
>
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:778)
> at
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
> at
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
> at
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
> at
>
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:777)
> at scala.collection.MapLike$MappedValues.foreach(MapLike.scala:245)
> at
> kafka.server.KafkaApis.recordFollowerLogEndOffsets(KafkaApis.scala:386)
> at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:351)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:60)
> at
> kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
> at java.lang.Thread.run(Thread.java:745)
>
> --
> Thanks and Regards,
> Gokul
>


Re: Rolling upgrade from 0.8.2.1 to 0.9.0.1 failing with replicafetchthread OOM errors

2016-07-11 Thread Tom Crayford
Hi (I'm the author of that ticket):

>From my understanding limiting MaxDirectMemory won't workaround this memory
leak. The leak is inside the JVM's implementation, not in normal direct
buffers. On one of our brokers with this issue, we're seeing the JVM report
1.2GB of heap, and 128MB of offheap memory, yet the actual process memory
is more like 10GB.

Thanks

Tom Crayford
Heroku Kafka


Re: rate-limiting on rebalancing, or sync from non-leaders?

2016-07-03 Thread Tom Crayford
Hi Charity,

I'm not sure about the roadmap. The way we (and linkedin/dropbox/netflix)
handle rebalances right now is to do a small handful of partitions at a
time (LinkedIn does 10 partitions at a time the last I heard), not a big
bang rebalance of all the partitions in the cluster. That's not perfect and
not great throttling, and I agree that it's something Kafka desperately
needs to work on.

Thanks

Tom Crayford
Heroku Kafka
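The "small handful of partitions at a time" approach can be sketched by splitting a full reassignment plan into batches and submitting each one separately. A minimal sketch, assuming the JSON layout that `kafka-reassign-partitions.sh --reassignment-json-file` accepts:

```python
import json

def batched_reassignments(assignments, batch_size=10):
    """Yield reassignment plans a few partitions at a time, instead of
    one big-bang plan covering the whole cluster.

    `assignments` is a list of dicts like
    {"topic": "t", "partition": 0, "replicas": [1, 2, 3]}."""
    for i in range(0, len(assignments), batch_size):
        yield json.dumps({"version": 1,
                          "partitions": assignments[i:i + batch_size]})

# Usage sketch: write each batch to a file, submit it with the tool, and
# wait for it to complete (--verify) before submitting the next batch.
plan = [{"topic": "events", "partition": p, "replicas": [1, 2]}
        for p in range(25)]
print(sum(1 for _ in batched_reassignments(plan)))  # 3 batches
```

Throttling this way limits how much replication traffic competes with production load at any one moment, at the cost of a longer overall rebalance.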

On Sun, Jul 3, 2016 at 2:00 AM, Charity Majors <char...@hound.sh> wrote:

> Hi there,
>
> I'm curious if there's anything on the Kafka roadmap for adding
> rate-limiting or max-throughput for rebalancing processes.
>
> Alternately, if you have RF>2, maybe a setting to instruct followers to
> sync from other followers?
>
> I'm super impressed with how fast and efficient the kafka data rebalancing
> process is, but also fear for the future when it's battling for resources
> against high production trafffic.  :)
>


Re: what is use of __consumer_offsets

2016-06-30 Thread Tom Crayford
No. This is used for tracking consumer offsets. Kafka manages cleaning it
up itself.

On Thu, Jun 30, 2016 at 1:52 PM, Snehalata Nagaje <
snehalata.nag...@harbingergroup.com> wrote:

>
> But does it create folder for every message we put in kafka, for every
> offset?
>
> And do we need to clean those folders? is there any configuration?
>
> - Original Message -
> From: "Tom Crayford" <tcrayf...@heroku.com>
> To: "Users" <users@kafka.apache.org>
> Sent: Thursday, June 30, 2016 6:11:03 PM
> Subject: Re: what is use of __consumer_offsets
>
> Hi there,
>
> Kafka uses this topic internally for consumer offset commits.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Thu, Jun 30, 2016 at 1:36 PM, Snehalata Nagaje <
> snehalata.nag...@harbingergroup.com> wrote:
>
> >
> > Hi All,
> >
> >
> > I am using kafka 9 version with publish subscribe pattern, one consumer
> is
> > listening to particular topic
> >
> > What is use __consumer_offsets, folders created in log files?
> >
> > Does it have any impact on offset committing?
> >
> >
> >
> > Thanks,
> > Snehalata
> >
>


Re: what is use of __consumer_offsets

2016-06-30 Thread Tom Crayford
Hi there,

Kafka uses this topic internally for consumer offset commits.

Thanks

Tom Crayford
Heroku Kafka
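For background on how a group's commits land in this topic: each consumer group maps to one partition of __consumer_offsets by hashing the group id. A rough sketch of that mapping, assuming the default 50 partitions and Java's `String.hashCode` semantics:

```python
# Sketch (assumption): partition = abs(groupId.hashCode) % num_partitions,
# where num_partitions is offsets.topic.num.partitions (50 by default).

def java_string_hashcode(s: str) -> int:
    """Java's String.hashCode with 32-bit overflow semantics."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def offsets_partition(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets that holds this group's offsets."""
    return (java_string_hashcode(group_id) & 0x7FFFFFFF) % num_partitions

print(offsets_partition("a"))  # 47  ('a'.hashCode() == 97; 97 % 50 == 47)
```

This is why all of a group's offset traffic concentrates on a single partition, and why that partition's broker acts as the group's coordinator.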

On Thu, Jun 30, 2016 at 1:36 PM, Snehalata Nagaje <
snehalata.nag...@harbingergroup.com> wrote:

>
> Hi All,
>
>
> I am using kafka 9 version with publish subscribe pattern, one consumer is
> listening to particular topic
>
> What is use __consumer_offsets, folders created in log files?
>
> Does it have any impact on offset committing?
>
>
>
> Thanks,
> Snehalata
>


Re: Log retention just for offset topic

2016-06-30 Thread Tom Crayford
The default cleanup policy is delete, which is the regular time based
retention.
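A minimal broker-config sketch of the behaviour described above (values illustrative, not recommendations):

```properties
# Broker-wide default: time-based deletion, not compaction
log.cleanup.policy=delete
log.retention.hours=168

# Needed on 0.9.0.0 for __consumer_offsets to be compacted
# (on by default from 0.9.0.1 onwards)
log.cleaner.enable=true

# Retention of committed offsets in the internal topic is
# controlled separately from log retention:
offsets.retention.minutes=1440
```

So long retention on regular topics does not prevent the offsets topic from being compacted; only topics whose `cleanup.policy` is `compact` are compacted.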

On Thursday, 30 June 2016, Sathyakumar Seshachalam <
sathyakumar_seshacha...@trimble.com> wrote:

> Or may be am wrong, and Log cleaner only picks up topics with a
> cleanup.policy.
> From the documentation it is not very obvious what the behaviour is.
>
> On Thu, Jun 30, 2016 at 10:33 AM, Sathyakumar Seshachalam <
> sathyakumar_seshacha...@trimble.com > wrote:
>
> > Hi,
> >
> > Thanks for the response.
> >
> > I still like to know what happens for topics which have not defined a
> > cleanup.policy.
> > I assume the default value is compact. And hence all topic's logs will be
> > compacted which I want to avoid.
> >
> > Am running 0.9.0, So will have to manually set log.cleaner.enable=true
> >
> > Regards,
> > Sathya
> >
> > On Thu, Jun 30, 2016 at 10:20 AM, Manikumar Reddy <
> > manikumar.re...@gmail.com > wrote:
> >
> >> Hi,
> >>
> >> Kafka internally creates the offsets topic (__consumer_offsets) with
> >> compact mode on.
> >> From 0.9.0.1 onwards log.cleaner.enable=true by default.  This means
> >> topics
> >> with a
> >> cleanup.policy=compact will now be compacted by default,
> >>
> >> You can tweak the offset topic configuration by using below  props
> >> offsets.topic.compression.codec
> >> offsets.topic.num.partitions
> >> offsets.topic.replication.factor
> >> offsets.topic.segment.bytes
> >> offsets.retention.minutes
> >> offsets.retention.check.interval.ms
> >>
> >>
> >> Thanks
> >> Manikumar
> >>
> >> On Thu, Jun 30, 2016 at 9:49 AM, Sathyakumar Seshachalam <
> >> sathyakumar_seshacha...@trimble.com > wrote:
> >>
> >> > Am little confused about how log cleaner works. My use case is that I
> >> want
> >> > to compact just selected topics (or in my case just the internal topic
> >> > __consumers_offsets and want to leave other topics as is).
> >> >
> >> > Whats the right settings/configuration for this to happen.
> >> >
> >> > As I understand log cleaner enable/disable is a global setting. And my
> >> > understanding is that they will clean all logs (compact logs based on
> >> > cleanup policy), and so all topics' clean up policy will be considered
> >> and
> >> > hence compacted - compact being the default policy. Is this correct ?
> >> >
> >> > I have set all topic's retention duration to be a really exorbitantly
> >> high
> >> > value. Does it mean __consumer_offsets wont be compacted at all ? If
> so,
> >> > how to set retention time just for offset topic it being an internal
> >> topic.
> >> >
> >> > Regards,
> >> > Sathya
> >> >
> >>
> >
> >
>


Re: AWS EFS

2016-06-29 Thread Tom Crayford
I think you'll be far better off using EBS and Kafka's inbuilt distribution
than NFS mounts. Kafka's designed for distributing data natively, and not
for NFS style mounts.

On Wed, Jun 29, 2016 at 11:46 AM, Ben Davison 
wrote:

> Does anyone have any opinions on this?
>
> https://aws.amazon.com/blogs/aws/amazon-elastic-file-system-production-ready-in-three-regions/
>
> Looks interesting, just wondering if anyone else uses NFS mounts with
> Kafka?
>
> Thanks,
>
> Ben
>
> --
>
>
> This email, including attachments, is private and confidential. If you have
> received this email in error please notify the sender and delete it from
> your system. Emails are not secure and may contain viruses. No liability
> can be accepted for viruses that might be transferred by this email or any
> attachment. Any unauthorised copying of this message or unauthorised
> distribution and publication of the information contained herein are
> prohibited.
>
> 7digital Limited. Registered office: 69 Wilson Street, London EC2A 2BB.
> Registered in England and Wales. Registered No. 04843573.
>


Re: Expired messages in kafka topic

2016-06-23 Thread Tom Crayford
No, there's no control over that. The right way to do this is to keep up
with the head of the topic and decide on "old" yourself in the consumer.

Deletion can happen at different times on the different replicas of the
log, and to different messages. Whilst a consumer will only be reading from
the lead broker for any log at any one time, the leader can and will change
to handle broker failure.
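The "decide on old yourself in the consumer" approach can be sketched as a small routing check. This assumes the producer embeds a `produced_at` timestamp in the payload — a hypothetical convention, since pre-0.10 Kafka messages carry no broker-side timestamp:

```python
import json
import time

STALE_AFTER_S = 120  # staleness threshold: ~2 minutes

def route(raw_message, now=None):
    """Return 'process' for fresh messages, 'expired' for stale ones.

    Assumes the producer embeds a 'produced_at' epoch-seconds field in
    the JSON payload (an application-level convention, not a Kafka
    feature)."""
    now = time.time() if now is None else now
    payload = json.loads(raw_message)
    if now - payload["produced_at"] > STALE_AFTER_S:
        return "expired"  # e.g. re-publish to an 'expired-events' topic
    return "process"
```

A consumer loop would call `route` on every record and forward the "expired" ones to a separate topic for later investigation, independent of when the broker eventually deletes the aged-out segments.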

On Thu, Jun 23, 2016 at 4:37 PM, Krish <krishnan.k.i...@gmail.com> wrote:

> Thanks Tom.
> Is there any way a consumer can be triggered when the message is about to
> be deleted by Kafka?
>
>
>
> --
> κρισhναν
>
> On Thu, Jun 23, 2016 at 6:16 PM, Tom Crayford <tcrayf...@heroku.com>
> wrote:
>
>> Hi,
>>
>> A pretty reasonable thing to do here would be to have a consumer that
>> moved "old" events to another topic.
>>
>> Kafka has no concept of an expired queue, the only thing it can do once a
>> message is aged out is delete it. The deletion is done in bulk and
>> typically is set to 24h or even higher (LinkedIn use 4 days, the default is
>> 7 days).
>>
>> Thanks
>>
>> Tom Crayford
>> Heroku Kafka
>>
>> On Thu, Jun 23, 2016 at 10:45 AM, Krish <krishnan.k.i...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I am trying to design a real-time application where message timeout can
>>> be
>>> as low as a minute or two (message can get stale real-fast).
>>>
>>> In the rare chance that the consumers lag too far behind in processing
>>> messages from the broker, is there a concept of expired message queue in
>>> Kafka?
>>>
>>> I would like to know if a message has expired and then park it in some
>>> topic till as such time that a service can dequeue, process it and/or
>>> investigate it.
>>>
>>> Thanks.
>>>
>>> Best,
>>> Krish
>>>
>>
>>
>


Re: Expired messages in kafka topic

2016-06-23 Thread Tom Crayford
Hi,

A pretty reasonable thing to do here would be to have a consumer that moved
"old" events to another topic.

Kafka has no concept of an expired queue, the only thing it can do once a
message is aged out is delete it. The deletion is done in bulk and
typically is set to 24h or even higher (LinkedIn use 4 days, the default is
7 days).

Thanks

Tom Crayford
Heroku Kafka

On Thu, Jun 23, 2016 at 10:45 AM, Krish <krishnan.k.i...@gmail.com> wrote:

> Hi,
> I am trying to design a real-time application where message timeout can be
> as low as a minute or two (message can get stale real-fast).
>
> In the rare chance that the consumers lag too far behind in processing
> messages from the broker, is there a concept of expired message queue in
> Kafka?
>
> I would like to know if a message has expired and then park it in some
> topic till as such time that a service can dequeue, process it and/or
> investigate it.
>
> Thanks.
>
> Best,
> Krish
>


Re: Log Compaction not Deleting on Kafka 0.9.0.1

2016-06-22 Thread Tom Crayford
This smells like a bug to me.

On Wed, Jun 22, 2016 at 6:54 PM, Lawrence Weikum <lwei...@pandora.com>
wrote:

> Fascinating.
>
> We are seeing no errors or warning in the logs after restart.  It appears
> on this broker that the compaction thread is working:
>
> [2016-06-22 10:33:49,179] INFO Rolled new log segment for
> '__consumer_offsets-28' in 1 ms. (kafka.log.Log)
> [2016-06-22 10:34:00,968] INFO Deleting segment 0 from log
> __consumer_offsets-28. (kafka.log.Log)
> [2016-06-22 10:34:00,970] INFO Deleting index
> /kafka/data/__consumer_offsets-28/.index.deleted
> (kafka.log.OffsetIndex)
> [2016-06-22 10:34:00,992] INFO Deleting segment 2148144095 from log
> __consumer_offsets-28. (kafka.log.Log)
> [2016-06-22 10:34:00,994] INFO Deleting index
> /kafka/data/__consumer_offsets-28/002148144095.index.deleted
> (kafka.log.OffsetIndex)
> [2016-06-22 10:34:01,002] INFO Deleting segment 3189277822 from log
> __consumer_offsets-28. (kafka.log.Log)
> [2016-06-22 10:34:01,004] INFO Deleting index
> /kafka/data/__consumer_offsets-28/003189277822.index.deleted
> (kafka.log.OffsetIndex)
> [2016-06-22 10:34:02,019] INFO Deleting segment 3190205744 from log
> __consumer_offsets-28. (kafka.log.Log)
> [2016-06-22 10:34:02,039] INFO Deleting index
> /kafka/data/__consumer_offsets-28/003190205744.index.deleted
> (kafka.log.OffsetIndex)
>
> We see the “kafka-log-cleaner-thread” in the JMX.  It seems to run about
> every 50 seconds.  From the log-cleaner.log file, we see plenty of this
> output regarding the partition that’s hogging the FDs:
>
> [2016-06-22 10:44:31,845] INFO Cleaner 0: Beginning cleaning of log
> __consumer_offsets-28. (kafka.log.LogCleaner)
> [2016-06-22 10:44:31,846] INFO Cleaner 0: Building offset map for
> __consumer_offsets-28... (kafka.log.LogCleaner)
> [2016-06-22 10:44:31,878] INFO Cleaner 0: Building offset map for log
> __consumer_offsets-28 for 1 segments in offset range [3204124461,
> 3205052375). (kafka.log.LogCleaner)
> [2016-06-22 10:44:32,870] INFO Cleaner 0: Offset map for log
> __consumer_offsets-28 complete. (kafka.log.LogCleaner)
> [2016-06-22 10:44:32,871] INFO Cleaner 0: Cleaning log
> __consumer_offsets-28 (discarding tombstones prior to Tue Jun 21 10:43:19
> PDT 2016)... (kafka.log.LogCleaner)
> [2016-06-22 10:44:32,871] INFO Cleaner 0: Cleaning segment 0 in log
> __consumer_offsets-28 (last modified Tue Jun 21 22:39:18 PDT 2016) into 0,
> retaining deletes. (kafka.log.LogCleaner)
> [2016-06-22 10:44:32,888] INFO Cleaner 0: Swapping in cleaned segment 0
> for segment(s) 0 in log __consumer_offsets-28. (kafka.log.LogCleaner)
> [2016-06-22 10:44:32,889] INFO Cleaner 0: Cleaning segment 2148144095 in
> log __consumer_offsets-28 (last modified Wed Jun 22 10:42:31 PDT 2016) into
> 2148144095, retaining deletes. (kafka.log.LogCleaner)
> [2016-06-22 10:44:32,889] INFO Cleaner 0: Cleaning segment 3203196540 in
> log __consumer_offsets-28 (last modified Wed Jun 22 10:43:19 PDT 2016) into
> 2148144095, retaining deletes. (kafka.log.LogCleaner)
> [2016-06-22 10:44:32,905] INFO Cleaner 0: Swapping in cleaned segment
> 2148144095 for segment(s) 2148144095,3203196540 in log
> __consumer_offsets-28. (kafka.log.LogCleaner)
> [2016-06-22 10:44:32,905] INFO Cleaner 0: Cleaning segment 3204124461 in
> log __consumer_offsets-28 (last modified Wed Jun 22 10:44:21 PDT 2016) into
> 3204124461, retaining deletes. (kafka.log.LogCleaner)
> [2016-06-22 10:44:33,834] INFO Cleaner 0: Swapping in cleaned segment
> 3204124461 for segment(s) 3204124461 in log __consumer_offsets-28.
> (kafka.log.LogCleaner)
> [2016-06-22 10:44:33,836] INFO [kafka-log-cleaner-thread-0],
> Log cleaner thread 0 cleaned log __consumer_offsets-28 (dirty
> section = [3204124461, 3205052375])
> 100.0 MB of log processed in 2.0 seconds (50.3 MB/sec).
> Indexed 100.0 MB in 1.0 seconds (97.6 Mb/sec, 51.5% of total time)
> Buffer utilization: 0.0%
> Cleaned 100.0 MB in 1.0 seconds (103.6 Mb/sec, 48.5% of total time)
> Start size: 100.0 MB (928,011 messages)
> End size: 0.0 MB (97 messages)
>     100.0% size reduction (100.0% fewer messages)
>  (kafka.log.LogCleaner)
>
> But no actual delete messages like a properly-working broker is showing of
> a different partition.
>
> Lawrence Weikum
>
>
> On 6/22/16, 11:28 AM, "Tom Crayford" <tcrayf...@heroku.com> wrote:
>
> kafka-log-cleaner-thread-0
>
>


Re: Log Compaction not Deleting on Kafka 0.9.0.1

2016-06-22 Thread Tom Crayford
Is the log cleaner thread running? We've seen issues where the log cleaner
thread dies after too much logged data. You'll see a message like this:

[kafka-log-cleaner-thread-0], Error due to
java.lang.IllegalArgumentException: requirement failed: 9750860 messages in
segment MY_FAVORITE_TOPIC_IS_SORBET-2/47580165.log but offset
map can fit only 5033164. You can increase log.cleaner.dedupe.buffer.size
or decrease log.cleaner.threads

You can check if it's running by dumping threads using JMX and looking for
the thread name containing `kafka-log-cleaner-thread`
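The thread-dump check can be sketched as a small filter over `jstack` (or JMX thread-bean) output, matching the thread-name substring quoted above:

```python
def log_cleaner_alive(thread_dump: str) -> bool:
    """Scan a thread dump for a live log-cleaner thread.

    The substring matches the cleaner threads Kafka starts when
    log.cleaner.enable=true; if no such thread appears, compaction
    has stopped and file descriptors will start accumulating."""
    return any("kafka-log-cleaner-thread" in line
               for line in thread_dump.splitlines())

# Example: a healthy dump mentions the cleaner thread by name.
healthy = '"kafka-log-cleaner-thread-0" prio=5 RUNNABLE\n"main" prio=5'
print(log_cleaner_alive(healthy))  # True
```

Wiring this into a periodic check (alongside alerts on segment counts or open file descriptors for compacted topics) gives early warning before the broker runs out of FDs.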

If this happens, there's not too much remediation you *can* do right now.
One potential is (assuming significant replication and enough other cluster
stability) is to delete the data on the broker and bring it up again, and
ensure the log cleaner is turned on the whole time. *hopefully* compaction
will keep up whilst kafka catches up with replication, but that's not
guaranteed.

We're going to be upstreaming a ticket shortly based on this and other
issues we've seen with log compaction.

On Wed, Jun 22, 2016 at 6:03 PM, Lawrence Weikum 
wrote:

> We seem to be having a strange issue with a cluster of ours; specifically
> with the __consumer_offsets topic.
>
> When we brought the cluster online, log compaction was turned off.
> Realizing our mistake, we turned it on, but only after the topic had over
> 31,018,699,972 offsets committed to it.  Log compaction seems to have
> worked and to be working properly.  The logs show that every partition
> has been compacted, and many pieces have been marked for deletion.
>
> The problem is that not all partitions are having their older logs
> deleted.  Some partitions will grow to having 19 log files, but a few
> seconds later will have only 13.  One partition in particular, though,
> still has all of its log files, all 19,000 of them, and this never seems to
> change, only grow as new offsets come in.
>
> Restarting that broker doesn’t seem to help.
>
>
> We’ve checked the broker settings on everything as well.
>
> log.cleaner.enable = true
> log.cleanup.policy = delete
> cleanup.policy = compact
>
>
> Has anyone encountered this issue before?
>
> Thank you all for the help!
>
> Lawrence Weikum
>
>


Re: Message loss with kafka 0.8.2.2

2016-06-17 Thread Tom Crayford
Did you check if the controller is active in the cluster? If the controller
isn't active (there are known 0.8 bugs that can lead to this), then this
could cause this kind of data loss issue. I recommend upgrading to 0.9 ASAP.

Thanks

Tom Crayford
Heroku Kafka

On Friday, 17 June 2016, Gulia, Vikram <vikram.gu...@dish.com> wrote:

> Hi Gerard, thanks for the reply. Few follow ups -
>
> 1. I can try setting acks = all but wouldn't it lead to performance hit (I
> am using sync produce thus response time will be more).
> 2. I will try unclean.leader.election.enable = false and update you if it
> helps.
> 3. Regarding your last point, I am confused. What I understood about kafka
> is that the producer client always retrieve the topic metadata and already
> knows who the leader for the topic is. And the producer client always
> sends the message to the leader only (the replicas replicate the message
> and send acknowledgements to the leader). Are you saying the producer
> client can send message to any broker who is not leader or to two or more
> brokers one of which may or may not be leader?
>
> Thank you,
> Vikram Gulia
>
>
>
>
> >On 6/17/16, 12:29 AM, "Gerard Klijs" <gerard.kl...@dizzit.com> wrote:
>
> >You could try set the acks to -1, so you wait for the produce to be
> >succesfull, until most other brokers also received the message. Another
> >thing you could try is set the unclean.leader.election.enable to false
> >(this is a setting on the broker).
> >I think what's happening now is that the message in your example is send
> >to
> >two different brokers, because one of them is not sending the record to
> >the
> >actual leader. Since you have set your acks to one, you wont see any error
> >in the producer, cause it succeeded in sending it to the broker. You most
> >likely will see some error on the broker, because it is not the leader.
> >
> >On Fri, Jun 17, 2016 at 5:19 AM Gulia, Vikram <vikram.gu...@dish.com> wrote:
> >
> >> Hi Users, I am facing message loss while using kafka v 0.8.2.2. Please
> >>see
> >> details below and help me if you can.
> >>
> >> Issue: 2 messages produced to same partition one by one - Kafka producer
> >> returns same offset back, which means the message produced earlier is lost:
> >> http://stackoverflow.com/questions/37732088/2-messages-produced-to-same-partition-one-by-one-message-1-overridden-by-next
> >>
> >> Details:
> >> I have a unique problem which is happening like 50-100 times a day with
> >> message volume of more than 2 millions per day on the topic.I am using
> >> Kafka producer API 0.8.2.2 and I have 12 brokers (v 0.8.2.2) running in
> >> prod with replication of 4. I have a topic with 60 partitions and I am
> >> calculating partition for all my messages and providing the value in the
> >> ProducerRecord itself. Now, the issue -
> >>
> >> Application creates 'ProducerRecord' using -
> >>
> >> new ProducerRecord<String, String>(topic, 30, null, message1);
> >> providing topic, value message1 and partition 30. Then application call
> >> the send method and future is returned -
> >>
> >> // null is for callback
> >> Future future = producer.send(producerRecord, null);
> >> Now, app prints the offset and partition value by calling get on Future
> >> and then getting values from RecordMetadata - this is what i get -
> >>
> >> Kafka Response : partition 30, offset 3416092
> >> Now, the app produce the next message - message2 to same partition -
> >>
> >> new ProducerRecord<String, String>(topic, 30, null, message2);
> >> and kafka response -
> >>
> >> Kafka Response : partition 30, offset 3416092
> >> I received the same offset again, and if I pull message from the offset
> >>of
> >> partition 30 using simple consumer, it ends up being the message2 which
> >> essentially mean i lost the message1.
> >>
> >> Currently, the messages are produced using 10 threads each having its
> >>own
> >> instance of kafka producer (Earlier threads shared 1 Kafka producer but
> >>it
> >> was performing slow and we still had message loss).
> >> I am using all default properties for producer except a few mentioned
> >> below, the message (String payload) size can be a few kbs to a 500 kbs.
> >>I
> >> am using acks value of 1.
> >>
> >> value.
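On the manual partitioning the question mentions: choosing a partition by hand is normally done with a stable hash of a message key, so the same key always lands on the same partition. A minimal pure-Java sketch of that idea — this is not Kafka's built-in partitioner (the new Java producer uses murmur2 internally), just an illustration:

```java
// Sketch of deterministic, hash-based partition selection for a keyed message.
// NOT Kafka's default partitioner; only illustrates deriving a stable
// partition number from a key.
public class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit instead of Math.abs, which breaks on Integer.MIN_VALUE
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key, same partition, every time:
        System.out.println("device-42 -> partition " + partitionFor("device-42", 60));
    }
}
```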

Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-17 Thread Tom Crayford
+1

We can't promise security features whilst using a deprecated version of the
JDK and relying on it for underlying security functionality (e.g. SSL).
This impacts both clients and brokers. Java 7 has been deprecated for over
a year, and software that isn't up to date with that is at fault with
respect to compatibility.

On Friday, 17 June 2016, Ofir Manor  wrote:

>  I also think that Kafka should drop java 7 and scala 2.10 already.
> However, I expect Kafka (or any other project) to do it in two steps:
> 1. announce NOW that both of them are deprecated in the 0.10 series and
> will be dropped in the next major release.
> 2. drop them at the next major release.
> (ideally, the announcement should have been part of the 0.10.0 release
> notes, but any early warning is better than none)
>
> Regarding clients - if clients are 0.10 and broker is 0.11, which new 0.11
> functionality is lost? Does THAT worth a deadline extension for clients
> only? Can support for some missing functionality be backported to 0.10.x
> clients if deemed critical? (don't have an answer, but this is the right
> question IMHO)
>
>
> Ofir Manor
>
> Co-Founder & CTO | Equalum
>
> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io 
>
> > On Thu, Jun 16, 2016 at 11:45 PM, Ismael Juma wrote:
>
> > Hi all,
> >
> > I would like to start a discussion on making Java 8 a minimum requirement
> > for Kafka's next feature release (let's say Kafka 0.10.1.0 for now). This
> > is the first discussion on the topic so the idea is to understand how
> > people feel about it. If people feel it's too soon, then we can pick up
> the
> > conversation again after Kafka 0.10.1.0. If the feedback is mostly
> > positive, I will start a vote thread.
> >
> > Let's start with some dates. Java 7 hasn't received public updates since
> > April 2015[1], Java 8 was released in March 2014[2] and Java 9 is
> scheduled
> > to be released in March 2017[3].
> >
> > The first argument for dropping support for Java 7 is that the last
> public
> > release by Oracle contains a large number of known security
> > vulnerabilities. The effectiveness of Kafka's security features is
> reduced
> > if the underlying runtime is not itself secure.
> >
> > The second argument for moving to Java 8 is that it adds a number of
> > compelling features:
> >
> > * Lambda expressions and method references (particularly useful for the
> > Kafka Streams DSL)
> > * Default methods (very useful for maintaining compatibility when adding
> > methods to interfaces)
> > * java.util.stream (helpful for making collection transformations more
> > concise)
> > * Lots of improvements to java.util.concurrent (CompletableFuture,
> > DoubleAdder, DoubleAccumulator, StampedLock, LongAdder, LongAccumulator)
> > * Other nice things: SplittableRandom, Optional (and many others I have
> not
> > mentioned)
> >
> > The third argument is that it will simplify our testing matrix, we won't
> > have to test with Java 7 any longer (this is particularly useful for
> system
> > tests that take hours to run). It will also make it easier to support
> Scala
> > 2.12, which requires Java 8.
> >
> > The fourth argument is that many other open-source projects have taken
> the
> > leap already. Examples are Cassandra[4], Lucene[5], Akka[6], Hadoop 3[7],
> > Jetty[8], Eclipse[9], IntelliJ[10] and many others[11]. Even Android will
> > support Java 8 in the next version (although it will take a while before
> > most phones will use that version sadly). This reduces (but does not
> > eliminate) the chance that we would be the first project that would
> cause a
> > user to consider a Java upgrade.
> >
> > The main argument for not making the change is that a reasonable number
> of
> > users may still be using Java 7 by the time Kafka 0.10.1.0 is released.
> > More specifically, we care about the subset who would be able to upgrade
> to
> > Kafka 0.10.1.0, but would not be able to upgrade the Java version. It
> would
> > be great if we could quantify this in some way.
> >
> > What do you think?
> >
> > Ismael
> >
> > [1] https://java.com/en/download/faq/java_7.xml
> > [2] https://blogs.oracle.com/thejavatutorials/entry/jdk_8_is_released
> > [3] http://openjdk.java.net/projects/jdk9/
> > [4] https://github.com/apache/cassandra/blob/trunk/README.asc
> > [5] https://lucene.apache.org/#highlights-of-this-lucene-release-include
> > [6] http://akka.io/news/2015/09/30/akka-2.4.0-released.html
> > [7] https://issues.apache.org/jira/browse/HADOOP-11858
> > [8] https://webtide.com/jetty-9-3-features/
> > [9] http://markmail.org/message/l7s276y3xkga2eqf
> > [10]
> >
> >
> https://intellij-support.jetbrains.com/hc/en-us/articles/206544879-Selecting-the-JDK-version-the-IDE-will-run-under
> > [11] http://markmail.org/message/l7s276y3xkga2eqf
> >
>
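As a small, self-contained illustration of the Java 8 features the thread lists (lambdas, method references, java.util.stream, CompletableFuture), assuming nothing beyond the JDK:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Demo of the Java 8 features mentioned in the discussion above.
public class Java8Demo {
    // Lambdas + method references + java.util.stream: concise transformations
    static List<String> upperCase(List<String> topics) {
        return topics.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // CompletableFuture: composable asynchronous results
    static String asyncOffset() throws Exception {
        return CompletableFuture.supplyAsync(() -> "offset=")
                .thenApply(s -> s + 42)
                .get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(upperCase(Arrays.asList("orders", "payments"))); // [ORDERS, PAYMENTS]
        System.out.println(asyncOffset());                                  // offset=42
    }
}
```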


Re: Delete Message From topic

2016-06-14 Thread Tom Crayford
Hi Mudit,

Sorry this is not possible. The only deletion Kafka offers is retention or
whole topic deletion.

Thanks

Tom Crayford
Heroku Kafka

On Tuesday, 14 June 2016, Mudit Kumar <mudit.ku...@askme.in> wrote:

> Hey,
>
> How can I delete particular messages from particular topic?Is that
> possible?
>
> Thanks,
> Mudit
>
>


Re: ELB for Kafka

2016-06-10 Thread Tom Crayford
Kafka itself handles distribution among brokers and which broker consumers
and producers connect to. There's no need for an ELB, and you have to
directly expose all brokers to producers and consumers.

On Friday, 10 June 2016, Ram Chander  wrote:

> Hi,
>
>
> I am trying to setup Kafka cluster in AWS.
> Is it possible to have Kafka brokers behind ELB and producers and consumers
> talk only to ELB ?
> If not, should we directly expose all  brokers to all producers/consumers ?
> Please advise.
>
>
> Regards,
> Ram
>


Re: JVM Optimizations

2016-06-10 Thread Tom Crayford
Barry,

No, because Kafka also relies heavily on the OS page cache, which uses
memory. You'd roughly want to allocate enough page cache to hold all the
messages for your consumers for say, 30s.

Kafka also (in our experience on EC2) tends to run out of network far
before it runs out of memory or disk bandwidth, so colocating brokers makes
that much more likely.

Thanks

Tom Crayford, Heroku Kafka

On Fri, Jun 10, 2016 at 7:02 AM, Barry Kaplan <bkap...@memelet.com> wrote:

> If too much heap cause problems, would it make sense to run multiple
> brokers on a box with lots memory? For example, an EC2 D2 instance types
> has way way more ram than kafka could ever use - -but it has fast connected
> disks.
>
> Would running a broker per disk make sense in this case?
>
> -barry
>
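Tom's rule of thumb above — enough page cache to hold roughly 30 seconds of messages for your consumers — reduces to simple arithmetic. A sketch, where the 50 MB/s write rate is purely an illustrative assumption:

```java
// Back-of-envelope page cache sizing per the ~30-second rule of thumb above.
// The write rate is an assumed figure for illustration, not a measurement.
public class PageCacheEstimate {
    static long pageCacheBytesNeeded(long bytesPerSecondIn, int windowSeconds) {
        return bytesPerSecondIn * windowSeconds;
    }

    public static void main(String[] args) {
        long writeRate = 50L * 1024 * 1024; // assume 50 MB/s into the broker
        long needed = pageCacheBytesNeeded(writeRate, 30);
        System.out.println("~" + needed / (1024 * 1024) + " MB of page cache for a 30s window");
    }
}
```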


Re: JVM Optimizations

2016-06-09 Thread Tom Crayford
Hi Lawrence,

What JVM options were you using? There's a few pages in the confluent docs
on JVM tuning, IIRC. We simply use G1 and a 4GB max heap and things work
well (running many thousands of clusters).

Thanks
Tom Crayford
Heroku Kafka

On Thursday, 9 June 2016, Lawrence Weikum <lwei...@pandora.com> wrote:

> Hello all,
>
> We’ve been running a benchmark test on a Kafka cluster of ours running
> 0.9.0.1 – slamming it with messages to see when/if things might break.
> During our test, we caused two brokers to throw OutOfMemory errors (looks
> like from the Heap) even though each machine still has 43% of the total
> memory unused.
>
> I’m curious what JVM optimizations are recommended for Kafka brokers?  Or
> if there aren’t any that are recommended, what are some optimizations
> others are using to keep the brokers running smoothly?
>
> Best,
>
> Lawrence Weikum
>
>


Re: Migrating from 0.7.1 to 0.10.0.0

2016-06-08 Thread Tom Crayford
No. These versions and all versions 0.8 onwards rely on Zookeeper.

On Wednesday, 8 June 2016, Subhash Agrawal  wrote:

> Hi,
> I am currently using Kafka 0.7.1 without zookeeper. We have single node
> kafka server.
> To enhance security, we have decided to support SSL. As 0.7.1 version does
> not support SSL,
> we are upgrading to latest version 0.10.0.0. We noticed that with the
> latest version, it is
> mandatory to use zookeeper.
>
> Is there any way I can use Kafka 0.10 or 0.9 version without zookeeper?
>
> Thanks
> Subhash A.
>
>


Re: Kafka behind a load balancer

2016-06-03 Thread Tom Crayford
Hi,

Kafka is designed to distribute traffic between brokers itself. It's
naturally distributed and does not need, and indeed will not work behind a
load balancer. I'd recommend reading the docs for more, but
http://kafka.apache.org/documentation.html#design_loadbalancing is a good
start.

Thanks

Tom Crayford
Heroku Kafka

On Fri, Jun 3, 2016 at 1:15 PM, cs user <acldstk...@gmail.com> wrote:

> Hi All,
>
> Does anyone have any experience of using kafka behind a load balancer?
>
> Would this work? Are there any reasons why you would not want to do it?
>
> Thanks!
>
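The reason a load balancer doesn't fit is that each partition has exactly one leader broker, and clients must send produce and fetch requests for that partition to that exact broker, based on metadata they fetch from the cluster. A toy sketch of that client-side routing (hypothetical addresses and a toy map, not the real client code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of why Kafka clients can't sit behind a load balancer:
// requests for a partition must go to that partition's leader broker,
// which the client learns from a metadata request and targets directly.
public class LeaderRouting {
    // partition -> leader address, as a client would learn from metadata
    static final Map<Integer, String> leaders = new HashMap<>();
    static {
        leaders.put(0, "broker-1:9092");
        leaders.put(1, "broker-2:9092");
        leaders.put(2, "broker-3:9092");
    }

    static String brokerFor(int partition) {
        String leader = leaders.get(partition);
        if (leader == null) throw new IllegalStateException("no leader known for partition " + partition);
        return leader;
    }

    public static void main(String[] args) {
        // A round-robin LB would pick an arbitrary broker; the client
        // must instead target partition 1's leader directly:
        System.out.println(brokerFor(1)); // broker-2:9092
    }
}
```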


Re: Kafka broker slow down when consumer try to fetch large messages from topic

2016-06-02 Thread Tom Crayford
Hi there,

Firstly, a note that Kafka isn't really designed for this kind of large
message. http://ingest.tips/2015/01/21/handling-large-messages-kafka/
covers a lot of tips around this use case however, and covers some tuning
that will likely improve your usage.

In particular, I expect tuning up fetch.message.max.bytes on the consumer
to help out a lot here.

Generally though, doing large messages will lead to very low throughput and
lots of stability issues, as noted in that article. We run thousands of
clusters in production, and typically recommend folk keep message sizes
down to the few tens of KB for most use cases.

Thanks

Tom Crayford
Heroku Kafka

On Wed, Jun 1, 2016 at 9:49 PM, prateek arora <prateek.arora...@gmail.com>
wrote:

> I have 4 node kafka broker with following configuration :
>
> Default Number of Partitions  : num.partitions : 1
> Default Replication Factor : default.replication.factor : 1
> Maximum Message Size : message.max.bytes : 10 MB
> Replica Maximum Fetch Size : replica.fetch.max.bytes : 10 MB
>
>
> Right now I have 4 topic with 1 partition and 1 replication factor .
>
> "Topic Name" : "Broker Id" :  "Total Messages Received Across Kafka
> Broker" : "Total Bytes Received Across Kafka Broker"
> Topic 1  - Leader Kafka Broker 1 :  4.67 Message/Second  :  1.6 MB/second
> Topic 2  - Leader Kafka Broker 2 :  4.78 Message/Second  :  4.1 MB/second
> Topic 3  - Leader Kafka Broker 1 :  4.83  Message/Second   : 1.6 MB/second
> Topic 4  - Leader Kafka Broker 3  : 4.8 Message/Second   :   4.3 MB/second
>
> Message consist of .
>
>
> when a consumer tried to read messages from "Topic 2", the Kafka broker's
> rate of receiving messages slowed down from 4.77 messages/second to 3.12
> messages/second, and after some time it tried to go back up.
>
> I also attached screenshot of "Total Messages Received Across Kafka
> Broker"  and "Total Bytes Received Across Kafka Broker" for topic "Topic
> 2" .
>
> can someone explain why this happens and how to solve it?
>
> Regards
> Prateek
>
>
>
>
>
>
>
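One mitigation the linked article discusses for large messages is splitting a big payload into smaller pieces and producing each piece as its own message. A minimal chunking sketch in plain Java — consumer-side reassembly, and keying the chunks so they land in order on one partition, are deliberately left out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split a large payload into broker-friendly chunks before producing.
// Reassembly on the consumer side (and per-chunk keys/ordering) is omitted.
public class Chunker {
    static List<byte[]> chunk(byte[] payload, int maxChunkBytes) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < payload.length; off += maxChunkBytes) {
            int end = Math.min(payload.length, off + maxChunkBytes);
            chunks.add(Arrays.copyOfRange(payload, off, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[500 * 1024];           // e.g. one 500 KB message
        List<byte[]> chunks = chunk(payload, 64 * 1024); // keep each piece <= 64 KB
        System.out.println(chunks.size() + " chunks");   // 8 chunks
    }
}
```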


Re: broker randomly shuts down

2016-06-02 Thread Tom Crayford
That looks like somebody is killing the process. I'd suspect either the
linux OOM killer or something else automatically killing the JVM for some
reason.

For the OOM killer, assuming you're on Ubuntu, it's pretty easy to find in
/var/log/syslog (depending on your setup). I don't know about other
operating systems.

On Thu, Jun 2, 2016 at 5:54 AM, allen chan 
wrote:

> I have an issue where my brokers would randomly shut itself down.
> I turned on debug in log4j.properties but still do not see a reason why the
> shutdown is happening.
>
> Anyone seen this behavior before?
>
> version 0.10.0
> log4j.properties
> log4j.rootLogger=DEBUG, kafkaAppender
> * I tried TRACE level but I do not see any additional log messages
>
> snippet of log around shutdown
> [2016-06-01 15:11:51,374] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:11:53,376] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:11:55,377] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:11:57,380] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:11:59,383] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:12:01,386] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:12:03,389] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:12:04,121] INFO [Group Metadata Manager on Broker 2]:
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.GroupMetadataManager)
> [2016-06-01 15:12:04,121] INFO [Group Metadata Manager on Broker 2]:
> Removed 0 expired offsets in 0 milliseconds.
> (kafka.coordinator.GroupMetadataManager)
> [2016-06-01 15:12:05,390] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:12:07,393] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:12:09,396] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:12:11,399] DEBUG Got ping response for sessionid:
> 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
> [2016-06-01 15:12:13,334] INFO [Kafka Server 2], shutting down
> (kafka.server.KafkaServer)
> [2016-06-01 15:12:13,334] INFO [Kafka Server 2], shutting down
> (kafka.server.KafkaServer)
> [2016-06-01 15:12:13,336] INFO [Kafka Server 2], Starting controlled
> shutdown (kafka.server.KafkaServer)
> [2016-06-01 15:12:13,336] INFO [Kafka Server 2], Starting controlled
> shutdown (kafka.server.KafkaServer)
> [2016-06-01 15:12:13,338] DEBUG Added sensor with name connections-closed:
> (org.apache.kafka.common.metrics.Metrics)
> [2016-06-01 15:12:13,338] DEBUG Added sensor with name connections-created:
> (org.apache.kafka.common.metrics.Metrics)
> [2016-06-01 15:12:13,338] DEBUG Added sensor with name bytes-sent-received:
> (org.apache.kafka.common.metrics.Metrics)
> [2016-06-01 15:12:13,338] DEBUG Added sensor with name bytes-sent:
> (org.apache.kafka.common.metrics.Metrics)
> [2016-06-01 15:12:13,339] DEBUG Added sensor with name bytes-received:
> (org.apache.kafka.common.metrics.Metrics)
> [2016-06-01 15:12:13,339] DEBUG Added sensor with name select-time:
> (org.apache.kafka.common.metrics.Metrics)
>
> --
> Allen Michael Chan
>


Re: Kafka encryption

2016-06-02 Thread Tom Crayford
Filesystem encryption is transparent to Kafka. You don't need to use SSL,
but your encryption requirements may cause you to need SSL as well.

With regards to compression, without adding at rest encryption to Kafka
(which is a very major piece of work, one that for sure requires a KIP and
has many, many implications), there's not much to do there. I think it's
worth examining your threat models that require encryption on disk without
full disk encryption being suitable. Generally compromised broker machines
means an attacker will be able to sniff in flight traffic anyway, if the
goal is to never leak messages even if an attacker has full control of the
broker machine, I'd suggest that that seems pretty impossible under current
operating environments.

If the issue is compliance, I'd recommend querying whichever compliance
standard you're operating under about the suitability of full disk
encryption, and careful thought about encrypting the most sensitive parts
of messages. Whilst encryption in the producer and consumer does lead to
performance issues and decrease the capability of compression to shrink a
dataset, doing partial encryption of messages is easy enough.

Generally we've found that the kinds of uses of Kafka that require in
message encryption (alongside full disk encryption and SSL which we provide
as standard) don't have such high throughput needs that they worry about
compression etc. That clearly isn't true for all use cases though.

Thanks

Tom Crayford
Heroku Kafka

On Thursday, 2 June 2016, Gerard Klijs <gerard.kl...@dizzit.com> wrote:

> You could add a header to every message, with information whether it's
> encrypted or not, then you don't have to encrypt all the messages, or you
> only do it for some topics.
>
> On Thu, Jun 2, 2016 at 6:36 AM Bruno Rassaerts <
> bruno.rassae...@novazone.be>
> wrote:
>
> > It works indeed but encrypting individual messages really influences the
> > batch compression done by Kafka.
> > Performance drops to about 1/3 of what it is without (even if we prepare
> > the encrypted samples upfront).
> > In the end what we're going for is only encrypting what we really really
> need
> > to encrypt, not every message systematically.
> >
> > > On 31 May 2016, at 13:00, Gerard Klijs <gerard.kl...@dizzit.com
> > wrote:
> > >
> > > If you want system administrators not being able to see the data, the
> > only
> > > option is encryption, with only the clients sharing the key (or
> whatever
> > is
> > > used to (de)crypt the data). Like the example from eugene. I don't know
> > the
> > > kind of messages you have, but you could always wrap something around
> any
> > > (de)serializer your currently using.
> > >
> > > On Tue, May 31, 2016 at 12:21 PM Bruno Rassaerts <
> > > bruno.rassae...@novazone.be> wrote:
> > >
> > >> I’ve asked the same question in the past, and disk encryption was
> > >> suggested as a solution as well.
> > >> However, as far as I know, disk encryption will not prevent your data
> > >> from being stolen when the machine is compromised.
> > >> What we are looking for is an additional barrier, so that even system
> > >> administrators do not have access to the data.
> > >> Any suggestions ?
> > >>
> > >>> On 24 May 2016, at 14:40, Tom Crayford <tcrayf...@heroku.com
> > wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> There's no encryption at rest. It's recommended to use filesystem
> > >>> encryption, or encryption of each individual message before producing
> > it
> > >>> for this.
> > >>>
> > >>> Only the new producer and consumers have SSL support.
> > >>>
> > >>> Thanks
> > >>>
> > >>> Tom Crayford
> > >>> Heroku Kafka
> > >>>
> > >>> On Tue, May 24, 2016 at 11:33 AM, Snehalata Nagaje <
> > >>> snehalata.nag...@harbingergroup.com> wrote:
> > >>>
> > >>>>
> > >>>>
> > >>>> Thanks for quick reply.
> > >>>>
> > >>>> Do you mean If I see messages in kafka, those will not be readable?
> > >>>>
> > >>>> And also, we are using new producer but old consumer , does old
> > consumer
> > >>>> have ssl support?
> > >>>>
> > >>>> As mentioned in d
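The partial-encryption approach discussed in this thread — encrypting only the sensitive field of a message before handing it to the producer, so the rest stays compressible — can be sketched with the JDK's own crypto. Key management (how producers and consumers share the key without the broker ever seeing it) is the hard part and is deliberately left out:

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Sketch: AES-GCM encrypt only the sensitive field of a message body
// before producing it. Key distribution/rotation is omitted.
public class FieldEncryption {
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return c.doFinal(plain);
    }

    static byte[] decrypt(SecretKey key, byte[] iv, byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return c.doFinal(cipherText);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv); // must be unique per message in practice

        byte[] secretField = "ssn=123-45-6789".getBytes(StandardCharsets.UTF_8);
        byte[] sealed = encrypt(key, iv, secretField);
        System.out.println(new String(decrypt(key, iv, sealed), StandardCharsets.UTF_8));
    }
}
```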

Re: soft failure for kafka 0.8.2.2

2016-06-01 Thread Tom Crayford
Ok. I'd recommend upgrading to 0.9 asap to fix the known bugs in 0.8 here.

Thanks

Tom Crayford
Heroku Kafka

On Wed, Jun 1, 2016 at 3:27 AM, Fredo Lee <buaatianwa...@gmail.com> wrote:

> we use 0.8.2.2.
>
> 2016-05-31 20:14 GMT+08:00 Tom Crayford <tcrayf...@heroku.com>:
>
> > Is this under 0.8? There are a few known bugs in 0.8 that can lead to
> this
> > situation. I'd recommend upgrading to 0.9 as soon as is viable to prevent
> > this and many other kinds of issues that were fixed in 0.9.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, May 31, 2016 at 6:19 AM, Fredo Lee <buaatianwa...@gmail.com>
> > wrote:
> >
> > > thanks for your reply.
> > >
> > > yes, there are more than one controller. the msg of "soft failure" is
> > > reported by the old controller.
> > >
> > > 2016-05-31 11:42 GMT+08:00 Muqtafi Akhmad <muqt...@traveloka.com>:
> > >
> > > > hello Fredo,
> > > >
> > > > My guess is that there was partition leader election that was might
> be
> > > > triggered by detection of offline partition in Kafka cluster. Somehow
> > the
> > > > broker try to trigger leader election while previous election has
> been
> > > > completed hence this log :
> > > >
> > > > > [2016-05-28 12:33:31,511] ERROR Controller 1008 epoch 13 initiated
> > > state
> > > > > change for partition [consup-43,78] from OnlinePartition to
> > > > > OnlinePartition failed (state.change.logger)
> > > > > kafka.common.StateChangeFailedException: encountered error while
> > > electing
> > > > > leader for partition [consup-43,78] due to: aborted leader
> > > > election
> > > > > for partition [consup-43,78]
> > > > > since the LeaderAndIsr path was already written by another
> > controller.
> > > > This
> > > > > probably means that the current controller 1008 went through a soft
> > > > failure
> > > > > and another controller was elec
> > > > > ted with epoch 14..
> > > >
> > > > The question is,
> > > > - was there any offline partition?
> > > > - was there more than one active controller?
> > > >
> > > > CMIIW
> > > >
> > > >
> > > > On Mon, May 30, 2016 at 2:41 PM, Fredo Lee <buaatianwa...@gmail.com>
> > > > wrote:
> > > >
> > > > > my server.log >>>>>>>>>>>>>>>>
> > > > >
> > > > > lots of  error msg:
> > > > >
> > > > > [2016-05-28 12:12:31,132] WARN [ReplicaFetcherThread-0-1007],
> Replica
> > > > 1008
> > > > > for partition [consup-03,16] reset its fetch offset from
> > > 13985537
> > > > > to current leader 1007's latest offset 13985537
> > > > > (kafka.server.ReplicaFetcherThread)
> > > > > [2016-05-28 12:12:31,132] ERROR [ReplicaFetcherThread-0-1007],
> > Current
> > > > > offset 13987676 for partition [consup-03,16] out of range;
> > > reset
> > > > > offset to 13985537 (kafka.server.ReplicaFetcherThread)
> > > > >
> > > > >
> > > > > the other error msg:
> > > > >
> > > > > [2016-05-28 12:12:31,708] ERROR [Replica Manager on Broker 1008]:
> > Error
> > > > > when processing fetch request for partition [consup-03,35]
> > > offset
> > > > > 13848954 from consumer with correlation id 0. Possible cause:
> Request
> > > for
> > > > > offset 13848954 but we only have log segments in the range 12356946
> > to
> > > > > 13847167. (kafka.server.ReplicaManager)
> > > > >
> > > > >
> > > > >
> > > > > 2016-05-30 15:37 GMT+08:00 Fredo Lee <buaatianwa...@gmail.com>:
> > > > >
> > > > > > My state-change.log>>>>>>>>>>>
> > > > > >
> > > > > >
> > > > > > [2016-05-28 12:33:31,510] ERROR Controller 1008 epoch 13 aborted
> > > leader
> > > > > > election for partition [consup-43,78] since the
> > LeaderAndIsr
> > > > path
> > > > > > was already written by another controll
> >

Re: move kafka from one machine to another using same broker id

2016-06-01 Thread Tom Crayford
Nope. You should upgrade to Kafka 0.9, assuming that your
ActiveControllerCount across all brokers is 0 or more than 1 (which is
typically the failure case we see).

Thanks

Tom Crayford
Heroku Kafka

On Wed, Jun 1, 2016 at 3:22 AM, Fredo Lee <buaatianwa...@gmail.com> wrote:

> we use 0.8.2.2. is this version ok?
>
> 2016-05-31 20:12 GMT+08:00 Tom Crayford <tcrayf...@heroku.com>:
>
> > Hi,
> >
> > Which version of Kafka are you running? We run thousands of clusters, and
> > typically use this mechanism for replacing damaged hardware, and we've
> only
> > seen this issue under Kafka 0.8, where the controller can get stuck (due
> to
> > a few bugs in Kafka) and not be functioning. If you are on 0.8, I'd
> > recommend moving off it as soon as possible. In the meantime, check out
> the
> > JMX bean for
> > kafka.controller:type=KafkaController,name=ActiveControllerCount on all
> > your active brokers - if it's zero, then that's an indication of this
> bug.
> >
> > Thanks
> >
> > Tom
> >
> > On Tue, May 31, 2016 at 10:11 AM, Fredo Lee <buaatianwa...@gmail.com>
> > wrote:
> >
> > > i find the new broker with old broker id always fetch message from
> itself
> > > for the reason that it believe it's the leader of some partitions.
> > >
> > > 2016-05-31 15:56 GMT+08:00 Fredo Lee <buaatianwa...@gmail.com>:
> > >
> > > > we have a kafka cluster and one of them is down for the reason of
> disk
> > > > damaged. so we use the same broker id in a new server machine.
> > > >
> > > > when start kafka in the new machine, lots of error msg: "[2016-05-31
> > > > 10:30:49,792] ERROR [ReplicaFetcherThread-0-1013], Error for
> partition
> > > > [consup-25,20] to broker 1013:class
> > > > kafka.common.NotLeaderForPartitionException
> > > > (kafka.server.ReplicaFetcherThread)
> > > > "
> > > >
> > >
> >
>
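Checking the ActiveControllerCount bean Tom mentions needs nothing beyond the JDK's JMX support. A sketch — summed across all brokers the value should be exactly 1; 0 (or more than 1) indicates the stuck-controller condition described above. The JMX host and port here are illustrative assumptions:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: read ActiveControllerCount from a broker over JMX.
// Assumes the broker exposes remote JMX (e.g. via JMX_PORT) on the given port.
public class ControllerCheck {
    static final ObjectName ACTIVE_CONTROLLER = safeName(
            "kafka.controller:type=KafkaController,name=ActiveControllerCount");

    static ObjectName safeName(String s) {
        try { return new ObjectName(s); }
        catch (Exception e) { throw new IllegalArgumentException(e); }
    }

    static int activeControllerCount(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector conn = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = conn.getMBeanServerConnection();
            return ((Number) mbs.getAttribute(ACTIVE_CONTROLLER, "Value")).intValue();
        } finally {
            conn.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Requires a reachable broker, e.g.:
        // System.out.println(activeControllerCount("broker-1", 9999));
        System.out.println(ACTIVE_CONTROLLER.getCanonicalName());
    }
}
```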


Re: soft failure for kafka 0.8.2.2

2016-05-31 Thread Tom Crayford
Is this under 0.8? There are a few known bugs in 0.8 that can lead to this
situation. I'd recommend upgrading to 0.9 as soon as is viable to prevent
this and many other kinds of issues that were fixed in 0.9.

Thanks

Tom Crayford
Heroku Kafka

On Tue, May 31, 2016 at 6:19 AM, Fredo Lee <buaatianwa...@gmail.com> wrote:

> thanks for your reply.
>
> yes, there are more than one controller. the msg of "soft failure" is
> reported by the old controller.
>
> 2016-05-31 11:42 GMT+08:00 Muqtafi Akhmad <muqt...@traveloka.com>:
>
> > hello Fredo,
> >
> > My guess is that there was a partition leader election that might have been
> > triggered by detection of an offline partition in the Kafka cluster. Somehow the
> > broker tried to trigger a leader election while the previous election had been
> > completed, hence this log:
> >
> > > [2016-05-28 12:33:31,511] ERROR Controller 1008 epoch 13 initiated
> state
> > > change for partition [consup-43,78] from OnlinePartition to
> > > OnlinePartition failed (state.change.logger)
> > > kafka.common.StateChangeFailedException: encountered error while
> electing
> > > leader for partition [consup-43,78] due to: aborted leader
> > election
> > > for partition [consup-43,78]
> > > since the LeaderAndIsr path was already written by another controller.
> > This
> > > probably means that the current controller 1008 went through a soft
> > failure
> > > and another controller was elec
> > > ted with epoch 14..
> >
> > The question is,
> > - was there any offline partition?
> > - was there more than one active controller?
> >
> > CMIIW
> >
> >
> > On Mon, May 30, 2016 at 2:41 PM, Fredo Lee <buaatianwa...@gmail.com>
> > wrote:
> >
> > > my server.log >>>>>>>>>>>>>>>>
> > >
> > > lots of  error msg:
> > >
> > > [2016-05-28 12:12:31,132] WARN [ReplicaFetcherThread-0-1007], Replica
> > 1008
> > > for partition [consup-03,16] reset its fetch offset from
> 13985537
> > > to current leader 1007's latest offset 13985537
> > > (kafka.server.ReplicaFetcherThread)
> > > [2016-05-28 12:12:31,132] ERROR [ReplicaFetcherThread-0-1007], Current
> > > offset 13987676 for partition [consup-03,16] out of range;
> reset
> > > offset to 13985537 (kafka.server.ReplicaFetcherThread)
> > >
> > >
> > > the other error msg:
> > >
> > > [2016-05-28 12:12:31,708] ERROR [Replica Manager on Broker 1008]: Error
> > > when processing fetch request for partition [consup-03,35]
> offset
> > > 13848954 from consumer with correlation id 0. Possible cause: Request
> for
> > > offset 13848954 but we only have log segments in the range 12356946 to
> > > 13847167. (kafka.server.ReplicaManager)
> > >
> > >
> > >
> > > 2016-05-30 15:37 GMT+08:00 Fredo Lee <buaatianwa...@gmail.com>:
> > >
> > > > My state-change.log>>>>>>>>>>>
> > > >
> > > >
> > > > [2016-05-28 12:33:31,510] ERROR Controller 1008 epoch 13 aborted
> leader
> > > > election for partition [consup-43,78] since the LeaderAndIsr
> > path
> > > > was already written by another controll
> > > > er. This probably means that the current controller 1008 went
> through a
> > > > soft failure and another controller was elected with epoch 14.
> > > > (state.change.logger)
> > > > [2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received
> > > response
> > > > UpdateMetadataResponse(1020,-1) for a request sent to broker
> > > id:1009,host:
> > > > 22.com,port:9092 (state.change.logger)
> > > > [2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received
> > > response
> > > > UpdateMetadataResponse(1066,-1) for a request sent to broker
> > > id:1001,host:
> > > > consup-kafka20.com,port:9092 (state.change.logger)
> > > > [2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received
> > > response
> > > > UpdateMetadataResponse(777,-1) for a request sent to broker
> > id:1006,host:
> > > > consup-kafka11.com,port:9092 (state.change.logger)
> > > > [2016-05-28 12:33:31,510] TRACE Controller 1008 epoch 13 received
> > > response
> > > > UpdateMetadataResponse(742,-1) for a request sent to broker
> > id:1018,host:
> 

Re: move kafka from one machine to another using same broker id

2016-05-31 Thread Tom Crayford
Hi,

Which version of Kafka are you running? We run thousands of clusters, and
typically use this mechanism for replacing damaged hardware, and we've only
seen this issue under Kafka 0.8, where the controller can get stuck (due to
a few bugs in Kafka) and not be functioning. If you are on 0.8, I'd
recommend moving off it as soon as possible. In the meantime, check out the
JMX bean for
kafka.controller:type=KafkaController,name=ActiveControllerCount on all
your active brokers - if it's zero, then that's an indication of this bug.

Thanks

Tom

On Tue, May 31, 2016 at 10:11 AM, Fredo Lee  wrote:

> I find the new broker with the old broker id always fetches messages from itself,
> because it believes it's the leader of some partitions.
>
> 2016-05-31 15:56 GMT+08:00 Fredo Lee :
>
> > We have a Kafka cluster and one of the brokers is down because its disk was
> > damaged, so we used the same broker id on a new server machine.
> >
> > when start kafka in the new machine, lots of error msg: "[2016-05-31
> > 10:30:49,792] ERROR [ReplicaFetcherThread-0-1013], Error for partition
> > [consup-25,20] to broker 1013:class
> > kafka.common.NotLeaderForPartitionException
> > (kafka.server.ReplicaFetcherThread)
> > "
> >
>


Re: Is producer relying on file system?

2016-05-27 Thread Tom Crayford
Hi,

The Java producer (and as far as I'm aware all other client libraries)
don't rely on the filesystem for anything, *except* that the JVM producer
relies on the filesystem if you want to use SSL with Kafka. We're running
thousands of producers in containers (on Heroku) and have never seen any
issues.

Thanks

Tom Crayford
Heroku Kafka

On Fri, May 27, 2016 at 1:33 PM, Jan Algermissen <algermissen1...@icloud.com
> wrote:

> Hi,
>
> I have a producer question: Is the producer (specifically the normal Java
> producer) using the file system in any way?
>
> If it does so, will a producer work after losing this file system or its
> content (for example in a containerization scenario)?
>
> Jan
>


Re: Using Multiple Kafka Producers for a single Kafka Topic

2016-05-25 Thread Tom Crayford
Generally Kafka isn't super great with a giant number of topics. I'd
recommend designing your system around a smaller number than 10k. There's
an upper limit enforced on the total number of partitions by zookeeper
anyway, somewhere around 29k.

I'd recommend having just a single producer per JVM, to reuse TCP
connections and maximize batching. There's no real benefit over having more
producers except slightly minimized lock contention. However, the limiting
factor in most Kafka based apps isn't usually anything like lock contention
on the producer - I'd expect the network to be the real limiter here.

Thanks

Tom Crayford
Heroku Kafka

On Wednesday, 25 May 2016, Joe San <codeintheo...@gmail.com> wrote:

> I do not mind the ordering as I have a Timestamp in all my messages and all
> my messages land in a time-series database. So I understand that it is
> better to have just one Producer instance per JVM and use that to write to
> n number of topics. I mean even if I have 10,000 topics, I can just get
> away with a single Producer instance per JVM?
>
> On Wed, May 25, 2016 at 8:41 AM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > > On Mon, Apr 25, 2016 at 6:34 AM, Joe San <codeintheo...@gmail.com> wrote:
> >
> > > I have an application that is currently running and is using Rx Streams
> > to
> > > move data. Now in this application, I have a couple of streams whose
> > > messages I would like to write to a single Kafka topic. Given this, I
> > have
> > > say Streams 1 to 5 as below:
> > >
> > > Stream1 - Takes in DataType A Stream2 - Takes in DataType B and so on
> > >
> > > Where these Streams are Rx Observers. All these data types that I get
> out
> > > of the stream are converted to a common JSON structure. I want this
> JSON
> > > structure to be pushed to a single Kafka topic.
> > >
> > > Now the questions are:
> > >
> > >1.
> > >
> > >Should I create one KafkaProducer for each of those Streams or
> rather
> > Rx
> > >Observer instances?
> > >
> >
> > A single producer instance is fine. In fact, it may be better since you
> > share TCP connections and requests to produce data can be batched
> together.
> >
> >
> > >2.
> > >
> > >What happens if multiple threads using its own instance of a
> > >KafkaProducer to write to the same topic?
> > >
> >
> > They can all write to the same topic, but their data will be arbitrarily
> > interleaved since there's no ordering guarantee across these producers.
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>


Re: Which should scale: Producer or Topic

2016-05-25 Thread Tom Crayford
By process I mean a JVM process (if you're using the JVM clients and for
your app).

Thanks

Tom Crayford
Heroku Kafka

On Wednesday, 25 May 2016, Hafsa Asif <hafsa.a...@matchinguu.com> wrote:

> A very good question from Joe. I have also the same question.
>
> Hafsa
>
> 2016-05-24 18:00 GMT+02:00 Joe San <codeintheo...@gmail.com <javascript:;>
> >:
>
> > Interesting discussion!
> >
> > What do you mean here by a process? Is that a thread or the JVM process?
> >
> > > On Tue, May 24, 2016 at 5:49 PM, Tom Crayford <tcrayf...@heroku.com>
> > > wrote:
> >
> > > Aha, yep that helped a lot.
> > >
> > > One producer per process. There's not really a per-producer topic limit.
> > > There's buffering and batching space to account for, but assuming you have
> > > sufficient memory (which scales with the number of partitions, not
> > > topics), you'll be fine.
> > >
> > > Thanks
> > >
> > > Tom Crayford
> > > Heroku Kafka
> > >
> > > > On Tue, May 24, 2016 at 4:46 PM, Hafsa Asif <hafsa.a...@matchinguu.com>
> > > > wrote:
> > >
> > > > One more question:
> > > > How many topics can be easily handled by one producer?
> > > >
> > > > Hafsa
> > > >
> > > > 2016-05-24 17:39 GMT+02:00 Hafsa Asif <hafsa.a...@matchinguu.com>:
> > > >
> > > > > Ok, let me rephrase (may be I am not using correct terms):
> > > > > Simply consider I have 2 topics, and I have both Java and NodeJS
> > client
> > > > > for Kafka.
> > > > >
> > > > > *NodeJS:*
> > > > > Is it good that I write two producers per each topic like that:
> > > > >
> > > > > var producer1 = new Producer(client);
> > > > > producer1.on('ready', function () {});
> > > > > producer1.on('error', function (err) {});
> > > > > producer1.send(payloads, cb);
> > > > >
> > > > > var producer2 = new Producer(client);
> > > > > producer2.on('ready', function () {});
> > > > > producer2.on('error', function (err) {});
> > > > > producer2.send(payloads, cb);
> > > > > Or, I should create one producer for all 2 topics.
> > > > >
> > > > >
> > > > >
> > > > > *Java:* Is it good that I write two producers per each topic like that:
> > > > >
> > > > > private static Producer<Integer, String> producer1;
> > > > > private static Producer<Integer, String> producer2;
> > > > > producer1 = new Producer<>(new ProducerConfig(properties));
> > > > > producer2 = new Producer<>(new ProducerConfig(properties));
> > > > >
> > > > > Or, I should create one producer for all 2 topics.
> > > > >
> > > > > Suggest your answer in the light of my estimations (10 topics and 1
> > > > > million records per each topic in next week)
> > > > >
> > > > > Best Regards,
> > > > > Hafsa
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2016-05-24 17:23 GMT+02:00 Tom Crayford <tcrayf...@heroku.com>:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> I think I'm a bit confused. When you say "one producer per topic",
> > do
> > > > you
> > > > >> mean one instance of the JVM application that's producing per
> topic?
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >> Tom
> > > > >>
> > > > >> On Tue, May 24, 2016 at 4:19 PM, Hafsa Asif <
> > > > >> hafsa.a...@matchinguu.com> wrote:
> > > > >>
> > > > >> > Tom,
> > > > >> >
> > > > >> > Thank you for your answer. No, I am talking about one PRODUCER for
> > > > >> > each topic, not one instance of same producer class.


Re: Kafka Scalability with the Number of Partitions

2016-05-25 Thread Tom Crayford
Hi,

Kafka's performance all comes from batching. There's going to be a huge
perf impact from limiting your batching like that, and that's likely the
issue. I'd recommend designing your system around Kafka's batching model,
which involves large numbers of messages per fetch request.

Thanks

Tom Crayford
Heroku Kafka
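By contrast with the 128-byte cap in the config quoted below, a batching-friendly sketch might look like this - the values are illustrative, not tuned recommendations, and the class just builds and prints the properties so the snippet runs standalone:

```java
import java.util.Properties;

public class BatchingConsumerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Let each poll return many records instead of capping the fetch at
        // 128 bytes, which forces roughly one small record per request.
        props.put("max.partition.fetch.bytes", "1048576"); // 1 MiB
        props.put("fetch.min.bytes", "65536");  // wait for ~64 KiB of data...
        props.put("fetch.max.wait.ms", "500");  // ...but never longer than 500 ms
        System.out.println(props.getProperty("max.partition.fetch.bytes"));
    }
}
```

With settings like these the broker answers each fetch with a large batch, and the consumer iterates the returned records in memory instead of making one round trip per record.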

On Wednesday, 25 May 2016, Yazeed Alabdulkarim <y.alabdulka...@gmail.com>
wrote:

> Hi Tom,
> Thank you for your help. I have only one broker. I used kafka production
> server configuration listed in kafka's documentation page:
> http://kafka.apache.org/documentation.html#prodconfig . I have increased
> the flush interval and number of messages to prevent the disk from becoming
> the bottleneck. For the consumers, I used the following configurations:
> Properties props = new Properties();
> props.put("enable.auto.commit", "true");
> props.put("request.timeout.ms", "5");
> props.put("session.timeout.ms", "5000");
> props.put("connections.max.idle.ms", "5000");
> props.put("fetch.min.bytes", 1);
> props.put("fetch.max.wait.ms", "500");
> props.put("group.id", "gid");
> props.put("key.deserializer", StringDeserializer.class.getName());
> props.put("value.deserializer", StringDeserializer.class.getName());
> props.put("max.partition.fetch.bytes", "128");
> consumer = new KafkaConsumer<String, String>(props);
>
> I am setting the max.partition.fetch.bytes to 128, because I only want to
> process one record for each poll.
>
> Thank a lot for your help. I really appreciate it.
>
> On Tue, May 24, 2016 at 7:51 AM, Tom Crayford <tcrayf...@heroku.com> wrote:
>
> > What's your server setup for the brokers and consumers? Generally I'd
> > expect something to be exhausted here and that to end up being the
> > bottleneck.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Mon, May 23, 2016 at 7:32 PM, Yazeed Alabdulkarim <
> > y.alabdulka...@gmail.com> wrote:
> >
> > > Hi,
> > > I am running simple experiments to evaluate the scalability of Kafka
> > > consumers with respect to the number of partitions. I assign every
> > consumer
> > > to a specific partition. Each consumer polls the records in its
> assigned
> > > partition and print the first one, then polls again from the offset of
> > the
> > > printed record until all records are printed. Prior to running the
> test,
> > I
> > > produce 10 Million records evenly among partitions. After running the
> > test,
> > > I measure the time it took for the consumers to print all the records.
> I
> > > was expecting Kafka to scale as I increase the number of
> > > consumers/partitions. However, the scalability diminishes as I increase
> > the
> > > number of partitions/consumers, beyond certain number. Going from
> 1,2,4,8
> > > the scalability is great as the duration of the test is reduced by the
> > > factor increase of the number of partitions/consumers. However, beyond
> 8
> > > consumers/partitions, the duration of the test reaches a steady state.
> I
> > am
> > > monitoring the resources of my server and didn't see any bottleneck.
> Am I
> > > missing something here? Shouldn't Kafka consumers scale with the number
> > of
> > > partitions?
> > > --
> > > Best Regards,
> > > Yazeed Alabdulkarim
> > >
> >
>
>
>
> --
> Best Regards,
> Yazeed Alabdulkarim
>


Re: Kafka encryption

2016-05-25 Thread Tom Crayford
If you're using EBS, encrypted drives are a single flag set when the volume
is provisioned. I don't know about the other storage options; I'd recommend
looking at the AWS documentation.

Thanks

Tom Crayford
Heroku Kafka
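For the per-message encryption route mentioned later in this thread (encrypting each message before producing it), here is a toy AES-GCM round trip using only the JDK. This is not the Kafka `Serializer` interface, and the throwaway generated key stands in for real key management:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class EncryptingSerializer {
    private static final int IV_LEN = 12;    // recommended GCM IV length
    private static final int TAG_BITS = 128; // authentication tag length
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    EncryptingSerializer(SecretKey key) { this.key = key; }

    byte[] serialize(String value) throws Exception {
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(iv); // fresh IV per message
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(value.getBytes(StandardCharsets.UTF_8));
        // Prepend the IV so the deserializer can recover it.
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    String deserialize(byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key,
               new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(data, 0, IV_LEN)));
        byte[] pt = c.doFinal(Arrays.copyOfRange(data, IV_LEN, data.length));
        return new String(pt, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        EncryptingSerializer s = new EncryptingSerializer(kg.generateKey());
        byte[] wire = s.serialize("hello kafka"); // what the broker would store
        System.out.println(s.deserialize(wire));
    }
}
```

With this approach the broker only ever sees IV-plus-ciphertext bytes, so data is opaque at rest even without filesystem encryption; the cost is that brokers and tooling can no longer inspect message contents.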

On Wednesday, 25 May 2016, Snehalata Nagaje <
snehalata.nag...@harbingergroup.com> wrote:

>
>
> Thanks,
>
> How can we do file system encryption?
>
> we are using aws environment.
>
> Thanks,
> Snehalata
>
> - Original Message -
> From: "Gerard Klijs" <gerard.kl...@dizzit.com>
> To: "Users" <users@kafka.apache.org>
> Sent: Tuesday, May 24, 2016 7:26:27 PM
> Subject: Re: Kafka encryption
>
> For both old and new consumers/producers you can make your own
> (de)serializer to do some encryption, maybe that could be an option?
>
> On Tue, May 24, 2016 at 2:40 PM Tom Crayford <tcrayf...@heroku.com> wrote:
>
> > Hi,
> >
> > There's no encryption at rest. It's recommended to use filesystem
> > encryption, or encryption of each individual message before producing it
> > for this.
> >
> > Only the new producer and consumers have SSL support.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, May 24, 2016 at 11:33 AM, Snehalata Nagaje <
> > snehalata.nag...@harbingergroup.com> wrote:
> >
> > >
> > >
> > > Thanks for quick reply.
> > >
> > > Do you mean If I see messages in kafka, those will not be readable?
> > >
> > > And also, we are using new producer but old consumer , does old
> consumer
> > > have ssl support?
> > >
> > > As mentioned in document, its not there.
> > >
> > >
> > > Thanks,
> > > Snehalata
> > >
> > > - Original Message -
> > > From: "Mudit Kumar" <mudit.ku...@askme.in>
> > > To: users@kafka.apache.org
> > > Sent: Tuesday, May 24, 2016 3:53:26 PM
> > > Subject: Re: Kafka encryption
> > >
> > > Yes,it does that.What specifically you are looking for?
> > >
> > >
> > >
> > >
> > > On 5/24/16, 3:52 PM, "Snehalata Nagaje" <
> > > snehalata.nag...@harbingergroup.com> wrote:
> > >
> > > >Hi All,
> > > >
> > > >
> > > >We have requirement of encryption in kafka.
> > > >
> > > >As per docs, we can configure kafka with ssl, for secured
> communication.
> > > >
> > > >But does kafka also stores data in encrypted format?
> > > >
> > > >
> > > >Thanks,
> > > >Snehalata
> > >
> >
>


Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Tom Crayford
Can I just confirm that
https://github.com/apache/kafka/commit/b8642491e78c5a137f5012e31d347c01f3b02339
is the official commit for the release? The source download doesn't have
the git repo and I can't see a sha anywhere in the downloaded source.

On Tue, May 24, 2016 at 5:42 PM, Becket Qin <becket@gmail.com> wrote:

> Awesome!
>
> On Tue, May 24, 2016 at 9:41 AM, Jay Kreps <j...@confluent.io> wrote:
>
> > Woohoo!!! :-)
> >
> > -Jay
> >
> > On Tue, May 24, 2016 at 9:24 AM, Gwen Shapira <gwens...@apache.org>
> wrote:
> >
> > > The Apache Kafka community is pleased to announce the release for
> Apache
> > > Kafka 0.10.0.0.
> > > This is a major release with exciting new features, including first
> > > release of KafkaStreams and many other improvements.
> > >
> > > All of the changes in this release can be found:
> > >
> > >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/RELEASE_NOTES.html
> > >
> > > Apache Kafka is high-throughput, publish-subscribe messaging system
> > > rethought of as a distributed commit log.
> > >
> > > ** Fast => A single Kafka broker can handle hundreds of megabytes of
> > reads
> > > and
> > > writes per second from thousands of clients.
> > >
> > > ** Scalable => Kafka is designed to allow a single cluster to serve as
> > the
> > > central data backbone
> > > for a large organization. It can be elastically and transparently
> > expanded
> > > without downtime.
> > > Data streams are partitioned and spread over a cluster of machines to
> > allow
> > > data streams
> > > larger than the capability of any single machine and to allow clusters
> of
> > > co-ordinated consumers.
> > >
> > > ** Durable => Messages are persisted on disk and replicated within the
> > > cluster to prevent
> > > data loss. Each broker can handle terabytes of messages without
> > performance
> > > impact.
> > >
> > > ** Distributed by Design => Kafka has a modern cluster-centric design
> > that
> > > offers
> > > strong durability and fault-tolerance guarantees.
> > >
> > > You can download the source release from
> > >
> > >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka-0.10.0.0-src.tgz
> > >
> > > and binary releases from
> > >
> > >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
> > >
> > >
> >
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
> > >
> > > A big thank you for the following people who have contributed to the
> > > 0.10.0.0 release.
> > >
> > > Adam Kunicki, Aditya Auradkar, Alex Loddengaard, Alex Sherwin, Allen
> > > Wang, Andrea Cosentino, Anna Povzner, Ashish Singh, Atul Soman, Ben
> > > Stopford, Bill Bejeck, BINLEI XUE, Chen Shangan, Chen Zhu, Christian
> > > Posta, Cory Kolbeck, Damian Guy, dan norwood, Dana Powers, David
> > > Jacot, Denise Fernandez, Dionysis Grigoropoulos, Dmitry Stratiychuk,
> > > Dong Lin, Dongjoon Hyun, Drausin Wulsin, Duncan Sands, Dustin Cote,
> > > Eamon Zhang, edoardo, Edward Ribeiro, Eno Thereska, Ewen
> > > Cheslack-Postava, Flavio Junqueira, Francois Visconte, Frank Scholten,
> > > Gabriel Zhang, gaob13, Geoff Anderson, glikson, Grant Henke, Greg
> > > Fodor, Guozhang Wang, Gwen Shapira, Igor Stepanov, Ishita Mandhan,
> > > Ismael Juma, Jaikiran Pai, Jakub Nowak, James Cheng, Jason Gustafson,
> > > Jay Kreps, Jeff Klukas, Jeremy Custenborder, Jesse Anderson, jholoman,
> > > Jiangjie Qin, Jin Xing, jinxing, Jonathan Bond, Jun Rao, Ján Koščo,
> > > Kaufman Ng, kenji yoshida, Kim Christensen, Kishore Senji, Konrad,
> > > Liquan Pei, Luciano Afranllie, Magnus Edenhill, Maksim Logvinenko,
> > > manasvigupta, Manikumar reddy O, Mark Grover, Matt Fluet, Matt
> > > McClure, Matthias J. Sax, Mayuresh Gharat, Micah Zoltu, Michael Blume,
> > > Michael G. Noll, Mickael Maison, Onur Karaman, ouyangliduo, Parth
> > > Brahmbhatt, Paul Cavallaro, Pierre-Yves Ritschard, Piotr Szwed,
> > > Praveen Devarao, Rafael Winterhalter, Rajini Sivaram, Randall Hauch,
> > > Richard Whaling, Ryan P, Samuel Julius Hecht, Sasaki Toru, Som Sahu,
> > > Sriharsha Chintalapani, Stig Rohde Døssing, Tao Xiao, Tom Crayford,
> > > Tom Dearman, Tom Graves, Tom Lee, Tomasz Nurkiewicz, Vahid Hashemian,
> > > William Thurston, Xin Wang, Yasuhiro Matsuda, Yifan Ying, Yuto
> > > Kawamura, zhuchen1018
> > >
> > > We welcome your help and feedback. For more information on how to
> > > report problems, and to get involved, visit the project website at
> > > http://kafka.apache.org/
> > >
> > > Thanks,
> > >
> > > Gwen
> > >
> >
>


Re: Which should scale: Producer or Topic

2016-05-24 Thread Tom Crayford
Aha, yep that helped a lot.

One producer per process. There's not really a per-producer topic limit.
There's buffering and batching space to account for, but assuming you have
sufficient memory (which scales with the number of partitions, not topics),
you'll be fine.

Thanks

Tom Crayford
Heroku Kafka

On Tue, May 24, 2016 at 4:46 PM, Hafsa Asif <hafsa.a...@matchinguu.com>
wrote:

> One more question:
> How many topics can be easily handled by one producer?
>
> Hafsa
>
> 2016-05-24 17:39 GMT+02:00 Hafsa Asif <hafsa.a...@matchinguu.com>:
>
> > Ok, let me rephrase (may be I am not using correct terms):
> > Simply consider I have 2 topics, and I have both Java and NodeJS client
> > for Kafka.
> >
> > *NodeJS:*
> > Is it good that I write two producers per each topic like that:
> >
> > var producer1 = new Producer(client);
> > producer1.on('ready', function () {});
> > producer1.on('error', function (err) {});
> > producer1.send(payloads, cb);
> >
> > var producer2 = new Producer(client);
> > producer2.on('ready', function () {});
> > producer2.on('error', function (err) {});
> > producer2.send(payloads, cb);
> > Or, I should create one producer for all 2 topics.
> >
> >
> >
> > *Java:* Is it good that I write two producers per each topic like that:
> >
> > private static Producer<Integer, String> producer1;
> > private static Producer<Integer, String> producer2;
> > producer1 = new Producer<>(new ProducerConfig(properties));
> > producer2 = new Producer<>(new ProducerConfig(properties));
> >
> > Or, I should create one producer for all 2 topics.
> >
> > Suggest your answer in the light of my estimations (10 topics and 1
> > million records per each topic in next week)
> >
> > Best Regards,
> > Hafsa
> >
> >
> >
> >
> >
> > 2016-05-24 17:23 GMT+02:00 Tom Crayford <tcrayf...@heroku.com>:
> >
> >> Hi,
> >>
> >> I think I'm a bit confused. When you say "one producer per topic", do
> you
> >> mean one instance of the JVM application that's producing per topic?
> >>
> >> Thanks
> >>
> >> Tom
> >>
> >> On Tue, May 24, 2016 at 4:19 PM, Hafsa Asif <hafsa.a...@matchinguu.com>
> >> wrote:
> >>
> >> > Tom,
> >> >
> >> > Thank you for your answer. No, I am talking about one PRODUCER for
> each
> >> > topic, not one instance of same producer class. I am asking for
> general
> >> > concept only.
> >> >  Actually we are just growing and not so much far from the case of 1
> >> > million records per sec. Just considering our future case, I need your
> >> > suggestion in more detail, that in general is it a good practice to:
> >> > 1. Prepare a single producer for multiple topics (consider 10 topics)
> .
> >> > 2. Prepare 10 producers for 10 topics respectively.
> >> >
> >> > Your answer is quite satisfying for me, but I need more details so
> that
> >> I
> >> > can convince my team in a good way.
> >> >
> >> > Best Regards,
> >> > Hafsa
> >> >
> >> > 2016-05-24 16:11 GMT+02:00 Tom Crayford <tcrayf...@heroku.com>:
> >> >
> >> > > Is that "one instance of the producer class per topic"? I'd
> recommend
> >> > just
> >> > > having a single producer shared per process.
> >> > >
> >> > > 1 million records in a week is not very many records; it works out to
> >> > > ~1.6 records a second on average, which is nothing (we typically see
> >> > > 1 million+ messages per second on our clusters). Or maybe your load is
> >> > > spikier than that?
> >> > >
> >> > > Generally if you have multiple producer instances they will fail
> >> > > slightly differently, but most failures that hit one (e.g. a broker
> >> > > going down and the controller not failing over the leader fast enough)
> >> > > will hit the others as well.
> >> > >
> >> > > Thanks
> >> > >
> >> > > Tom Crayford
> >> > > Heroku Kafka
> >> > >
> >> > > On Tue, May 24, 2016 at 3:03 PM, Hafsa Asif <
> >> hafsa.a...@matchinguu.com>
> >> > > wrote:
> >> > >
> >> > > > Hello Folks,
> >> > > >
> >> > > > I am using Kafka (0.9) in my company and it is expected that we
> are
> >> > going
> >> > > > to receive 1 million records in next week. I have many topics for
> >> > solely
> >> > > > different purposes. Is it good that I define one producer per
> topic
> >> or
> >> > > > create one producer for every topic?
> >> > > >
> >> > > > Right now, I have only 4 topics and each one is expected to
> receive
> >> 1
> >> > > > million record in next week and after 4 months, we will receive 10
> >> > > million
> >> > > > records.
> >> > > >
> >> > > >
> >> > > > Is it possible in Kafka that if one producer fails, then other
> >> producer
> >> > > > also does not work? Please also suggest the safe strategy to go.
> >> > > >
> >> > > > Best Regards,
> >> > > > Hafsa
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>


Re: Which should scale: Producer or Topic

2016-05-24 Thread Tom Crayford
Hi,

I think I'm a bit confused. When you say "one producer per topic", do you
mean one instance of the JVM application that's producing per topic?

Thanks

Tom

On Tue, May 24, 2016 at 4:19 PM, Hafsa Asif <hafsa.a...@matchinguu.com>
wrote:

> Tom,
>
> Thank you for your answer. No, I am talking about one PRODUCER for each
> topic, not one instance of same producer class. I am asking for general
> concept only.
>  Actually we are just growing and not so much far from the case of 1
> million records per sec. Just considering our future case, I need your
> suggestion in more detail, that in general is it a good practice to:
> 1. Prepare a single producer for multiple topics (consider 10 topics) .
> 2. Prepare 10 producers for 10 topics respectively.
>
> Your answer is quite satisfying for me, but I need more details so that I
> can convince my team in a good way.
>
> Best Regards,
> Hafsa
>
> 2016-05-24 16:11 GMT+02:00 Tom Crayford <tcrayf...@heroku.com>:
>
> > Is that "one instance of the producer class per topic"? I'd recommend
> just
> > having a single producer shared per process.
> >
> > 1 million records in a week is not very many records; it works out to ~1.6
> > records a second on average, which is nothing (we typically see 1 million+
> > messages per second on our clusters). Or maybe your load is spikier than
> > that?
> >
> > Generally if you have multiple producer instances they will fail slightly
> > differently, but most failures that hit one (e.g. a broker going down and
> > the controller not failing over the leader fast enough) will hit the others
> > as well.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Tue, May 24, 2016 at 3:03 PM, Hafsa Asif <hafsa.a...@matchinguu.com>
> > wrote:
> >
> > > Hello Folks,
> > >
> > > I am using Kafka (0.9) in my company and it is expected that we are
> going
> > > to receive 1 million records in next week. I have many topics for
> solely
> > > different purposes. Is it good that I define one producer per topic or
> > > create one producer for every topic?
> > >
> > > Right now, I have only 4 topics and each one is expected to receive 1
> > > million record in next week and after 4 months, we will receive 10
> > million
> > > records.
> > >
> > >
> > > Is it possible in Kafka that if one producer fails, then other producer
> > > also does not work? Please also suggest the safe strategy to go.
> > >
> > > Best Regards,
> > > Hafsa
> > >
> >
>


Re: Precautions Before Restarting a Down Kafka Instance

2016-05-24 Thread Tom Crayford
No real precautions need to be taken when starting a down instance, assuming
you have replication in play and the controller is up and active. We
routinely restart downed broker processes, and have never had an issue with
it (running thousands of clusters with an ops team of ~3).

Thanks

Tom Crayford,
Heroku Kafka

On Mon, May 23, 2016 at 4:41 PM, Tushar Agrawal <agrawal.tus...@gmail.com>
wrote:

> I have 5 kafka brokers in production (0.9.0.0-1). 10 days back one of the
> broker went down and I could not find anything in logs or traces.
>
> *Question*: What could be the repercussions of restarting the broker? What
> precautions need to be taken before restarting the instance?
>
> Thank you,
> Tushar
>


Re: Kafka Scalability with the Number of Partitions

2016-05-24 Thread Tom Crayford
What's your server setup for the brokers and consumers? Generally I'd
expect something to be exhausted here and that to end up being the
bottleneck.

Thanks

Tom Crayford
Heroku Kafka

On Mon, May 23, 2016 at 7:32 PM, Yazeed Alabdulkarim <
y.alabdulka...@gmail.com> wrote:

> Hi,
> I am running simple experiments to evaluate the scalability of Kafka
> consumers with respect to the number of partitions. I assign every consumer
> to a specific partition. Each consumer polls the records in its assigned
> partition and print the first one, then polls again from the offset of the
> printed record until all records are printed. Prior to running the test, I
> produce 10 Million records evenly among partitions. After running the test,
> I measure the time it took for the consumers to print all the records. I
> was expecting Kafka to scale as I increase the number of
> consumers/partitions. However, the scalability diminishes as I increase the
> number of partitions/consumers, beyond certain number. Going from 1,2,4,8
> the scalability is great as the duration of the test is reduced by the
> factor increase of the number of partitions/consumers. However, beyond 8
> consumers/partitions, the duration of the test reaches a steady state. I am
> monitoring the resources of my server and didn't see any bottleneck. Am I
> missing something here? Shouldn't Kafka consumers scale with the number of
> partitions?
> --
> Best Regards,
> Yazeed Alabdulkarim
>


Re: Which should scale: Producer or Topic

2016-05-24 Thread Tom Crayford
Is that "one instance of the producer class per topic"? I'd recommend just
having a single producer shared per process.

1 million records in a week is not very many records; it works out to ~1.6
records a second on average, which is nothing (we typically see 1 million+
messages per second on our clusters). Or maybe your load is spikier than
that?

Generally if you have multiple producer instances they will fail slightly
differently, but most failures that hit one (e.g. a broker going down and
the controller not failing over the leader fast enough) will hit the others
as well.

Thanks

Tom Crayford
Heroku Kafka
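The ~1.6/s figure is just the weekly total spread over a week's worth of seconds:

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        long records = 1_000_000L;            // expected records per week
        long secondsPerWeek = 7L * 24 * 3600; // 604,800 seconds in a week
        double perSecond = (double) records / secondsPerWeek;
        // Round to two decimals without printf to stay locale-independent.
        System.out.println(Math.round(perSecond * 100) / 100.0 + " records/sec");
    }
}
```

Even the later 10-million-per-week estimate is only ~16.5 records a second on average, far below what a single producer can push.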

On Tue, May 24, 2016 at 3:03 PM, Hafsa Asif <hafsa.a...@matchinguu.com>
wrote:

> Hello Folks,
>
> I am using Kafka (0.9) in my company and it is expected that we are going
> to receive 1 million records in next week. I have many topics for solely
> different purposes. Is it good that I define one producer per topic or
> create one producer for every topic?
>
> Right now, I have only 4 topics and each one is expected to receive 1
> million record in next week and after 4 months, we will receive 10 million
> records.
>
>
> Is it possible in Kafka that if one producer fails, then other producer
> also does not work? Please also suggest the safe strategy to go.
>
> Best Regards,
> Hafsa
>


Re: Large kafka deployment on virtual hardware

2016-05-24 Thread Tom Crayford
2-4 VMs per host seems OK to me; as long as the network isn't saturated or
dropping packets etc., you're probably fine.

The new producer is this class:
https://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
- if you're using that then it's new producer, otherwise it's old producer.

Lots of leader change activity is consistent with offline partitions, or at
least with other controller instability. Did you check the GC logs on your
brokers for long GCs? How much JVM heap are you giving each broker? What
about the ZooKeepers - do they have enough heap, low pause times, etc.?

Another thing to think about is that 4800 partitions is a lot, and 120
brokers *is* a lot of brokers. We've seen Kafka happily handle 1.5 million
messages/s on an 8 node cluster with 16 partitions (under 0.9.0.0 though).

Another thing is that 0.8.x has a number of bugs where the Kafka
controller can get stuck. We've been finding 0.9 to be much more stable
overall (we run thousands of clusters with an ops team of 3 people who
aren't even full time), and would recommend switching off 0.8.x ASAP.

Thanks

Tom



On Tue, May 24, 2016 at 2:28 PM, Jahn Roux <j...@comprsa.com> wrote:

> Hi Tom, I appreciate you taking the time to respond to my request.
>
> I believe at the moment we only have 2 to 4 virtuals running on a single
> host. This is probably not ideal, but this is what we are stuck with -
> essentially "cloud" VM hardware.
>
> I am not sure about the producer, I believe it to be the new producer -
> how would I check this?
>
> This is part of our issue at the moment - we are having trouble with the
> metrics. Our ganglia server seems overwhelmed. We set up a small test
> cluster and found replica lag to be the biggest issue. Before we lost our
> metrics I noticed a lot of leader change activity - could this be a symptom
> of the offline partitions?
>
> Kind regards,
>
> Jahn Roux
>
>
> -Original Message-
> From: Tom Crayford [mailto:tcrayf...@heroku.com]
> Sent: Tuesday, May 24, 2016 3:07 PM
> To: Users
> Subject: Re: Large kafka deployment on virtual hardware
>
> Jahn,
>
> Are all these brokers running on the same underlying machine? Doing so
> seems highly against the usual fault tolerance properties of Kafka, and I'd
> expect there to be some hidden performance issues in the hypervisor at that
> point.
>
> Are you running with the new producer or the old one?
>
> Are you monitoring Kafka's internal metrics on each broker? Issues with
> e.g. offline partitions and other things could cause that kind of impact.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>
> On Tue, May 24, 2016 at 9:56 AM, Jahn Roux <j...@comprsa.com> wrote:
>
> > Thank you for the response. Yes, we have had a number of experts
> > investigate the underlying resource provision and there are no clear
> > issues that stand out - from a virtual and host hardware/resource
> > perspective the system is busy but nothing indicates it is overburdened.
> >
> > Kind regards,
> >
> > Jahn Roux
> >
> > -Original Message-
> > From: Sharninder [mailto:sharnin...@gmail.com]
> > Sent: Tuesday, May 24, 2016 10:49 AM
> > To: users@kafka.apache.org
> > Subject: Re: Large kafka deployment on virtual hardware
> >
> > I'm sure you checked this but since these are virtual machines, is it
> > possible there is just contention for resources? Network clogged or
> > some other simpler explanation like that?
> >
> > On Mon, May 23, 2016 at 9:42 PM, Jahn Roux <j...@comprsa.com> wrote:
> >
> > > I have a large Kafka deployment on virtual hardware: 120 brokers on
> > > 32gb memory 8 core virtual machines. Gigabit network, RHEL 6.7. 4
> > > Topics, 1200 partitions each, replication factor of 2 and running
> > > Kafka 0.8.1.2
> > >
> > >
> > >
> > > We are running into issues where our cluster is not keeping up. We
> > > have 4 sets of producers (30 producers per set) set to produce to
> > > the
> > > 4 topics (producers produce to multiple topics). The messages are
> > > about 150 byte on average and we are attempting to produce between 1
> > > million and 2 million messages a second per producer set.
> > >
> > >
> > >
> > > We run into issues after about 1 million messages a second - just
> > > for that producer set, the producer buffers fill up and we are
> > > blocked from producing messages. This does not seem to impact the
> > > other producer sets - they run without issues until they too reach
> > > about 1m messages a second.
> > >
> > >
> > >

Re: Large kafka deployment on virtual hardware

2016-05-24 Thread Tom Crayford
Jahn,

Are all these brokers running on the same underlying machine? Doing so
seems highly against the usual fault tolerance properties of Kafka, and I'd
expect there to be some hidden performance issues in the hypervisor at that
point.

Are you running with the new producer or the old one?

Are you monitoring Kafka's internal metrics on each broker? Issues with
e.g. offline partitions and other things could cause that kind of impact.

Thanks

Tom Crayford
Heroku Kafka

On Tue, May 24, 2016 at 9:56 AM, Jahn Roux <j...@comprsa.com> wrote:

> Thank you for the response. Yes, we have had a number of experts
> investigate the underlying resource provision and there are no clear issues
> that stand out - from a virtual and host hardware/resource perspective the
> system is busy but nothing indicates it is overburdened.
>
> Kind regards,
>
> Jahn Roux
>
> -Original Message-
> From: Sharninder [mailto:sharnin...@gmail.com]
> Sent: Tuesday, May 24, 2016 10:49 AM
> To: users@kafka.apache.org
> Subject: Re: Large kafka deployment on virtual hardware
>
> I'm sure you checked this but since these are virtual machines, is it
> possible there is just contention for resources? Network clogged or some
> other simpler explanation like that?
>
> On Mon, May 23, 2016 at 9:42 PM, Jahn Roux <j...@comprsa.com> wrote:
>
> > I have a large Kafka deployment on virtual hardware: 120 brokers on
> > 32gb memory 8 core virtual machines. Gigabit network, RHEL 6.7. 4
> > Topics, 1200 partitions each, replication factor of 2 and running
> > Kafka 0.8.1.2
> >
> >
> >
> > We are running into issues where our cluster is not keeping up. We
> > have 4 sets of producers (30 producers per set) set to produce to the
> > 4 topics (producers produce to multiple topics). The messages are
> > about 150 byte on average and we are attempting to produce between 1
> > million and 2 million messages a second per producer set.
> >
> >
> >
> > We run into issues after about 1 million messages a second - just for
> > that producer set, the producer buffers fill up and we are blocked
> > from producing messages. This does not seem to impact the other
> > producer sets - they run without issues until they too reach about 1m
> > messages a second.
> >
> >
> >
> > Looking at the metrics available to us we do not see a bottleneck, we
> > don't see disk I/O maxing out, CPU and network are nominal. We have
> > tried increasing and decreasing the Kafka cluster size to no avail, we
> > have gone from 100 partitions to 1200 partitions per topic. We have
> > increased and decreased the number of producers and yet we run into
> > the same issues. Our Kafka config is mostly out the box - 1 hour log
> > roll/retention, increased the buffer sizes a bit but other than that
> it's out the box.
> >
> >
> >
> > I was wondering if someone has some recommendations for identifying
> > the bottleneck and/or what configuration values we should be taking a
> look at?
> > Are there known issues with Kafka on virtualized hardware or things to
> > watch out for when deploying to VMs? Are there use cases where Kafka
> > is being used in a similar way - +4 million messages a second of
> > discrete 150 byte messages?
> >
> >
> >
> > Kind regards,
> >
> >
> >
> > Jahn Roux
> >
> >
> >
> >
> >
> > ---
> > This email has been checked for viruses by Avast antivirus software.
> > https://www.avast.com/antivirus
> >
>
>
>
> --
> --
> Sharninder
>
>
>
>


Re: Kafka encryption

2016-05-24 Thread Tom Crayford
Hi,

There's no encryption at rest. It's recommended to use filesystem
encryption, or encryption of each individual message before producing it
for this.
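To make the second option concrete, here is a minimal sketch of encrypting each payload before producing it — the bytes returned by `encrypt` are what you would hand to the producer as the record value. It uses AES-GCM from the JDK's `javax.crypto`; key management is out of scope, and the class and method names are illustrative, not any Kafka API:

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class PayloadCrypto {
    private static final int IV_LEN = 12;    // 96-bit nonce, standard for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    // Encrypt one message value; the random IV is prepended so decryption
    // needs nothing besides the key and the wire bytes.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plaintext);
        byte[] out = Arrays.copyOf(iv, IV_LEN + ct.length);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    // Reverse of encrypt: split off the IV, then authenticate and decrypt.
    static byte[] decrypt(SecretKey key, byte[] wire) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key,
               new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(wire, 0, IV_LEN)));
        return c.doFinal(Arrays.copyOfRange(wire, IV_LEN, wire.length));
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] wire = encrypt(key, "reading=42".getBytes("UTF-8"));
        System.out.println(new String(decrypt(key, wire), "UTF-8"));
    }
}
```

A consumer holding the same key runs `decrypt` on each record value; brokers only ever see ciphertext on disk.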

Only the new producer and consumers have SSL support.

Thanks

Tom Crayford
Heroku Kafka

On Tue, May 24, 2016 at 11:33 AM, Snehalata Nagaje <
snehalata.nag...@harbingergroup.com> wrote:

>
>
> Thanks for quick reply.
>
> Do you mean If I see messages in kafka, those will not be readable?
>
> And also, we are using new producer but old consumer , does old consumer
> have ssl support?
>
> As mentioned in document, its not there.
>
>
> Thanks,
> Snehalata
>
> - Original Message -
> From: "Mudit Kumar" <mudit.ku...@askme.in>
> To: users@kafka.apache.org
> Sent: Tuesday, May 24, 2016 3:53:26 PM
> Subject: Re: Kafka encryption
>
> Yes,it does that.What specifically you are looking for?
>
>
>
>
> On 5/24/16, 3:52 PM, "Snehalata Nagaje" <
> snehalata.nag...@harbingergroup.com> wrote:
>
> >Hi All,
> >
> >
> >We have requirement of encryption in kafka.
> >
> >As per docs, we can configure kafka with ssl, for secured communication.
> >
> >But does kafka also stores data in encrypted format?
> >
> >
> >Thanks,
> >Snehalata
>


Re: Kafka Producer and Buffer Issues

2016-05-23 Thread Tom Crayford
It puts things on several internal queues. I'd benchmark what kind of rates
you're looking at - we handily do a few hundred thousand per second per
process with a 2GB JVM heap.

On Mon, May 23, 2016 at 1:31 PM, Joe San <codeintheo...@gmail.com> wrote:

> When you say thread safe, I assume that the calls to the send method are
> synchronized? If that is the case, I see this as a bottleneck. We have a
> high frequency system and the frequency at which the calls are made to the
> send method is very high. This was the reason why I came up with multiple
> instances of the producer!
>
> On Mon, May 23, 2016 at 2:22 PM, Tom Crayford <tcrayf...@heroku.com>
> wrote:
>
> > That's accurate. Why are you creating so many producers? The Kafka
> producer
> > is thread safe and *should* be shared to take advantage of batching, so
> I'd
> > recommend just having a single producer.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
> > On Mon, May 23, 2016 at 10:41 AM, Joe San <codeintheo...@gmail.com>
> wrote:
> >
> > > In one of our applications, we have the following settings:
> > >
> > > # kafka configuration
> > > # ~
> > > kafka {
> > >   # comma separated list of brokers
> > >   # for e.g., "localhost:9092,localhost:9032"
> > >   brokers = "localhost:9092,localhost:9032"
> > >   topic = "asset-computed-telemetry"
> > >   isEnabled = true
> > >   # for a detailed list of configuration options see
> > >   # under New Producer Configs
> > >   # http://kafka.apache.org/082/documentation.html#producerconfigs
> > >   requestRequiredAcks = 1
> > >   requestTimeout = 3.seconds
> > >   bufferMemoryBytes = "33554432"
> > >   blockOnBufferFull = false
> > >   # setting this to 0 indicates that the producer will never
> > >   # block and will just drop messages once the queue buffer to
> > >   # kafka broker is full
> > >   queueEnqueTimeoutMs = 0.seconds
> > >   producerType = "async"
> > >   messageSendMaxRetries = 1
> > > }
> > >
> > > As you can see from the configuration, we have a buffer of 33
> MB.
> > > Now I have one topic on my broker and this topic has 20 partitions. So
> > what
> > > I do in my producer application is that I create 10 instances of my
> > > Producer and I write to the topic.
> > >
> > > So each producer instance gets a copy of this configuration. Does this
> > mean
> > > that I will reserve 33 times 10 instances = 330 MB os space just for
> the
> > > buffer? What if I have more and more topics in the future? Will I use
> all
> > > the memory only for the buffer?
> > >
> >
>


Re: Kafka Producer and Buffer Issues

2016-05-23 Thread Tom Crayford
That's accurate. Why are you creating so many producers? The Kafka producer
is thread safe and *should* be shared to take advantage of batching, so I'd
recommend just having a single producer.
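The arithmetic behind the question is worth spelling out — `buffer.memory` is an upper bound per producer instance, independent of how many topics or partitions you write to:

```java
public class ProducerBufferMath {
    public static void main(String[] args) {
        long bufferMemoryBytes = 33_554_432L; // buffer.memory from the config above (32 MiB)
        int producerInstances = 10;
        long total = bufferMemoryBytes * producerInstances;
        System.out.println(total + " bytes = " + total / (1024 * 1024) + " MiB");
        // 335544320 bytes = 320 MiB across 10 instances; one shared
        // producer would cap buffering at 32 MiB regardless of topic count.
    }
}
```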

Thanks

Tom Crayford
Heroku Kafka

On Mon, May 23, 2016 at 10:41 AM, Joe San <codeintheo...@gmail.com> wrote:

> In one of our applications, we have the following settings:
>
> # kafka configuration
> # ~
> kafka {
>   # comma separated list of brokers
>   # for e.g., "localhost:9092,localhost:9032"
>   brokers = "localhost:9092,localhost:9032"
>   topic = "asset-computed-telemetry"
>   isEnabled = true
>   # for a detailed list of configuration options see
>   # under New Producer Configs
>   # http://kafka.apache.org/082/documentation.html#producerconfigs
>   requestRequiredAcks = 1
>   requestTimeout = 3.seconds
>   bufferMemoryBytes = "33554432"
>   blockOnBufferFull = false
>   # setting this to 0 indicates that the producer will never
>   # block and will just drop messages once the queue buffer to
>   # kafka broker is full
>   queueEnqueTimeoutMs = 0.seconds
>   producerType = "async"
>   messageSendMaxRetries = 1
> }
>
> As you can see from the configuration, we have a buffer of 33 MB.
> Now I have one topic on my broker and this topic has 20 partitions. So what
> I do in my producer application is that I create 10 instances of my
> Producer and I write to the topic.
>
> So each producer instance gets a copy of this configuration. Does this mean
> that I will reserve 33 times 10 instances = 330 MB os space just for the
> buffer? What if I have more and more topics in the future? Will I use all
> the memory only for the buffer?
>


Re: Kafka event logging

2016-05-23 Thread Tom Crayford
Hi there,

You could probably wrangle this with log4j and filters. A single broker
doesn't really have a consistent view of "if the cluster goes down", so
it'd be hard to log that, but you could write some external monitoring that
checked brokers were up via JMX and log from there.

Thanks

Tom Crayford
Heroku Kafka

On Mon, May 23, 2016 at 10:57 AM, Ghosh, Prabal Kumar <
prabal.kumar.gh...@sap.com> wrote:

> Hi,
>
> I am working on 3 node kafka broker bosh release. I want the kafka broker
> events to be saved somewhere for audit logging.
> By kafka events, I mean to say kafka connection and disconnection logs.
> Also if a node in the kafka cluster goes down, the event should be logged.
>
> Is there any plugin/configuration to do the same.
>
> Regards,
>
> Prabal Kumar Ghosh
>


Re: Will segments on no-traffic topics get deleted/compacted?

2016-05-20 Thread Tom Crayford
Hi there,

The missing piece is the config `log.roll.hours` or its alternative
`log.roll.ms`. Log segments are by default rolled once a week, regardless of
activity, but you can tune that down as you like.
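For example, in server.properties (the values here are illustrative):

```properties
# Roll a new segment at least once a day, even with no incoming records,
# so time-based retention and compaction can act on idle topics sooner.
log.roll.hours=24
# Finer-grained alternative; if set, it takes precedence over log.roll.hours.
#log.roll.ms=3600000
```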

Thanks

Tom Crayford
Heroku Kafka

On Fri, May 20, 2016 at 12:49 AM, James Cheng <wushuja...@gmail.com> wrote:

> Time-based log retention only happens on old log segments. And log
> compaction only happens on old segments as well.
>
> Currently, I believe segments only roll whenever a new record is written
> to the log. That is, it is during the write of a new record that the current
> segment is evaluated to see if it should be rolled. Is that true?
>
> That means that if there is *no* traffic on a topic, that the messages on
> disk may persist past the log retention time, or past the log compaction
> time. Is that the case? If so, is there any way to trigger rolling of a
> segment without active traffic on the topic?
>
> Thanks!
> -James
>
>


Re: Scalability with the Number of Partitions

2016-05-19 Thread Tom Crayford
Hi there,

Firstly, I'd recommend not running the consumers and the brokers on the
same machine. Are you running multiple brokers? If not, that'd be my first
recommendation (it sounds like you might not be).

Secondly, yes, consumers scale up with partitions. At most you can have the
same number of consumers as partitions (but each consumer can have multiple
partitions).
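To illustrate that scaling limit, here is a hedged sketch — not Kafka's actual assignor code — of how one topic's partitions get spread over a consumer group, in the spirit of the default range assignment:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {
    // Spread numPartitions over the (sorted) consumers; the first
    // (numPartitions % consumers) members each get one extra partition.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        List<String> members = new ArrayList<>(consumers);
        Collections.sort(members);
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        int per = numPartitions / members.size();
        int extra = numPartitions % members.size();
        int next = 0;
        for (int i = 0; i < members.size(); i++) {
            List<Integer> parts = new ArrayList<>();
            for (int j = 0; j < per + (i < extra ? 1 : 0); j++) parts.add(next++);
            out.put(members.get(i), parts);
        }
        return out;
    }

    public static void main(String[] args) {
        // 3 consumers, 4 partitions: every consumer is busy, one owns two partitions.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3"), 4));
        // 5 consumers, 4 partitions: the surplus consumer sits idle with no partitions.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3", "c4", "c5"), 4));
    }
}
```

The second call shows why adding consumers beyond the partition count buys no extra throughput.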

That error message seems pretty bad - it sounds like the broker is falling
over repeatedly. I'd check the broker logs and metrics (see
http://docs.confluent.io/2.0.1/kafka/monitoring.html).

Thanks

Tom Crayford
Heroku Kafka

On Thu, May 19, 2016 at 7:14 PM, Yazeed Alabdulkarim <
y.alabdulka...@gmail.com> wrote:

> Hi,
>
> For Kafka consumers, is it expected that the throughput will scale linearly
> as I increase the number of consumers/partitions?
>
> Also, I keep getting this info message: "Kafka Consumer Marking the
> coordinator 2147483647 dead." What is the problem? How can I fix it? My
> program continues without any problem but I am worried that it maybe
> impacting the performance. I tried to set all the config timeout parameters
> to a large number. Both Kafka consumers and server are running on the same
> machine.
>
> --
> Best Regards,
> Yazeed Alabdulkarim
>


Re: Consumer group ACL limited to new consumer API?

2016-05-19 Thread Tom Crayford
Hi there,

One way to disable the old consumer is to only allow authenticated
consumers (via SSL or another authentication system) - the old consumers
don't support authentication at all. If you care about ACLs anyway, you
probably don't want unauthenticated consumers or producers in the system at
all.

The ACL for sure only works on the new consumer API, because the old one
talks directly to zookeeper so there's no good way to apply the same ACLs
there.
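For reference, a group-read ACL of the kind discussed can be added with the stock tool; the principal, topic, group, and ZooKeeper host below are placeholders:

```shell
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
  --add --allow-principal User:reporting \
  --operation Read --topic clickstream --group reporting-group
```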

Thanks

Tom Crayford
Heroku Kafka

On Thu, May 19, 2016 at 1:28 AM, David Hawes <dha...@vt.edu> wrote:

> I have been playing around with ACLs and was hoping to limit access to
> a topic and consumer group by IP, but was unable to get it working.
> Basically, I was able to Read from a topic as a consumer group that
> was not allowed.
>
> KIP-11 has the following line about consumer groups:
>
> In order to consume from a topic using the new consumer API, the
> principal will need: READ on TOPIC and READ on CONSUMER-GROUP.
>
> This tipped me off that the ACL may only work with the new consumer
> API, which I was not using. Sure enough, using the new consumer API
> denied my access by consumer group until I added an appropriate ACL.
>
> Is there some way to disable the old consumer API in Kafka? I see the
> inter.broker.protocol.version directive, but nothing about clients.
> Will there ever be support for group ACLs with the old consumer API?
>
> Without some way to disable the old consumer from being used, the
> consumer group ACLs are effectively useless as of version 0.9.0.1.
>


Re: Leader not available problem

2016-05-18 Thread Tom Crayford
Hi,

Did you check the critical Kafka metrics?
http://docs.confluent.io/2.0.1/kafka/monitoring.html has a good set of
them. We've seen a few issues with 0.8.X where the Controller gets stuck in
an infinite loop (even on single broker clusters), which would possibly
result in the case you see. Look at
kafka.controller:type=KafkaController,name=ActiveControllerCount next time.
Putting Kafka to sleep on a laptop seems likely to cause this kind of
issue. I'd recommend either pausing the broker before sleep, or just wiping
the data when this happens (if the data is critical, it likely doesn't
belong on a laptop that goes to sleep).

Lastly, because of this and many other issues we've seen in production with
0.8.X, I'd recommend upgrading to 0.9 ASAP - in our experience (running
thousands of production clusters), 0.9 is much more stable than 0.8.

Thanks

Tom Crayford,
Heroku Kafka

On Wed, May 18, 2016 at 11:31 AM, Kamil Burzynski <ka...@nopik.net> wrote:

> Hello,
>
> I'm trying to run single Kafka broker, with few topics. Basically 1
> broker, 1 partition per topic, 1 replica, few topics. I've been using
> spotify/kafka dockerhub image which apparently just downloads Kafka
> release (0.8.2.1 in my case) and starts it with the default config +
> advertised host settings added.
>
> When I start Kafka like this it works fine, for a number of days.
> Occasionally, and seemingly at random, it enters some state where
> my clients receive LeaderNotAvailable exceptions for all topics.
>
> Once the Kafka server enters this state, I haven't found any way to get it
> back to healthy state. If I restart the server, it immediately works
> fine again, for few days. This is identical whether running on my
> development laptop or on Amazon's ECS service. I have feeling, that is
> happens often on my laptop when I put it to sleep (so virtualbox and
> docker inside might be affected somehow), but over past few weekssuch
> failure didnt happened, despite of daily usage and laptop sleeping.
>
> I googled a bit, it seems to happen when Kafka can't access self through
> the address specified in advertised host. I've verified that the host is
> available (i.e. I can connect to self using those settings), all
> dns/networking/etc seem to work fine. Like, I can docker exec to the
> docker container, and with telnet access zookeeper's 2181 or Kafka's
> 9092 ports, using the addresses from server.properties file.
>
> I also tried to run kafka-preferred-replica-election, which succeeds on
> first try and says that election process has started for all topics.
> But that process apparently continues indefinitely, so subsequent
> executions of that command abort due to running election process.
>
> I've checked all the logs from Kafka and Zookeeper, nothing alarming
> there, either.
>
> Any idea where I could dig next? How should I troubleshoot it when it
> happens? What should I check/execute?
>
> PS. While I consider myself to be relatively strong in devops area, my
> experience with Kafka is very minimal, so please comment even on the most
> novice details, as I'm likely to miss them.
>
> --
> Best regards from
> Kamil Burzynski
>
>


Re: Kafka for event sourcing architecture

2016-05-18 Thread Tom Crayford
The issue I have with that log.cleanup.policy approach is how it impacts
with replication. Different replicas often have different sets of segment
files. There are effectively two options then:
1. All replicas upload all their segments. This has issues with storage
space, but is easy to implement. It's hard to reason about from a user's
perspective though - what happens when the replicas for a partition are
reassigned? Do we upload under some namespace by broker id? That'd mean
reassigning replicas will change where things get stored in S3
2. Only the leader uploads its segments. This presents a totally different
issue, as different brokers have different sets of segment files. To recap
a second, segment files contain a certain set of offsets, e.g. 0-10
might be in one file, then 11-20 in another. However, there's no
consistency of offset overlap between replicas. This means that Kafka's now
in charge of tracking which offset it has deleted up to (in some replicated
storage), and it has to truncate files before sending them to the archive
command. Alternatively, another option is to allow multiple overlaps in
storage. This presents confusing behaviour for users when the leader
changes, as there will now be overlaps in storage. Also worth thinking
about how a cold storage proposal works with unclean leader election.

Another point I take issue with is the fact that now we're using Kafka's on
disk storage format as a long term API thing for the cold storage of data.
Clients *have* to integrate against the broker code for reading on disk
storage (which only works for JVM clients right now, furthering the already
huge amount of work it takes to support non JVM client libraries). Right
now that code *and* the storage format isn't public at all and is subject
to change (aside from normal backwards compatibility concerns with on disk
storage). With a current "a separate consumer archives to cold storage"
approach the consumer gets to specify whatever format it likes, control
batching in a way that makes sense for your cold storage system of choice
(some may prefer 1GB batches, some may prefer 100MB batches etc). Doing
this in the archiving command would be possible, but does put additional
strain on the broker machine and complicates the archiving command
significantly.

Furthermore, that rough cold storage proposal, as outlined still requires
clients to do large amounts of custom work to consume from the beginning of
time. I think avoiding much of that custom work was much of the point of
"why can't Kafka do infinite retention" thing. I can imagine a nicer to use
thing in which Kafka, upon receiving an earlier offset than the beginning
of it's queue could *also* restore the segments from cold storage one by
one, but that seems extremely tricky to implement, if much nicer in
practice.

Overall, any cold storage proposal seems like a drastic change and
complication of the role of Kafka in a system. This makes it harder to
implement and harder to run in production. It also makes it more confusing
to use as a customer (even if you have an ops team or a PaaS offering
running it for you), because now you're using a long term data store, not
an ephemeral streaming system.

I think Kafka is a great fit in an event-sourced architecture, but not as
the long term cold storage, just the messaging layer. For long term low
volume event storage, a traditional database often works very well anyway
(the control plane that powers Heroku Kafka and Heroku Postgres does
something like this with Postgres, and it's worked extremely well for the
8+ years the control plane has been running). For high volume, a consumer
archiving to S3 (or other long term storage) is relatively simple to
implement (or you can use secor which is an off the shelf dropin for most
folk).

On Wed, May 18, 2016 at 12:19 PM, Christian Posta  wrote:

> So Kafka is a fine solution as part of an event-sourced story. It's not a
> simple solution, but it fits.
>
> Kakfa can store data for a long time and you shouldn't discount this,
> however, using it as the primary long-term data store might not be a good
> fit if we're talking storing raw events for years and years. I think
> someone in another thread mentioned moving these raw events to S3 or some
> object store which is probably a good idea. Kafka can store the "head" of
> the raw event stream as well as "snapshots" in time of the raw event stream
> that your applications can consume. If a raw event stream ends up super
> big, it may be impractical for an application to consume it from the
> beginning of time to recreate its datastore anyway; thus
> snapshots/aggregate snapshots help with this. Years and years of raw events
> may be useful for batch analytics (spark/hadoop) as well.
>
> In terms of an event store, I quite like the idea of using raw events to
> generate the projections/materialized views of domain objects AND using
> events within those domain objects to 

Re: Increase number of topic in Kafka leads zookeeper fail

2016-05-17 Thread Tom Crayford
Hi,

On Monday, 16 May 2016, Abhaya P <abhaya...@gmail.com> wrote:

> I was reading a nice summary article by Jun Rao on the implications of # of
> topics/partitions.
>
> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
>
> There are many trade-offs to be considered, as it looks.
>
> Finding the partition for a key: Can 'custom partitioner' be employed so
> that a consumer can derive (compute) a partition id from a key and access it
> directly instead of scanning all the partitions in a topic?


Yep. Indeed, the Java producer has a partitioner that does just this.


>
> A thought:
> Using a partition in place of a topic per device could, optionally, provide
> one level of hierarchy with some manageability options in that a topic can
> be for a certain type of IOT or certain geography of IOT, or ... etc.


Using a partition per device will have the same issue most likely. Most of
Kafka's zookeeper storage that is likely causing issues is partition based.

Instead I'd recommend hashing keys per device across partitions. This is a
common use case, and common solution and has worked out well in production
for years for many companies. Partition or Topic per device is not a thing
that will work well, and definitely will never work with indefinite growth,
as zookeeper puts a max limit on the number of partitions in a Kafka
cluster.
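A sketch of that keying scheme — note the real Java producer hashes the key bytes with murmur2; `String.hashCode` here is just a stand-in to show the idea, and the class name is made up:

```java
public class DevicePartitioner {
    // The same device id always lands on the same partition, so per-device
    // ordering is preserved without a topic or partition per device.
    static int partitionFor(String deviceId, int numPartitions) {
        // Mask off the sign bit so the modulo result is always non-negative.
        return (deviceId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 12;
        for (String id : new String[]{"device-1", "device-2", "device-1"}) {
            System.out.println(id + " -> partition " + partitionFor(id, partitions));
        }
    }
}
```

Millions of devices then share a fixed, small set of partitions, and ZooKeeper metadata stays bounded.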

Thanks

Tom Crayford
Heroku Kafka


>
> Thanks,
> Abhaya
>
>
>
> On Mon, May 16, 2016 at 5:30 AM, Paolo Patierno <ppatie...@live.com> wrote:
>
> > I agree with Tom but ...
> >
> > ... to reply Christina question I guess that Anas thought about this kind
> > of solution in relation to the simplicity to read data from a specific
> > devices from a backend service point of view.
> >
> > Using one topic per device means that on the backend side you know
> exactly
> > which is the topic to read to get data from device X.
> > When you start using topic with more partitions and a key for device to
> > determine the partition destination, on the backend side you have to read
> > the entire topic (with data across different device) and get information
> > about the key to understand which is the device sender.
> >
> > I'm only guessing that it could be the reason ...
> >
> > Btw ... +1 what Tom said
> >
> > Paolo PatiernoSenior Software Engineer (IoT) @ Red Hat
> > Microsoft MVP on Windows Embedded & IoTMicrosoft Azure Advisor
> > Twitter : @ppatierno
> > Linkedin : paolopatierno
> > Blog : DevExperience
> >
> > > Date: Mon, 16 May 2016 05:26:31 -0700
> > > Subject: Re: Increase number of topic in Kafka leads zookeeper fail
> > > From: christian.po...@gmail.com
> > > To: users@kafka.apache.org
> > >
> > > +1 what Tom said.
> > >
> > > Curious though Anas, what motivated you to try a topic per device? was
> > > there something regarding management or security that you believe you
> can
> > > achieve with topic per device?
> > >
> > > > On Mon, May 16, 2016 at 4:11 AM, Tom Crayford <tcrayf...@heroku.com>
> > > > wrote:
> > >
> > > > Hi there,
> > > >
> > > > Generally you don't use a single topic per device in this use case,
> > but one
> > > > topic with some number of partitions and the key distribution based
> on
> > > > device id. Kafka isn't designed for millions of low volume topics,
> but
> > a
> > > > few high volume ones.
> > > >
> > > > Thanks
> > > >
> > > > Tom Crayford
> > > > Heroku Kafka
> > > >
> > > > On Mon, May 16, 2016 at 5:23 AM, Anas A <anas.2...@gmail.com
> <javascript:;>> wrote:
> > > >
> > > > > We plan to use Kafka as a message broker for an IoT use case, where
> > > > > each device is considered a unique topic. When I simulated 10
> > > > > messages per second to 10 thousand topics, ZooKeeper became a
> > > > > bottleneck; all Kafka monitoring tools fail to read the throughput
> > > > > values and number of topics from the JMX port because of that. Will
> > > > > tuning ZooKeeper solve the issues? In the IoT use case there will be
> > > > > millions of devices pushing data to millions of topics. I want to
> > > > > make sure the approach is sound. Please suggest.
> > > > >
> > > > >
> > > > > *Thanks & Regards,*
> > > > >
> > > > >
> > > > > Anas A
> > > > > DBA, Trinity Mobility
> > > > > +917736368236
> > > > > anas.2...@gmail.com
> > > > > Bangalore
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > *Christian Posta*
> > > twitter: @christianposta
> > > http://www.christianposta.com/blog
> > > http://fabric8.io
> >
> >
>


Re: Kafka for event sourcing architecture

2016-05-17 Thread Tom Crayford
ng the code for it and getting it handling failure correctly
would likely be a lot of work (there's nothing in any of the client
libraries like this, because it is not a desirable or supported use case).

Instead I'd like to query *why* you need those semantics? What's the issue
with just producing a message and telling the user HTTP 200 and later
consuming it?



>
> PS: I'm a Node.js/Go developer so when possible please avoid Java centric
> terminology.


Please note that the Node and Go clients are notably less mature than
the JVM clients, and that running Kafka in production means knowing enough
about the JVM and Zookeeper to handle that.

Thanks!
Tom Crayford
Heroku Kafka

>
> Thanks!
>
> - Oli
>
> --
> - Oli
>
> Olivier Lalonde
> http://www.syskall.com <-- connect with me!
>


Re: mbeans missing in 0.9.0.1?

2016-05-16 Thread Tom Crayford
Hi there,

The Kafka producer and consumer are libraries you run inside your
application. As such, the beans from them do not exist on the brokers.

Thanks
Tom Crayford,
Heroku Kafka

On Mon, May 16, 2016 at 8:29 PM, Russ Lavoie <russlav...@gmail.com> wrote:

> I am using JMX to gather kafka metrics.  It states here
> http://docs.confluent.io/1.0/kafka/monitoring.html that they should be
> there.  But when I run a jmx client and show beans kafka.consumer and
> kafka.producer do not exist.  Is there something special I have to do to
> get these metrics?
>
> Thanks
>


Re: Producer offset commit API

2016-05-16 Thread Tom Crayford
Hi,

Producers don't track offsets in the same way, so there is no producer
offset API.

Thanks

Tom Crayford
Heroku Kafka

On Mon, May 16, 2016 at 5:25 PM, Kanagha <er.kana...@gmail.com> wrote:

> Hi,
>
> I am trying to find out the API for committing producer offset for Kafka
>
> I found this example:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka
>
> Which would work for committing consumer offsets. Is there a separate API
> for committing producer offset using Kafka connect?
>
> Thanks
> Kanagha
>
>
> --
> Kanagha
>


Re: Increase number of topic in Kafka leads zookeeper fail

2016-05-16 Thread Tom Crayford
Hi there,

Generally you don't use a single topic per device in this use case, but one
topic with some number of partitions and the key distribution based on
device id. Kafka isn't designed for millions of low volume topics, but a
few high volume ones.

Thanks

Tom Crayford
Heroku Kafka

On Mon, May 16, 2016 at 5:23 AM, Anas A <anas.2...@gmail.com> wrote:

> We plan to use Kafka as a message broker for an IoT use case, where each
> device is considered a unique topic. When I simulated 10 messages per
> second to 10 thousand topics, ZooKeeper became a bottleneck; all Kafka
> monitoring tools fail to read the throughput values and number of topics
> from the JMX port because of that. Will tuning ZooKeeper solve the issues?
> In the IoT use case there will be millions of devices pushing data to
> millions of topics. I want to make sure the approach is sound.
> Please suggest.
>
>
> *Thanks & Regards,*
>
>
> Anas A
> DBA, Trinity Mobility
> +917736368236
> anas.2...@gmail.com
> Bangalore
>


Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Tom Crayford
Yep, confirm.

On Thu, May 12, 2016 at 9:37 PM, Gwen Shapira <g...@confluent.io> wrote:

> Just to confirm:
> You tested both versions with plain text and saw no performance drop?
>
>
> On Thu, May 12, 2016 at 1:26 PM, Tom Crayford <tcrayf...@heroku.com>
> wrote:
> > We've started running our usual suite of performance tests against Kafka
> > 0.10.0.0 RC. These tests orchestrate multiple consumer/producer machines
> to
> > run a fairly normal mixed workload of producers and consumers (each
> > producer/consumer are just instances of kafka's inbuilt consumer/producer
> > perf tests). We've found about a 33% performance drop in the producer if
> > TLS is used (compared to 0.9.0.1)
> >
> > We've seen notable producer performance degradations between 0.9.0.1 and
> > 0.10.0.0 RC. We're running as of the commit 9404680 right now.
> >
> > Our specific test case runs Kafka on 8 EC2 machines, with enhanced
> > networking. Nothing is changed between the instances, and I've reproduced
> > this over 4 different sets of clusters now. We're seeing about a 33%
> > performance drop between 0.9.0.1 and 0.10.0.0 as of commit 9404680.
> > Please note that this doesn't match up with
> > https://issues.apache.org/jira/browse/KAFKA-3565, because our
> performance
> > tests are with compression off, and this seems to be a TLS-only issue.
> >
> > Under 0.10.0-rc4, we see an 8 node cluster with replication factor of 3,
> > and 13 producers max out at around 1 million 100 byte messages a second.
> > Under 0.9.0.1, the same cluster does 1.5 million messages a second. Both
> > tests were with TLS on. I've reproduced this on multiple clusters now (5
> or
> > so of each version) to account for the inherent performance variance of
> > EC2. There's no notable performance difference without TLS on these runs
> -
> > it appears to be a TLS regression entirely.
> >
> > A single producer with TLS under 0.10 does about 75k messages/s. Under
> > 0.9.0.1 it does around 120k messages/s.
> >
> > The exact producer-perf line we're using is this:
> >
> > bin/kafka-producer-perf-test --topic "bench" --num-records "5"
> > --record-size "100" --throughput "100" --producer-props acks="-1"
> > bootstrap.servers=REDACTED ssl.keystore.location=client.jks
> > ssl.keystore.password=REDACTED ssl.truststore.location=server.jks
> > ssl.truststore.password=REDACTED
> > ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1 security.protocol=SSL
> >
> > We're using the same setup, machine type etc for each test run.
> >
> > We've tried using both 0.9.0.1 producers and 0.10.0.0 producers and the
> TLS
> > performance impact was there for both.
> >
> > I've glanced over the code between 0.9.0.1 and 0.10.0.0 and haven't seen
> > anything that seemed to have this kind of impact - indeed the TLS code
> > doesn't seem to have changed much between 0.9.0.1 and 0.10.0.0.
> >
> > Any thoughts? Should I file an issue and see about reproducing a more
> > minimal test case?
> >
> > I don't think this is related to
> > https://issues.apache.org/jira/browse/KAFKA-3565 - that is for
> compression
> > on and plaintext, and this is for TLS only.
>


Re: [VOTE] 0.10.0.0 RC4

2016-05-12 Thread Tom Crayford
We've started running our usual suite of performance tests against Kafka
0.10.0.0 RC. These tests orchestrate multiple consumer/producer machines to
run a fairly normal mixed workload of producers and consumers (each
producer/consumer are just instances of kafka's inbuilt consumer/producer
perf tests). We've found about a 33% performance drop in the producer if
TLS is used (compared to 0.9.0.1)

We've seen notable producer performance degradations between 0.9.0.1 and
0.10.0.0 RC. We're running as of the commit 9404680 right now.

Our specific test case runs Kafka on 8 EC2 machines, with enhanced
networking. Nothing is changed between the instances, and I've reproduced
this over 4 different sets of clusters now. We're seeing about a 33%
performance drop between 0.9.0.1 and 0.10.0.0 as of commit 9404680. Please
note that this doesn't match up with
https://issues.apache.org/jira/browse/KAFKA-3565, because our performance
tests are with compression off, and this seems to be a TLS-only issue.

Under 0.10.0-rc4, we see an 8 node cluster with replication factor of 3,
and 13 producers max out at around 1 million 100 byte messages a second.
Under 0.9.0.1, the same cluster does 1.5 million messages a second. Both
tests were with TLS on. I've reproduced this on multiple clusters now (5 or
so of each version) to account for the inherent performance variance of
EC2. There's no notable performance difference without TLS on these runs -
it appears to be entirely a TLS regression.

A single producer with TLS under 0.10 does about 75k messages/s. Under
0.9.0.1 it does around 120k messages/s.
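The percentages above can be sanity-checked directly from the quoted
throughput numbers (a throwaway sketch; `drop` is a hypothetical helper, not
part of the Kafka tooling):

```shell
# Relative throughput drop: (old - new) / old, as a percentage.
drop() { awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'; }

drop 1500000 1000000   # cluster-wide, 1.5M -> 1.0M msg/s: prints 33.3
drop 120000 75000      # single producer, 120k -> 75k msg/s: prints 37.5
```

So the single-producer regression is, if anything, slightly worse than the
cluster-wide one.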

The exact producer-perf line we're using is this:

bin/kafka-producer-perf-test --topic "bench" --num-records "5"
--record-size "100" --throughput "100" --producer-props acks="-1"
bootstrap.servers=REDACTED ssl.keystore.location=client.jks
ssl.keystore.password=REDACTED ssl.truststore.location=server.jks
ssl.truststore.password=REDACTED
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1 security.protocol=SSL

We're using the same setup, machine type etc for each test run.
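For reference, the plaintext control runs mentioned above are the same
invocation with the TLS properties dropped (hosts redacted as before; this is
a sketch of our setup, not a verbatim command):

```shell
bin/kafka-producer-perf-test --topic "bench" --num-records "5" \
  --record-size "100" --throughput "100" --producer-props acks="-1" \
  bootstrap.servers=REDACTED security.protocol=PLAINTEXT
```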

We've tried using both 0.9.0.1 producers and 0.10.0.0 producers and the TLS
performance impact was there for both.

I've glanced over the code between 0.9.0.1 and 0.10.0.0 and haven't seen
anything that seemed to have this kind of impact - indeed the TLS code
doesn't seem to have changed much between 0.9.0.1 and 0.10.0.0.
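For anyone wanting to repeat that review, one way to scope the diff (assuming
the upstream Apache Kafka repo layout, where `SslTransportLayer` carries most
of the TLS handling) is:

```shell
git clone https://github.com/apache/kafka.git && cd kafka
git diff 0.9.0.1..9404680 -- \
  clients/src/main/java/org/apache/kafka/common/network/SslTransportLayer.java \
  clients/src/main/java/org/apache/kafka/common/security/ssl/
```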

Any thoughts? Should I file an issue and see about reproducing a more
minimal test case?

I don't think this is related to
https://issues.apache.org/jira/browse/KAFKA-3565 - that is for compression
on and plaintext, and this is for TLS only.

