Re: Some general questions...

2016-12-05 Thread Gwen Shapira
Confluent already supports a C client (the famous librdkafka). We are
indeed going to support a C# client, based on rdkafka-dotnet - we are
currently busy modifying the API a bit to fit our taste better :)



On Mon, Dec 5, 2016 at 6:34 PM, Tauzell, Dave
 wrote:
> I don't know of any API to stream a message.  I don't suggest putting lots of
> large messages onto Kafka.
>
> As far as documentation goes, I hear that Confluent is going to support a C
> and C# client, so you could try asking questions on the Confluent mailing list.
>
> Dave
>
> On Dec 5, 2016, at 17:51, Doyle, Keith 
> > wrote:
>
>
> We're beginning to make use of Kafka, and it is encouraging.  But there are a 
> couple of questions I've had a hard time finding answers for.
>
> We're using the rdkafka-dotnet client on the consumer side and it's
> straightforward as far as it goes.  However, documentation seems to be
> scant: the Wiki points to a FAQ which has, like, two questions, neither of
> which are the questions we have.  And I can't find a mailing list, forum,
> blog, or other community where questions can be asked.  I found some
> indication in the Git repository that there may be some API docs, but it's
> not at all clear exactly where those are.
>
> So I'm posting that question here because I can't find anywhere else that
> might be even remotely relevant to post it: where can I find out more info
> about rdkafka and particularly rdkafka-dotnet, and some way to ask questions
> that aren't answered in the documentation?
>
> And second, my current question about rdkafka-dotnet: the example consumers
> both seem to read an entire message into memory.  We don't want to presume
> any particular message size, and may not want to cache the entire message in
> memory while processing it.  Is there an interface where we can consume
> messages via a stream, so that we can read chunks of a message and process
> them based on some kind of batch size that will allow us better control over
> memory usage?
>
>
> Thanks,
>
>
> --
>
>
> Keith Doyle  |  Senior Software Engineer
> Greenway Health  |  4301 W. Boy Scout Blvd., Suite 800, Tampa, FL 33607
> (702) 256-9911 office  |  GreenwayHealth.com
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Users group in quota

2016-12-05 Thread Jiayue Zhang (Bravo)
I want to set quota configurations for my Kafka 0.10.1 cluster, but I have
some questions.

1. How do I set the 'user' group? From the documentation: 'In a cluster that
supports unauthenticated clients, user principal is a grouping of
unauthenticated users chosen by the broker using a configurable
PrincipalBuilder'. Is this something I can control? The cluster doesn't have
authentication settings.

2. The producers don't have client-id set up. Will Kafka differentiate them
through IP/Port, or will all of them share the '/config/clients/' quota?

3. We are using the simple consumer API and don't set client-id. Will Kafka
differentiate them through IP/Port, or will all of them share the
'/config/clients/' quota?
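
For reference, the commands I've been looking at in the 0.10.1 docs, if I'm
reading them right (the byte-rate values here are just examples):

  # per-client-id override
  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' \
    --entity-type clients --entity-name clientA

  # default quota for any client-id without an override
  bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
    --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' \
    --entity-type clients --entity-default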

Thanks,
Bravo


Re: Some general questions...

2016-12-05 Thread Tauzell, Dave
I don't know of any API to stream a message.  I don't suggest putting lots of
large messages onto Kafka.

As far as documentation goes, I hear that Confluent is going to support a C and
C# client, so you could try asking questions on the Confluent mailing list.

Dave

On Dec 5, 2016, at 17:51, Doyle, Keith 
> wrote:


We're beginning to make use of Kafka, and it is encouraging.  But there are a 
couple of questions I've had a hard time finding answers for.

We're using the rdkafka-dotnet client on the consumer side and it's
straightforward as far as it goes.  However, documentation seems to be
scant: the Wiki points to a FAQ which has, like, two questions, neither of
which are the questions we have.  And I can't find a mailing list, forum,
blog, or other community where questions can be asked.  I found some
indication in the Git repository that there may be some API docs, but it's
not at all clear exactly where those are.

So I'm posting that question here because I can't find anywhere else that
might be even remotely relevant to post it: where can I find out more info
about rdkafka and particularly rdkafka-dotnet, and some way to ask questions
that aren't answered in the documentation?

And second, my current question about rdkafka-dotnet: the example consumers
both seem to read an entire message into memory.  We don't want to presume
any particular message size, and may not want to cache the entire message in
memory while processing it.  Is there an interface where we can consume
messages via a stream, so that we can read chunks of a message and process
them based on some kind of batch size that will allow us better control over
memory usage?


Thanks,


--


Keith Doyle  |  Senior Software Engineer
Greenway Health  |  4301 W. Boy Scout Blvd., Suite 800, Tampa, FL 33607
(702) 256-9911 office  |  GreenwayHealth.com



Some general questions...

2016-12-05 Thread Doyle, Keith

We're beginning to make use of Kafka, and it is encouraging.  But there are a 
couple of questions I've had a hard time finding answers for.

We're using the rdkafka-dotnet client on the consumer side and it's
straightforward as far as it goes.  However, documentation seems to be
scant: the Wiki points to a FAQ which has, like, two questions, neither of
which are the questions we have.  And I can't find a mailing list, forum,
blog, or other community where questions can be asked.  I found some
indication in the Git repository that there may be some API docs, but it's
not at all clear exactly where those are.

So I'm posting that question here because I can't find anywhere else that
might be even remotely relevant to post it: where can I find out more info
about rdkafka and particularly rdkafka-dotnet, and some way to ask questions
that aren't answered in the documentation?

And second, my current question about rdkafka-dotnet: the example consumers
both seem to read an entire message into memory.  We don't want to presume
any particular message size, and may not want to cache the entire message in
memory while processing it.  Is there an interface where we can consume
messages via a stream, so that we can read chunks of a message and process
them based on some kind of batch size that will allow us better control over
memory usage?


Thanks,


--


Keith Doyle  |  Senior Software Engineer
Greenway Health  |  4301 W. Boy Scout Blvd., Suite 800, Tampa, FL 33607
(702) 256-9911 office  |  GreenwayHealth.com



Re: Is CreateTopics and DeleteTopics ready for production usage?

2016-12-05 Thread Apurva Mehta
It isn't ready yet. It is part of the work related to
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations

Thanks,
Apurva

On Mon, Dec 5, 2016 at 11:10 AM, Dmitry Lazurkin  wrote:

> Hello.
>
> Are requests CreateTopics and DeleteTopics ready for production usage?
>
> Why TopicCommand doesn't use CreateTopics / DeleteTopics?
>
> Thanks.
>
>


Kafka connect distribute start failed

2016-12-05 Thread Will Du
Hi folks,
I tried to start Kafka Connect in distributed mode as follows, and it fails
with the error below. Standalone mode is fine. This happens on the 3.0.1 and
3.1 versions of Confluent Kafka. Does anyone know the cause of this error?
Thanks,
Will

security.protocol = PLAINTEXT
internal.key.converter = class 
org.apache.kafka.connect.json.JsonConverter
access.control.allow.methods =
ssl.keymanager.algorithm = SunX509
metrics.sample.window.ms = 3
 (org.apache.kafka.connect.runtime.distributed.DistributedConfig:178)
[2016-12-05 21:24:14,457] INFO Logging initialized @991ms 
(org.eclipse.jetty.util.log:186)
Exception in thread "main" org.apache.kafka.common.KafkaException: Failed to
construct kafka consumer
at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.<init>(WorkerGroupMember.java:125)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.<init>(DistributedHerder.java:148)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.<init>(DistributedHerder.java:130)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:84)
Caused by: java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.internals.AbstractCoordinator.<init>(Lorg/apache/kafka/clients/consumer/internals/ConsumerNetworkClient;Ljava/lang/String;IIILorg/apache/kafka/common/metrics/Metrics;Ljava/lang/String;Lorg/apache/kafka/common/utils/Time;J)V
at org.apache.kafka.connect.runtime.distributed.WorkerCoordinator.<init>(WorkerCoordinator.java:77)
at org.apache.kafka.connect.runtime.distributed.WorkerGroupMember.<init>(WorkerGroupMember.java:105)
... 3 more

Re: When using mirrormaker, how are people creating topics?

2016-12-05 Thread Todd Palino
For most of our clusters, we just use auto topic creation and it’s handled
that way. Periodically we’ll go through and clean up partition counts
across everything if there’s a new high-volume topic. We also have the
ability for people to pre-create topics using a central management system.

For the special mirror maker that we have that does 1-to-1 mappings between
partitions for clusters that do not have auto topic creation enabled, the
topic creation (or partition count changes) is taken care of in the
message handler.
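
The creation itself is just an AdminUtils call from the handler; roughly like
this (a sketch from memory against the 0.10.x APIs, untested; topic,
numPartitions and replicationFactor are illustrative names):

  import java.util.Properties;
  import kafka.admin.AdminUtils;
  import kafka.admin.RackAwareMode;
  import kafka.utils.ZkUtils;
  import org.apache.kafka.common.security.JaasUtils;

  // ensure the destination topic exists before mirroring records into it
  ZkUtils zkUtils = ZkUtils.apply("dest-zk:2181", 30000, 30000,
          JaasUtils.isZkSecurityEnabled());
  try {
      if (!AdminUtils.topicExists(zkUtils, topic)) {
          AdminUtils.createTopic(zkUtils, topic, numPartitions, replicationFactor,
                  new Properties(), RackAwareMode.Enforced$.MODULE$);
      }
  } finally {
      zkUtils.close();
  }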

-Todd


On Mon, Dec 5, 2016 at 12:42 AM, James Cheng  wrote:

> Hi,
>
> We are using mirrormaker to mirror topics from one cluster to another, and
> I wanted to get some advice from the community on how people are doing
> mirroring. In particular, how are people dealing with topic creation?
>
> Do you turn on auto-topic creation in your destination clusters
> (auto.create.topics.enable=true)?
>
> If not, do you manually create the individual destination topics?
>
> If so, how does that work with mirroring based on a whitelist (regex)?
>
> The way we are doing it right now is, we have our regex in a file
> somewhere. The regex is used in 2 ways:
> 1) Passed to mirrormaker, to do the mirroring.
> 2) Passed to a program which looks up all the topics on the source
> cluster, finds the ones that match the regex, and then creates them on the
> destination cluster. (We have auto-topic creation turned off
> auto.create.topics.enable=false)
>
> One downside of that approach is that there is a potential race: if the
> regex changes, then mirrormaker (in #1) might start trying to produce to a
> new destination topic before the topic was created (by #2).
>
> Some other hand-wavy ideas that came to mind might be:
> * handling topic creation in a MirrorMakerMessageHandler
> * handling topic creation in an interceptor
>
> Anyway, was hoping to get some thoughts from people who are already doing
> this.
>
> Thanks!
> -James
>
>


-- 
*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming



linkedin.com/in/toddpalino


Topic discovery when supporting multiple kafka clusters

2016-12-05 Thread Yifan Ying
Hi,

Initially, we had only one Kafka cluster shared across all teams. But now
this cluster is very close to running out of resources (disk space, # of
partitions, etc.), so we are considering adding another Kafka cluster. But
what's the best practice for topic discovery, so that applications know
which cluster their topics live on? We have been using ZooKeeper for service
discovery; maybe it's also good for this purpose, along the lines of the
sketch below?
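
For example, a hypothetical znode layout (all names purely illustrative):

  /topic-discovery/orders   -> {"cluster": "kafka-b", "bootstrap.servers": "kafka-b1:9092"}
  /topic-discovery/payments -> {"cluster": "kafka-a", "bootstrap.servers": "kafka-a1:9092"}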

Thanks

-- 
Yifan


Is CreateTopics and DeleteTopics ready for production usage?

2016-12-05 Thread Dmitry Lazurkin

Hello.

Are requests CreateTopics and DeleteTopics ready for production usage?

Why TopicCommand doesn't use CreateTopics / DeleteTopics?

Thanks.



Re: Attempting to put a clean entry for key [...] into NamedCache [...] when it already contains a dirty entry for the same key

2016-12-05 Thread Damian Guy
Hi Mathieu,

If you are happy to share your code privately, it would help.  At the moment
I'm struggling to see how we can get into this situation, so I think your
topology would be useful.

Thanks,
Damian

On Mon, 5 Dec 2016 at 16:34 Mathieu Fenniak 
wrote:

> Hi Damian,
>
> Yes... I can see how most of the stack trace is rather meaningless.
> Unfortunately I don't have a minimal test case, and I don't want to burden
> you by dumping the entire application.  (I could share it privately, if
> you'd like.)
>
> Based upon the stack trace, the relevant pieces involved are a multi-table
> join (KTable [leftJoin KTable]*20) that accumulates pieces of data into a
> map; one of the joined tables is an aggregation (KTable -> filter ->
> groupBy -> aggregate "TimesheetNonBillableHours") that would have the
> affected cache.
>
> Mathieu
>
>
> On Mon, Dec 5, 2016 at 8:36 AM, Damian Guy  wrote:
>
> > Hi Mathieu,
> >
> > I'm trying to make sense of the rather long stack trace in the gist you
> > provided. Can you possibly share your streams topology with us?
> >
> > Thanks,
> > Damian
> >
> > On Mon, 5 Dec 2016 at 14:14 Mathieu Fenniak <
> mathieu.fenn...@replicon.com>
> > wrote:
> >
> > > Hi Eno,
> > >
> > > This exception occurred w/ trunk @ e43bbce (current as-of Saturday).  I
> > was
> > > bit by KAFKA-4311 (I believe) when trying to upgrade to 0.10.1.0, so
> with
> > > that issue now resolved I thought I'd check trunk out to see if any
> other
> > > issues remain.
> > >
> > > Mathieu
> > >
> > >
> > > On Sun, Dec 4, 2016 at 12:37 AM, Eno Thereska 
> > > wrote:
> > >
> > > > Hi Mathieu,
> > > >
> > > > What version of Kafka are you using? There was recently a fix that
> went
> > > > into trunk, just checking if you're using an older version.
> > > > (to make forward progress you can turn the cache off, like this:
> > > > streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG,
> > > > 0);
> > > > )
> > > >
> > > > Thanks
> > > > Eno
> > > > > On 4 Dec 2016, at 03:47, Mathieu Fenniak <
> > mathieu.fenn...@replicon.com
> > > >
> > > > wrote:
> > > > >
> > > > > Hey all,
> > > > >
> > > > > I've just been running a quick test of my kafka-streams application
> > on
> > > > the
> > > > > latest Kafka trunk (@e43bbce), and came across this error.  I was
> > > > wondering
> > > > > if anyone has seen this error before, have any thoughts on what
> might
> > > > cause
> > > > > it, or can suggest a direction to investigate it further.
> > > > >
> > > > > Full exception:
> > > > > https://gist.github.com/mfenniak/509fb82dfcfda79a21cfc1b07dafa89c
> > > > >
> > > > > java.lang.IllegalStateException: Attempting to put a clean entry
> for
> > > key
> > > > > [urn:replicon-tenant:strprc971e3ca9:timesheet:
> > 97c0ce25-e039-4e8b-9f2c-
> > > > d43f0668b755]
> > > > > into NamedCache [0_0-TimesheetNonBillableHours] when it already
> > > > contains a
> > > > > dirty entry for the same key
> > > > > at
> > > > > org.apache.kafka.streams.state.internals.NamedCache.
> > > > put(NamedCache.java:124)
> > > > > at
> > > > > org.apache.kafka.streams.state.internals.ThreadCache.
> > > > put(ThreadCache.java:120)
> > > > > at
> > > > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > > > CachingKeyValueStore.java:146)
> > > > > at
> > > > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > > > CachingKeyValueStore.java:133)
> > > > > at
> > > > > org.apache.kafka.streams.kstream.internals.KTableAggregate$
> > > > KTableAggregateValueGetter.get(KTableAggregate.java:128)
> > > > > at
> > > > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > > > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:81)
> > > > > at
> > > > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > > > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:54)
> > > > > at
> > > > > org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> > > > ProcessorNode.java:82)
> > > > > ... more ...
> > > >
> > > >
> > >
> >
>


Re: Understanding windowed aggregations

2016-12-05 Thread Matthias J. Sax
By default, Kafka Streams uses *event-time* and not *system-time* to
assign records to windows. That's why you observe this.

Please have a look here and follow up if you have further questions:

http://docs.confluent.io/current/streams/concepts.html#time
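
If what you actually want is windowing by wall-clock time, you can plug in
the wall-clock extractor; a minimal sketch against the 0.10.x Streams API:

  import java.util.Properties;
  import org.apache.kafka.streams.StreamsConfig;
  import org.apache.kafka.streams.processor.WallclockTimestampExtractor;

  // window by processing time instead of the timestamp embedded in each record
  Properties config = new Properties();
  config.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
             WallclockTimestampExtractor.class.getName());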


-Matthias


On 12/5/16 8:42 AM, Jon Yeargers wrote:
> I'm creating aggregated values as follows:
> 
> kStream.groupByKey.aggregate( ... ,TimeWindows.of(20 * 60 *
> 1000L).advanceBy(60 * 1000L), ...);
> 
> As I process each aggregate I'm storing the current system clock time in the
> aggregated record.
> 
> I'm watching the aggregates come through with a subsequent '.forEach()'.
> 
> My assumption would be that an aggregate would occur when the time for a
> new value falls between the start and end of a given window. Instead I'm
> seeing all values happen outside the expected range (Windowed.start() ->
> Windowed.end()).
> 
> Am I really confused about how this works?
> 





Understanding windowed aggregations

2016-12-05 Thread Jon Yeargers
I'm creating aggregated values as follows:

kStream.groupByKey.aggregate( ... ,TimeWindows.of(20 * 60 *
1000L).advanceBy(60 * 1000L), ...);

As I process each aggregate I'm storing the current system clock time in the
aggregated record.

I'm watching the aggregates come through with a subsequent '.forEach()'.

My assumption would be that an aggregate would occur when the time for a
new value falls between the start and end of a given window. Instead I'm
seeing all values happen outside the expected range (Windowed.start() ->
Windowed.end()).

Am I really confused about how this works?


Re: Attempting to put a clean entry for key [...] into NamedCache [...] when it already contains a dirty entry for the same key

2016-12-05 Thread Mathieu Fenniak
Hi Damian,

Yes... I can see how most of the stack trace is rather meaningless.
Unfortunately I don't have a minimal test case, and I don't want to burden
you by dumping the entire application.  (I could share it privately, if
you'd like.)

Based upon the stack trace, the relevant pieces involved are a multi-table
join (KTable [leftJoin KTable]*20) that accumulates pieces of data into a
map; one of the joined tables is an aggregation (KTable -> filter ->
groupBy -> aggregate "TimesheetNonBillableHours") that would have the
affected cache.

Mathieu


On Mon, Dec 5, 2016 at 8:36 AM, Damian Guy  wrote:

> Hi Mathieu,
>
> I'm trying to make sense of the rather long stack trace in the gist you
> provided. Can you possibly share your streams topology with us?
>
> Thanks,
> Damian
>
> On Mon, 5 Dec 2016 at 14:14 Mathieu Fenniak 
> wrote:
>
> > Hi Eno,
> >
> > This exception occurred w/ trunk @ e43bbce (current as-of Saturday).  I
> was
> > bit by KAFKA-4311 (I believe) when trying to upgrade to 0.10.1.0, so with
> > that issue now resolved I thought I'd check trunk out to see if any other
> > issues remain.
> >
> > Mathieu
> >
> >
> > On Sun, Dec 4, 2016 at 12:37 AM, Eno Thereska 
> > wrote:
> >
> > > Hi Mathieu,
> > >
> > > What version of Kafka are you using? There was recently a fix that went
> > > into trunk, just checking if you're using an older version.
> > > (to make forward progress you can turn the cache off, like this:
> > > streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG,
> > > 0);
> > > )
> > >
> > > Thanks
> > > Eno
> > > > On 4 Dec 2016, at 03:47, Mathieu Fenniak <
> mathieu.fenn...@replicon.com
> > >
> > > wrote:
> > > >
> > > > Hey all,
> > > >
> > > > I've just been running a quick test of my kafka-streams application
> on
> > > the
> > > > latest Kafka trunk (@e43bbce), and came across this error.  I was
> > > wondering
> > > > if anyone has seen this error before, have any thoughts on what might
> > > cause
> > > > it, or can suggest a direction to investigate it further.
> > > >
> > > > Full exception:
> > > > https://gist.github.com/mfenniak/509fb82dfcfda79a21cfc1b07dafa89c
> > > >
> > > > java.lang.IllegalStateException: Attempting to put a clean entry for
> > key
> > > > [urn:replicon-tenant:strprc971e3ca9:timesheet:
> 97c0ce25-e039-4e8b-9f2c-
> > > d43f0668b755]
> > > > into NamedCache [0_0-TimesheetNonBillableHours] when it already
> > > contains a
> > > > dirty entry for the same key
> > > > at
> > > > org.apache.kafka.streams.state.internals.NamedCache.
> > > put(NamedCache.java:124)
> > > > at
> > > > org.apache.kafka.streams.state.internals.ThreadCache.
> > > put(ThreadCache.java:120)
> > > > at
> > > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > > CachingKeyValueStore.java:146)
> > > > at
> > > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > > CachingKeyValueStore.java:133)
> > > > at
> > > > org.apache.kafka.streams.kstream.internals.KTableAggregate$
> > > KTableAggregateValueGetter.get(KTableAggregate.java:128)
> > > > at
> > > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:81)
> > > > at
> > > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:54)
> > > > at
> > > > org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> > > ProcessorNode.java:82)
> > > > ... more ...
> > >
> > >
> >
>


Re: Looking for guidance on setting ZK session timeouts in AWS

2016-12-05 Thread Radek Gruchalski
Thomas,

I’m always running ZK separate from Kafka. Mind you, no multi-region, just
multi-AZ.
I have never had issues with default settings. It’s possible that once your
cluster gets bigger, you may have to increase the timeouts. Never had a
problem with cluster size of ~20 brokers.
Happy to hear from others though.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com


On December 5, 2016 at 5:27:05 PM, Thomas Becker (tobec...@tivo.com) wrote:

Thanks for the reply, Radek. So you're running with 6s then? I'm
surprised, I thought people were generally increasing this value when
running in EC2. Can I ask if you folks are running ZK on the same
instances as your Kafka brokers? We do, and yes we know it's somewhat
frowned upon.

-Tommy
On Mon, 2016-12-05 at 11:00 -0500, Radek Gruchalski wrote:
> Hi Thomas,
>
> Defaults are good for sure. Never had a problem with default timeouts
> in AWS.
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
>
>
> On December 5, 2016 at 4:58:41 PM, Thomas Becker (tobec...@tivo.com)
> wrote:
> > I know several folks are running Kafka in AWS, can someone give me
> > an
> > idea of what sort of values you're using for ZK session timeouts?
> >
> > --
> >
> >
> > Tommy Becker
> >
> > Senior Software Engineer
> >
> > O +1 919.460.4747
> >
> > tivo.com
> >
> >
> > 
> >
-- 


Tommy Becker

Senior Software Engineer

O +1 919.460.4747

tivo.com






Re: Looking for guidance on setting ZK session timeouts in AWS

2016-12-05 Thread Thomas Becker
Thanks for the reply, Radek. So you're running with 6s then?  I'm
surprised, I thought people were generally increasing this value when
running in EC2. Can I ask if you folks are running ZK on the same
instances as your Kafka brokers? We do, and yes we know it's somewhat
frowned upon.

-Tommy
On Mon, 2016-12-05 at 11:00 -0500, Radek Gruchalski wrote:
> Hi Thomas,
>
> Defaults are good for sure. Never had a problem with default timeouts
> in AWS.
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
>
>
> On December 5, 2016 at 4:58:41 PM, Thomas Becker (tobec...@tivo.com)
> wrote:
> > I know several folks are running Kafka in AWS, can someone give me
> > an
> > idea of what sort of values you're using for ZK session timeouts?
> >
> > --
> >
> >
> > Tommy Becker
> >
> > Senior Software Engineer
> >
> > O +1 919.460.4747
> >
> > tivo.com
> >
> >
> > 
> >
--


Tommy Becker

Senior Software Engineer

O +1 919.460.4747

tivo.com






Re: Looking for guidance on setting ZK session timeouts in AWS

2016-12-05 Thread Radek Gruchalski
Hi Thomas,

Defaults are good for sure. Never had a problem with default timeouts in
AWS.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com


On December 5, 2016 at 4:58:41 PM, Thomas Becker (tobec...@tivo.com) wrote:

I know several folks are running Kafka in AWS, can someone give me an
idea of what sort of values you're using for ZK session timeouts?

-- 


Tommy Becker

Senior Software Engineer

O +1 919.460.4747

tivo.com






Looking for guidance on setting ZK session timeouts in AWS

2016-12-05 Thread Thomas Becker
I know several folks are running Kafka in AWS, can someone give me an
idea of what sort of values you're using for ZK session timeouts?
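
For reference, the broker-side settings I mean look like this in
server.properties (6000 ms is the default, if I remember right; the
connection timeout falls back to the session timeout when unset):

  zookeeper.session.timeout.ms=6000
  zookeeper.connection.timeout.ms=6000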

--


Tommy Becker

Senior Software Engineer

O +1 919.460.4747

tivo.com






Re: Attempting to put a clean entry for key [...] into NamedCache [...] when it already contains a dirty entry for the same key

2016-12-05 Thread Damian Guy
Hi Mathieu,

I'm trying to make sense of the rather long stack trace in the gist you
provided. Can you possibly share your streams topology with us?

Thanks,
Damian

On Mon, 5 Dec 2016 at 14:14 Mathieu Fenniak 
wrote:

> Hi Eno,
>
> This exception occurred w/ trunk @ e43bbce (current as-of Saturday).  I was
> bit by KAFKA-4311 (I believe) when trying to upgrade to 0.10.1.0, so with
> that issue now resolved I thought I'd check trunk out to see if any other
> issues remain.
>
> Mathieu
>
>
> On Sun, Dec 4, 2016 at 12:37 AM, Eno Thereska 
> wrote:
>
> > Hi Mathieu,
> >
> > What version of Kafka are you using? There was recently a fix that went
> > into trunk, just checking if you're using an older version.
> > (to make forward progress you can turn the cache off, like this:
> > streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG,
> > 0);
> > )
> >
> > Thanks
> > Eno
> > > On 4 Dec 2016, at 03:47, Mathieu Fenniak  >
> > wrote:
> > >
> > > Hey all,
> > >
> > > I've just been running a quick test of my kafka-streams application on
> > the
> > > latest Kafka trunk (@e43bbce), and came across this error.  I was
> > wondering
> > > if anyone has seen this error before, have any thoughts on what might
> > cause
> > > it, or can suggest a direction to investigate it further.
> > >
> > > Full exception:
> > > https://gist.github.com/mfenniak/509fb82dfcfda79a21cfc1b07dafa89c
> > >
> > > java.lang.IllegalStateException: Attempting to put a clean entry for
> key
> > > [urn:replicon-tenant:strprc971e3ca9:timesheet:97c0ce25-e039-4e8b-9f2c-
> > d43f0668b755]
> > > into NamedCache [0_0-TimesheetNonBillableHours] when it already
> > contains a
> > > dirty entry for the same key
> > > at
> > > org.apache.kafka.streams.state.internals.NamedCache.
> > put(NamedCache.java:124)
> > > at
> > > org.apache.kafka.streams.state.internals.ThreadCache.
> > put(ThreadCache.java:120)
> > > at
> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > CachingKeyValueStore.java:146)
> > > at
> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > CachingKeyValueStore.java:133)
> > > at
> > > org.apache.kafka.streams.kstream.internals.KTableAggregate$
> > KTableAggregateValueGetter.get(KTableAggregate.java:128)
> > > at
> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:81)
> > > at
> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:54)
> > > at
> > > org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> > ProcessorNode.java:82)
> > > ... more ...
> >
> >
>


Re: lag monitoring

2016-12-05 Thread Jon Yeargers
FWIW - I solved this by calling '.poll()' with 'enable.auto.commit' set to
false.
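
In case it helps anyone, the core of it looks roughly like this (untested
sketch against the 0.10.x Java client; topic, partition and group names are
illustrative). The monitor uses the monitored group's own group.id so that
.committed() returns its offsets, but it assigns partitions manually and
never commits, so the real consumers aren't disturbed:

  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.clients.consumer.OffsetAndMetadata;
  import org.apache.kafka.common.TopicPartition;
  import org.apache.kafka.common.serialization.ByteArrayDeserializer;

  Properties props = new Properties();
  props.put("bootstrap.servers", "localhost:9092");
  props.put("group.id", "group-being-monitored"); // same group as the real consumers
  props.put("enable.auto.commit", "false");       // never commit from the monitor

  KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props,
          new ByteArrayDeserializer(), new ByteArrayDeserializer());
  TopicPartition tp = new TopicPartition("my-topic", 0);
  consumer.assign(Collections.singletonList(tp)); // assign() does not join the group
  consumer.seekToEnd(Collections.singletonList(tp));
  long logEnd = consumer.position(tp);            // current log-end offset
  OffsetAndMetadata committed = consumer.committed(tp);
  long lag = logEnd - (committed == null ? 0 : committed.offset());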

On Mon, Dec 5, 2016 at 5:53 AM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> Hi Jon,
>
> Here are some lag monitoring options that are external to the consumer
> application itself; I don't know if these will be appropriate for you.  You
> can use a command-line tool like kafka-consumer-groups.sh to monitor
> consumer group lag externally (
> http://kafka.apache.org/documentation.html#basic_ops_consumer_group), you
> can use Kafka JMX metrics to monitor lag (
> http://kafka.apache.org/documentation.html#monitoring), or you can use an
> external tool like Burrow (https://github.com/linkedin/Burrow).
>
> Mathieu
>
>
>
> On Mon, Dec 5, 2016 at 4:47 AM, Jon Yeargers 
> wrote:
>
> > Is there a way to get updated consumer position(s) without subscribing
> to a
> > topic? I can achieve this by continually closing / reopening a
> > KafkaConsumer object but this is problematic as it often times out.
> >
> > I'm getting consumer lag from a combination of
> >
> > (start) .seekToEnd() (and then) .position()
> >
> > and
> >
> > (end) .committed()
> >
> > but if I wait and call this set of functions again I get the same values.
> >
> > How do I refresh this connection without interfering with the actual
> > consumer group position(s)?
> >
>


Re: Attempting to put a clean entry for key [...] into NamedCache [...] when it already contains a dirty entry for the same key

2016-12-05 Thread Mathieu Fenniak
Hi Eno,

This exception occurred w/ trunk @ e43bbce (current as-of Saturday).  I was
bit by KAFKA-4311 (I believe) when trying to upgrade to 0.10.1.0, so with
that issue now resolved I thought I'd check trunk out to see if any other
issues remain.

Mathieu


On Sun, Dec 4, 2016 at 12:37 AM, Eno Thereska 
wrote:

> Hi Mathieu,
>
> What version of Kafka are you using? There was recently a fix that went
> into trunk, just checking if you're using an older version.
> (to make forward progress you can turn the cache off, like this:
> streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG,
> 0);
> )
>
> Thanks
> Eno
> > On 4 Dec 2016, at 03:47, Mathieu Fenniak 
> wrote:
> >
> > Hey all,
> >
> > I've just been running a quick test of my kafka-streams application on
> the
> > latest Kafka trunk (@e43bbce), and came across this error.  I was
> wondering
> > if anyone has seen this error before, have any thoughts on what might
> cause
> > it, or can suggest a direction to investigate it further.
> >
> > Full exception:
> > https://gist.github.com/mfenniak/509fb82dfcfda79a21cfc1b07dafa89c
> >
> > java.lang.IllegalStateException: Attempting to put a clean entry for key
> > [urn:replicon-tenant:strprc971e3ca9:timesheet:97c0ce25-e039-4e8b-9f2c-
> d43f0668b755]
> > into NamedCache [0_0-TimesheetNonBillableHours] when it already
> contains a
> > dirty entry for the same key
> > at
> > org.apache.kafka.streams.state.internals.NamedCache.
> put(NamedCache.java:124)
> > at
> > org.apache.kafka.streams.state.internals.ThreadCache.
> put(ThreadCache.java:120)
> > at
> > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> CachingKeyValueStore.java:146)
> > at
> > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> CachingKeyValueStore.java:133)
> > at
> > org.apache.kafka.streams.kstream.internals.KTableAggregate$
> KTableAggregateValueGetter.get(KTableAggregate.java:128)
> > at
> > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:81)
> > at
> > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:54)
> > at
> > org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> ProcessorNode.java:82)
> > ... more ...
>
>


I am getting NotLeaderForPartitionException on internal changelog table topics

2016-12-05 Thread Sachin Mittal
Hi,
In my application I have replicated internal changelog topics.

From time to time I get this exception and I am not able to figure out why.


[2016-12-05 11:05:10,635] ERROR Error sending record to topic
test-stream-key-table-changelog
(org.apache.kafka.streams.processor.internals.RecordCollector)
org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.
[2016-12-05 11:05:10,755] ERROR Error sending record to topic
test-stream-key-table-changelog
(org.apache.kafka.streams.processor.internals.RecordCollector)


Please let me know what could be causing this.

Thanks

Sachin


Re: lag monitoring

2016-12-05 Thread Mathieu Fenniak
Hi Jon,

Here are some lag monitoring options that are external to the consumer
application itself; I don't know if these will be appropriate for you.  You
can use a command-line tool like kafka-consumer-groups.sh to monitor
consumer group lag externally (
http://kafka.apache.org/documentation.html#basic_ops_consumer_group), you
can use Kafka JMX metrics to monitor lag (
http://kafka.apache.org/documentation.html#monitoring), or you can use an
external tool like Burrow (https://github.com/linkedin/Burrow).
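
For example, the command-line tool invocation looks something like this
(0.10.x syntax for new-consumer groups; the group name is illustrative):

  bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 \
    --describe --group my-consumer-group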

Mathieu



On Mon, Dec 5, 2016 at 4:47 AM, Jon Yeargers 
wrote:

> Is there a way to get updated consumer position(s) without subscribing to a
> topic? I can achieve this by continually closing / reopening a
> KafkaConsumer object but this is problematic as it often times out.
>
> I'm getting consumer lag from a combination of
>
> (start) .seekToEnd() (and then) .position()
>
> and
>
> (end) .committed()
>
> but if I wait and call this set of functions again I get the same values.
>
> How do I refresh this connection without interfering with the actual
> consumer group position(s)?
>


lag monitoring

2016-12-05 Thread Jon Yeargers
Is there a way to get updated consumer position(s) without subscribing to a
topic? I can achieve this by continually closing / reopening a
KafkaConsumer object but this is problematic as it often times out.

I'm getting consumer lag from a combination of

(start) .seekToEnd() (and then) .position()

and

(end) .committed()

but if I wait and call this set of functions again I get the same values.

How do I refresh this connection without interfering with the actual
consumer group position(s)?


When using mirrormaker, how are people creating topics?

2016-12-05 Thread James Cheng
Hi,

We are using mirrormaker to mirror topics from one cluster to another, and I 
wanted to get some advice from the community on how people are doing mirroring. 
In particular, how are people dealing with topic creation?

Do you turn on auto-topic creation in your destination clusters 
(auto.create.topics.enable=true)?

If not, do you manually create the individual destination topics?

If so, how does that work with mirroring based on a whitelist (regex)?

The way we are doing it right now is, we have our regex in a file somewhere. 
The regex is used in 2 ways:
1) Passed to mirrormaker, to do the mirroring.
2) Passed to a program which looks up all the topics on the source cluster, 
finds the ones that match the regex, and then creates them on the destination 
cluster. (We have auto-topic creation turned off 
auto.create.topics.enable=false)

One downside of that approach is that there is a potential race: if the regex
changes, then mirrormaker (in #1) might start trying to produce to a new
destination topic before the topic was created (by #2).

Some other hand-wavy ideas that came to mind might be:
* handling topic creation in a MirrorMakerMessageHandler
* handling topic creation in an interceptor

Anyway, was hoping to get some thoughts from people who are already doing this.

Thanks!
-James