Lots of warn log in Kafka broker

2016-10-18 Thread Json Tu
Hi all,
I have a Kafka 0.9.0.0 cluster with 11 nodes.
First, I found server log entries like the ones below:
server.log.2016-10-17-22:[2016-10-17 22:22:13,885] WARN 
[ReplicaFetcherThread-0-4], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@367c9f98. Possible cause: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'responses': Error reading array of size 1786735, only 2389 bytes available 
(kafka.server.ReplicaFetcherThread)
server.log.2016-10-17-22:[2016-10-17 22:22:15,456] WARN 
[ReplicaFetcherThread-0-5], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@12088f91. Possible cause: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'responses': Error reading array of size 1338722, only 5662 bytes available 
(kafka.server.ReplicaFetcherThread)
server.log.2016-10-17-22:[2016-10-17 22:22:15,888] WARN 
[ReplicaFetcherThread-0-4], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@60069db2. Possible cause: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'responses': Error reading array of size 1786735, only 2389 bytes available 
(kafka.server.ReplicaFetcherThread)
server.log.2016-10-17-22:[2016-10-17 22:22:17,460] WARN 
[ReplicaFetcherThread-0-5], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@4a5991cb. Possible cause: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'responses': Error reading array of size 1338722, only 5662 bytes available 
(kafka.server.ReplicaFetcherThread)

Then I ran jstack on the broker pid, and I see:
"ReplicaFetcherThread-0-3" prio=10 tid=0x7f1254319800 nid=0xfdb runnable 
[0x7f0ee36d7000]
"ReplicaFetcherThread-0-8" prio=10 tid=0x7f1278141800 nid=0x66f runnable 
[0x7f0ee2ecf000]
"ReplicaFetcherThread-0-9" prio=10 tid=0x7f1278127000 nid=0x66e runnable 
[0x7f0ee2fd]
"ReplicaFetcherThread-0-4" prio=10 tid=0x7f127810c800 nid=0x66d waiting on 
condition [0x7f0ee30d1000]
"ReplicaFetcherThread-0-1" prio=10 tid=0x7f12780ef800 nid=0x66c runnable 
[0x7f0ee31d2000]
"ReplicaFetcherThread-0-7" prio=10 tid=0x7f12780d4800 nid=0x66b runnable 
[0x7f0ee32d3000]
"ReplicaFetcherThread-0-5" prio=10 tid=0x7f12780b9800 nid=0x66a waiting on 
condition [0x7f0ee33d4000]
"ReplicaFetcherThread-0-6" prio=10 tid=0x7f127809f000 nid=0x669 runnable 
[0x7f0ee34d5000]
"ReplicaFetcherThread-0-2" prio=10 tid=0x7f1278084800 nid=0x668 runnable 
[0x7f0ee35d6000]
"ReplicaFetcherThread-0-10" prio=10 tid=0x7f127804c800 nid=0x666 runnable 
[0x7f0ee37d8000]

 The stack dump shows that 2 ReplicaFetcherThreads are waiting on a condition.
 My cluster has no broker version compatibility problem. From the log, I thought
there were some exceptions on broker 4 and broker 5, so I restarted them, and
everything went back to normal.

 What does this log mean, and how can it occur?

 I would appreciate any insight into what's happening here.
 Thanks.



Re: client use high cpu which caused by delayedFetch operation immediately return

2016-10-18 Thread Json Tu
Thanks. I applied the patch, and everything is OK now.
> On Oct 9, 2016, at 12:39 PM, Becket Qin wrote:
> 
> Can you check if you have KAFKA-3003 when you run the code?
> 
> On Sat, Oct 8, 2016 at 12:52 AM, Kafka  wrote:
> 
>> Hi all,
>> we found our consumers have high CPU load in our production
>> environment. As we know, fetch.min.bytes and fetch.wait.max.ms
>> affect how often the consumer's fetch returns, so we set them very
>> high so that the broker can hardly satisfy them.
>> But the problem was not solved, so we checked Kafka's code and
>> found that DelayedFetch's tryComplete() function has this code:
>> 
>> if (endOffset.messageOffset != fetchOffset.messageOffset) {
>>   if (endOffset.onOlderSegment(fetchOffset)) {
>>     // Case C, this can happen when the new fetch operation is on a truncated leader
>>     debug("Satisfying fetch %s since it is fetching later segments of partition %s.".format(fetchMetadata, topicAndPartition))
>>     return forceComplete()
>>   } else if (fetchOffset.onOlderSegment(endOffset)) {
>>     // Case C, this can happen when the fetch operation is falling behind the current segment
>>     // or the partition has just rolled a new segment
>>     debug("Satisfying fetch %s immediately since it is fetching older segments.".format(fetchMetadata))
>>     return forceComplete()
>>   } else if (fetchOffset.messageOffset < endOffset.messageOffset) {
>>     // we need take the partition fetch size as upper bound when accumulating the bytes
>>     accumulatedSize += math.min(endOffset.positionDiff(fetchOffset), fetchStatus.fetchInfo.fetchSize)
>>   }
>> }
>> 
>> So we can be sure that our fetchOffset's segmentBaseOffset is not the same
>> as endOffset's segmentBaseOffset. We then checked our topic-partition's
>> segments and found that all the data in the segment had been cleaned up by
>> Kafka due to log.retention.
>> We guess that the fetchOffset's segmentBaseOffset being smaller than
>> endOffset's segmentBaseOffset leads to this problem.
>> 
>> But my point is: should we use code like the following to make the client
>> use less CPU,
>>   if (endOffset.messageOffset != fetchOffset.messageOffset) {
>>  if (endOffset.onOlderSegment(fetchOffset)) {
>>return false
>>  } else if (fetchOffset.onOlderSegment(endOffset)) {
>>return false
>>  }
>>}
>> 
>> so that it will respond after fetch.wait.max.ms in this scenario instead
>> of returning immediately.
>> 
>> Feedback is greatly appreciated. Thanks.
>> 
>> 
>> 
>> 




Re: client use high cpu which caused by delayedFetch operation immediately return

2016-10-18 Thread Becket Qin
Glad to know :)

On Tue, Oct 18, 2016 at 1:24 AM, Json Tu  wrote:

> Thanks. I applied the patch, and everything is OK now.
> > On Oct 9, 2016, at 12:39 PM, Becket Qin wrote:
> >
> > Can you check if you have KAFKA-3003 when you run the code?
> >
> > On Sat, Oct 8, 2016 at 12:52 AM, Kafka  wrote:
> >
> >> Hi all,
> >> we found our consumers have high CPU load in our production
> >> environment. As we know, fetch.min.bytes and fetch.wait.max.ms
> >> affect how often the consumer's fetch returns, so we set them very
> >> high so that the broker can hardly satisfy them.
> >> But the problem was not solved, so we checked Kafka's code and
> >> found that DelayedFetch's tryComplete() function has this code:
> >>
> >> if (endOffset.messageOffset != fetchOffset.messageOffset) {
> >>   if (endOffset.onOlderSegment(fetchOffset)) {
> >>     // Case C, this can happen when the new fetch operation is on a truncated leader
> >>     debug("Satisfying fetch %s since it is fetching later segments of partition %s.".format(fetchMetadata, topicAndPartition))
> >>     return forceComplete()
> >>   } else if (fetchOffset.onOlderSegment(endOffset)) {
> >>     // Case C, this can happen when the fetch operation is falling behind the current segment
> >>     // or the partition has just rolled a new segment
> >>     debug("Satisfying fetch %s immediately since it is fetching older segments.".format(fetchMetadata))
> >>     return forceComplete()
> >>   } else if (fetchOffset.messageOffset < endOffset.messageOffset) {
> >>     // we need take the partition fetch size as upper bound when accumulating the bytes
> >>     accumulatedSize += math.min(endOffset.positionDiff(fetchOffset), fetchStatus.fetchInfo.fetchSize)
> >>   }
> >> }
> >>
> >> So we can be sure that our fetchOffset's segmentBaseOffset is not the same
> >> as endOffset's segmentBaseOffset. We then checked our topic-partition's
> >> segments and found that all the data in the segment had been cleaned up by
> >> Kafka due to log.retention.
> >> We guess that the fetchOffset's segmentBaseOffset being smaller than
> >> endOffset's segmentBaseOffset leads to this problem.
> >>
> >> But my point is: should we use code like the following to make the client
> >> use less CPU,
> >>   if (endOffset.messageOffset != fetchOffset.messageOffset) {
> >>  if (endOffset.onOlderSegment(fetchOffset)) {
> >>return false
> >>  } else if (fetchOffset.onOlderSegment(endOffset)) {
> >>return false
> >>  }
> >>}
> >>
> >> so that it will respond after fetch.wait.max.ms in this scenario instead
> >> of returning immediately.
> >>
> >> Feedback is greatly appreciated. Thanks.
> >>
> >>
> >>
> >>
>
>
>


Re: Occasional NPE in NamedCache

2016-10-18 Thread Frank Lyaruu
I might have run into a related problem:

[StreamThread-1] ERROR
org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
[StreamThread-1] Failed to close state manager for StreamTask 0_0:
org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
to close state store addr-organization

at
org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
at
org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
at
org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
at
org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
at
org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
at
org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
at
org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)

at
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
entry is null

at
org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
at
org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
at
org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
at
org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
at
org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)

... 7 more

I haven't done much research and it is quite possible there is a bug on my
side, but I don't think I should be seeing this.



On Thu, Oct 13, 2016 at 10:18 PM, Guozhang Wang  wrote:

> BTW this is tracked and resolved as
> https://issues.apache.org/jira/browse/KAFKA-4300.
>
> On Thu, Oct 13, 2016 at 1:17 PM, Guozhang Wang  wrote:
>
> > Thanks Frank for reporting the bug, and many thanks to Damian for the
> > quick catch!
> >
> > On Thu, Oct 13, 2016 at 12:30 PM, Frank Lyaruu 
> wrote:
> >
> >> The issue seems to be gone. Amazing work, thanks...!
> >>
> >> On Thu, Oct 13, 2016 at 6:56 PM, Damian Guy 
> wrote:
> >>
> >> > Hi, i believe i found the problem. If possible could you please try
> with
> >> > this: https://github.com/dguy/kafka/tree/cache-bug
> >> >
> >> > Thanks,
> >> > Damian
> >> >
> >> > On Thu, 13 Oct 2016 at 17:46 Damian Guy  wrote:
> >> >
> >> > > Hi Frank,
> >> > >
> >> > > Thanks for reporting. Can you provide a sample of the join you are
> >> > > running?
> >> > >
> >> > > Thanks,
> >> > > Damian
> >> > >
> >> > > On Thu, 13 Oct 2016 at 16:10 Frank Lyaruu 
> wrote:
> >> > >
> >> > > Hi Kafka people,
> >> > >
> >> > > I'm joining a bunch of Kafka Topics using Kafka Streams, with the
> >> Kafka
> >> > > 0.10.1 release candidate.
> >> > >
> >> > > It runs ok for a few thousand messages, and then it dies with the
> >> > > following exception:
> >> > >
> >> > > Exception in thread "StreamThread-1" java.lang.NullPointerException
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.state.internals.NamedCache.
> >> > evict(NamedCache.java:194)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.state.internals.ThreadCache.
> >> > maybeEvict(ThreadCache.java:190)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.state.internals.ThreadCache.
> >> > put(ThreadCache.java:121)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> >> > CachingKeyValueStore.java:147)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> >> > CachingKeyValueStore.java:134)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.kstream.internals.KTableReduce$
> >> > KTableAggregateValueGetter.get(KTableReduce.java:121)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> >> > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:77)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> >> > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:48)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> >> > ProcessorNode.java:82)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.processor.internals.
> >> > ProcessorContextImpl.forward(ProcessorContextImpl.java:204)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.kstream.internals.KTableFilter$
> >> > KTableFilterProcessor.process(KTableFilter.java:83)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.kstream.internals.KTableFilter$
> >> > KTableFilterProcessor.process(KTableFilter.java:73)
> >> > > at
> >> > >
> >> > > org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> >> > ProcessorNode.java:82)
> >> > > at
> >> > >
>

Kafka Streams Aggregate By Date

2016-10-18 Thread Furkan KAMACI
Hi,

I could successfully run Kafka in my environment. I want to monitor Queries
per Second of my search application with Kafka. Whenever a search request
is done, I create a ProducerRecord which holds the current nano time of the
system.
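
As an illustration of that producer side, here is a minimal, untested sketch;
the topic name "queries", the String serialization, and the constant record
key are assumptions for the example, not details from this message:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
// one record per search request; a constant key keeps all counts on a single key
producer.send(new ProducerRecord<>("queries", "search", Long.toString(System.nanoTime())));
producer.close();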

I know that I have to use a streaming API for calculation i.e. Kafka
Streams or Spark Streams. My choice is to use Kafka Streams.

For the last 1 hour, or since the beginning, I have to calculate the queries
per second. How can I do such an aggregation in Kafka Streams?

Kind Regards,
Furkan KAMACI


Embedded Kafka Cluster - Maven artifact?

2016-10-18 Thread Ali Akhtar
Is there a maven artifact that can be used to create instances
of EmbeddedSingleNodeKafkaCluster for unit / integration tests?


Re: Occasional NPE in NamedCache

2016-10-18 Thread Damian Guy
Hi Frank,

Are you able to reproduce this? I'll have a look into it, but it is not
immediately clear how it could get into this state.

Thanks,
Damian


On Tue, 18 Oct 2016 at 11:08 Frank Lyaruu  wrote:

> I might have run into a related problem:
>
> [StreamThread-1] ERROR
> org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> [StreamThread-1] Failed to close state manager for StreamTask 0_0:
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
> to close state store addr-organization
>
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
> at
>
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
> Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
> entry is null
>
> at
>
> org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
> at
>
> org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
> at
>
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
> at
>
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)
>
> ... 7 more
>
> I haven't done much research and it is quite possible there is a bug on my
> side, but I don't think I should be seeing this.
>
>
>
> On Thu, Oct 13, 2016 at 10:18 PM, Guozhang Wang 
> wrote:
>
> > BTW this is tracked and resolved as
> > https://issues.apache.org/jira/browse/KAFKA-4300.
> >
> > On Thu, Oct 13, 2016 at 1:17 PM, Guozhang Wang 
> wrote:
> >
> > > Thanks Frank for reporting the bug, and many thanks to Damian for the
> > > quick catch!
> > >
> > > On Thu, Oct 13, 2016 at 12:30 PM, Frank Lyaruu 
> > wrote:
> > >
> > >> The issue seems to be gone. Amazing work, thanks...!
> > >>
> > >> On Thu, Oct 13, 2016 at 6:56 PM, Damian Guy 
> > wrote:
> > >>
> > >> > Hi, i believe i found the problem. If possible could you please try
> > with
> > >> > this: https://github.com/dguy/kafka/tree/cache-bug
> > >> >
> > >> > Thanks,
> > >> > Damian
> > >> >
> > >> > On Thu, 13 Oct 2016 at 17:46 Damian Guy 
> wrote:
> > >> >
> > >> > > Hi Frank,
> > >> > >
> > >> > > Thanks for reporting. Can you provide a sample of the join you are
> > >> > > running?
> > >> > >
> > >> > > Thanks,
> > >> > > Damian
> > >> > >
> > >> > > On Thu, 13 Oct 2016 at 16:10 Frank Lyaruu 
> > wrote:
> > >> > >
> > >> > > Hi Kafka people,
> > >> > >
> > >> > > I'm joining a bunch of Kafka Topics using Kafka Streams, with the
> > >> Kafka
> > >> > > 0.10.1 release candidate.
> > >> > >
> > >> > > It runs ok for a few thousand messages, and then it dies with
> the
> > >> > > following exception:
> > >> > >
> > >> > > Exception in thread "StreamThread-1"
> java.lang.NullPointerException
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.NamedCache.
> > >> > evict(NamedCache.java:194)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.ThreadCache.
> > >> > maybeEvict(ThreadCache.java:190)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.ThreadCache.
> > >> > put(ThreadCache.java:121)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > >> > CachingKeyValueStore.java:147)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > >> > CachingKeyValueStore.java:134)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.kstream.internals.KTableReduce$
> > >> > KTableAggregateValueGetter.get(KTableReduce.java:121)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > >> > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:77)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > >> > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:48)
> > >> > > at
> > >> > >
> > >> > >
> org.apache.kafka.streams.processor.internals.ProcessorNode.process(
> > >> > ProcessorNode.java:82)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.processor.internals.
> > >> > Pr

Re: Occasional NPE in NamedCache

2016-10-18 Thread Damian Guy
Also, it'd be great if you could share your streams topology.

Thanks,
Damian

On Tue, 18 Oct 2016 at 15:48 Damian Guy  wrote:

> Hi Frank,
>
> Are you able to reproduce this? I'll have a look into it, but it is not
> immediately clear how it could get into this state.
>
> Thanks,
> Damian
>
>
> On Tue, 18 Oct 2016 at 11:08 Frank Lyaruu  wrote:
>
> I might have run into a related problem:
>
> [StreamThread-1] ERROR
> org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> [StreamThread-1] Failed to close state manager for StreamTask 0_0:
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
> to close state store addr-organization
>
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
> at
>
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
> Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
> entry is null
>
> at
>
> org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
> at
>
> org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
> at
>
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
> at
>
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)
>
> ... 7 more
>
> I haven't done much research and it is quite possible there is a bug on my
> side, but I don't think I should be seeing this.
>
>
>
> On Thu, Oct 13, 2016 at 10:18 PM, Guozhang Wang 
> wrote:
>
> > BTW this is tracked and resolved as
> > https://issues.apache.org/jira/browse/KAFKA-4300.
> >
> > On Thu, Oct 13, 2016 at 1:17 PM, Guozhang Wang 
> wrote:
> >
> > > Thanks Frank for reporting the bug, and many thanks to Damian for the
> > > quick catch!
> > >
> > > On Thu, Oct 13, 2016 at 12:30 PM, Frank Lyaruu 
> > wrote:
> > >
> > >> The issue seems to be gone. Amazing work, thanks...!
> > >>
> > >> On Thu, Oct 13, 2016 at 6:56 PM, Damian Guy 
> > wrote:
> > >>
> > >> > Hi, i believe i found the problem. If possible could you please try
> > with
> > >> > this: https://github.com/dguy/kafka/tree/cache-bug
> > >> >
> > >> > Thanks,
> > >> > Damian
> > >> >
> > >> > On Thu, 13 Oct 2016 at 17:46 Damian Guy 
> wrote:
> > >> >
> > >> > > Hi Frank,
> > >> > >
> > >> > > Thanks for reporting. Can you provide a sample of the join you are
> > >> > > running?
> > >> > >
> > >> > > Thanks,
> > >> > > Damian
> > >> > >
> > >> > > On Thu, 13 Oct 2016 at 16:10 Frank Lyaruu 
> > wrote:
> > >> > >
> > >> > > Hi Kafka people,
> > >> > >
> > >> > > I'm joining a bunch of Kafka Topics using Kafka Streams, with the
> > >> Kafka
> > >> > > 0.10.1 release candidate.
> > >> > >
> > >> > > It runs ok for a few thousand messages, and then it dies with
> the
> > >> > > following exception:
> > >> > >
> > >> > > Exception in thread "StreamThread-1"
> java.lang.NullPointerException
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.NamedCache.
> > >> > evict(NamedCache.java:194)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.ThreadCache.
> > >> > maybeEvict(ThreadCache.java:190)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.ThreadCache.
> > >> > put(ThreadCache.java:121)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > >> > CachingKeyValueStore.java:147)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > >> > CachingKeyValueStore.java:134)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.kstream.internals.KTableReduce$
> > >> > KTableAggregateValueGetter.get(KTableReduce.java:121)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > >> > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:77)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > >> > KTableKTableLeftJoinProcessor.process(KTableKTableLeftJoin.java:48)
> > >> > > at
> > >> > >
> > >> > >
> org.apache.kafka.streams.processor.internals.Pro

How to use a DNS alias name in bootstrap.servers property

2016-10-18 Thread Ojha, Ashish
Hi Team,

We are using Kafka 0.10 with Kerberos security. We have a use case where we 
want to use a DNS alias name instead of the physical hostnames in the 
"bootstrap.servers" property. Using a DNS alias name is helpful from an 
operational perspective (e.g., it's easy to add/remove brokers in the cluster 
without any code change on the app side).
When we use the DNS alias name, the client is unable to authenticate to the 
Kafka broker.


props.put("bootstrap.servers", 
"kafka.vipTesting.test.kafka.nimbus.abc.com:");
props.put("security.protocol", "SASL_PLAINTEXT");
props.put("sasl.kerberos.service.name", "kafka");
props.put("group.id", "ashish-group");
props.put("key.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", 
"org.apache.kafka.common.serialization.StringDeserializer");


We get the error below:

16:02:03.924 [main] DEBUG o.a.k.c.c.i.AbstractCoordinator - Sending coordinator 
request for group ashish-group to broker 
kafka.vipTesting.test.kafka.nimbus.abc.com: (id: -1 rack: null)
16:02:04.011 [main] DEBUG o.apache.kafka.clients.NetworkClient - Initiating 
connection to node -1 at kafka.vipTesting.test.kafka.nimbus.abc.com:.
16:02:04.038 [main] DEBUG o.a.k.c.s.a.SaslClientAuthenticator - Set SASL client 
state to SEND_HANDSHAKE_REQUEST
16:02:04.045 [main] DEBUG o.a.k.c.s.a.SaslClientAuthenticator - Creating 
SaslClient: 
client=kafka_ba...@xx.com;service=kafka;serviceHostname=kafka.vipTesting.test.kafka.nimbus.abc.com;mechs=[GSSAPI]
16:02:04.117 [main] DEBUG o.a.kafka.common.metrics.Metrics - Added sensor with 
name node--1.bytes-sent
16:02:04.118 [main] DEBUG o.a.kafka.common.metrics.Metrics - Added sensor with 
name node--1.bytes-received
16:02:04.121 [main] DEBUG o.a.kafka.common.metrics.Metrics - Added sensor with 
name node--1.latency
16:02:04.180 [main] DEBUG o.a.k.c.s.a.SaslClientAuthenticator - Set SASL client 
state to RECEIVE_HANDSHAKE_RESPONSE
16:02:04.180 [main] DEBUG o.apache.kafka.clients.NetworkClient - Completed 
connection to node -1
16:02:04.311 [main] DEBUG o.a.k.c.s.a.SaslClientAuthenticator - Set SASL client 
state to INITIAL
16:02:04.352 [main] DEBUG o.a.kafka.common.network.Selector - Connection with 
kafka.vipTesting.test.kafka.nimbus.abc.com/XX.YY.BB. disconnected
javax.security.sasl.SaslException: An error: 
(java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
GSS initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Server not found in Kerberos database (7) - 
LOOKING_UP_SERVER)]) occurred when evaluating SASL token received from the 
Kafka Broker. Kafka Client will go to AUTH_FAILED state.
at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslToken(SaslClientAuthenticator.java:293)
 ~[kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.sendSaslToken(SaslClientAuthenticator.java:210)
 ~[kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:178)
 ~[kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:64) 
~[kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:318) 
[kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.common.network.Selector.poll(Selector.java:283) 
[kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) 
[kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
 [kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
 [kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
 [kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
 [kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:179)
 [kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
 [kafka-clients-0.10.0.0_2.jar:na]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) 
[kafka-clients-0.10.0.0_2.jar:na]
at 
main.java.Kafka.sasl.kerberos.KafkaConsumer_Kerberos.main(KafkaConsumer_Kerberos.java:42)
 [classes/:na]
at sun.reflect.NativeMethodAccessorImpl

Re: Occasional NPE in NamedCache

2016-10-18 Thread Damian Guy
Hi Frank,

Which version of kafka are you running? The line numbers in the stack trace
don't match up with what I am seeing on 0.10.1 or on trunk.

FYI - I created a JIRA for this here:
https://issues.apache.org/jira/browse/KAFKA-4311

Thanks,
Damian

On Tue, 18 Oct 2016 at 15:52 Damian Guy  wrote:

> Also, it'd be great if you could share your streams topology.
>
> Thanks,
> Damian
>
> On Tue, 18 Oct 2016 at 15:48 Damian Guy  wrote:
>
> Hi Frank,
>
> Are you able to reproduce this? I'll have a look into it, but it is not
> immediately clear how it could get into this state.
>
> Thanks,
> Damian
>
>
> On Tue, 18 Oct 2016 at 11:08 Frank Lyaruu  wrote:
>
> I might have run into a related problem:
>
> [StreamThread-1] ERROR
> org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> [StreamThread-1] Failed to close state manager for StreamTask 0_0:
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
> to close state store addr-organization
>
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
> at
>
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
>
> at
>
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
> Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
> entry is null
>
> at
>
> org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
> at
>
> org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
> at
>
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
> at
>
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
> at
>
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)
>
> ... 7 more
>
> I haven't done much research and it is quite possible there is a bug on my
> side, but I don't think I should be seeing this.
>
>
>
> On Thu, Oct 13, 2016 at 10:18 PM, Guozhang Wang 
> wrote:
>
> > BTW this is tracked and resolved as
> > https://issues.apache.org/jira/browse/KAFKA-4300.
> >
> > On Thu, Oct 13, 2016 at 1:17 PM, Guozhang Wang 
> wrote:
> >
> > > Thanks Frank for reporting the bug, and many thanks to Damian for the
> > > quick catch!
> > >
> > > On Thu, Oct 13, 2016 at 12:30 PM, Frank Lyaruu 
> > wrote:
> > >
> > >> The issue seems to be gone. Amazing work, thanks...!
> > >>
> > >> On Thu, Oct 13, 2016 at 6:56 PM, Damian Guy 
> > wrote:
> > >>
> > >> > Hi, i believe i found the problem. If possible could you please try
> > with
> > >> > this: https://github.com/dguy/kafka/tree/cache-bug
> > >> >
> > >> > Thanks,
> > >> > Damian
> > >> >
> > >> > On Thu, 13 Oct 2016 at 17:46 Damian Guy 
> wrote:
> > >> >
> > >> > > Hi Frank,
> > >> > >
> > >> > > Thanks for reporting. Can you provide a sample of the join you are
> > >> > > running?
> > >> > >
> > >> > > Thanks,
> > >> > > Damian
> > >> > >
> > >> > > On Thu, 13 Oct 2016 at 16:10 Frank Lyaruu 
> > wrote:
> > >> > >
> > >> > > Hi Kafka people,
> > >> > >
> > >> > > I'm joining a bunch of Kafka Topics using Kafka Streams, with the
> > >> Kafka
> > >> > > 0.10.1 release candidate.
> > >> > >
> > >> > > It runs ok for a few thousand messages, and then it dies with
> the
> > >> > > following exception:
> > >> > >
> > >> > > Exception in thread "StreamThread-1"
> java.lang.NullPointerException
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.NamedCache.
> > >> > evict(NamedCache.java:194)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.ThreadCache.
> > >> > maybeEvict(ThreadCache.java:190)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.ThreadCache.
> > >> > put(ThreadCache.java:121)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > >> > CachingKeyValueStore.java:147)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(
> > >> > CachingKeyValueStore.java:134)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.kstream.internals.KTableReduce$
> > >> > KTableAggregateValueGetter.get(KTableReduce.java:121)
> > >> > > at
> > >> > >
> > >> > > org.apache.kafka.streams.kstream.internals.KTableKTableLeftJoin$
> > >> > KTableKTableLeftJoin

Re: Kafka Streams Aggregate By Date

2016-10-18 Thread Matthias J. Sax

Hi,

You just need to read your stream and apply a (windowed) aggregation
on it.

If you use non-windowed aggregation you will get "since the
beginning". If you use windowed aggregation you can specify the window
size as 1 hour and get those results.

One comment: it seems that you want to count *all* queries. To make
this work, you need to make sure all records are using the same key
(because Kafka Streams only supports aggregation over keyed streams).
Keep in mind that this prohibits parallelization of your aggregation!

As a workaround, you could also do two consecutive aggregations, and
parallelize the first one but not the second one (i.e., use the first
one as a pre-aggregation, similar to a combine step).

Without pre-aggregation, and assuming all records use the same key,
something like this (for current trunk):


> KStreamBuilder builder = new KStreamBuilder();
> KStream input = builder.stream("yourTopic");
> 
> KGroupedStream groupedInput = input.groupByKey();
> 
> groupedInput.count("countStore").to("outputTopicCountFromBeginning");
> 
> groupedInput.count(TimeWindows.of(3600 * 1000),
>     "windowedCountStore").to("outputTopicHourlyCounts");


For more details, please see the docs and examples:

 - http://docs.confluent.io/current/streams/index.html
 - https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams


- -Matthias
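
For readers following along, here is a more complete, self-contained version of
the sketch above in Java. It is untested; the topic names ("queries",
"query-counts-total", "query-counts-hourly") are assumptions, and it assumes
every record carries the same String key as discussed above. The windowed
count's Windowed key is flattened into a plain String before writing so the
default String serdes can be used.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.TimeWindows;

public class QueryCountSketch {

    public static void main(final String[] args) {
        final Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "query-counts");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
        config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        final KStreamBuilder builder = new KStreamBuilder();

        // All records are assumed to use the same String key so a single count
        // covers every query (see the note on parallelization above).
        final KStream<String, String> queries = builder.stream("queries");
        final KGroupedStream<String, String> grouped = queries.groupByKey();

        // Count since the beginning; the Long count is mapped to a String so
        // the default String serdes can write it to the output topic.
        grouped.count("countStore")
               .toStream()
               .mapValues(Object::toString)
               .to("query-counts-total");

        // Hourly counts; the Windowed key is flattened to "key@windowStartMs"
        // before writing, again so the default String serdes apply.
        grouped.count(TimeWindows.of(3600 * 1000L), "windowedCountStore")
               .toStream()
               .map((windowedKey, count) -> KeyValue.pair(
                       windowedKey.key() + "@" + windowedKey.window().start(),
                       count.toString()))
               .to("query-counts-hourly");

        final KafkaStreams streams = new KafkaStreams(builder, config);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}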

On 10/18/16 5:00 AM, Furkan KAMACI wrote:
> Hi,
> 
> I could successfully run Kafka at my environment. I want to monitor
> Queries per Second at my search application with Kafka. Whenever a
> search request is done I create a ProducerRecord which holds
> current nano time of the system.
> 
> I know that I have to use a streaming API for calculation i.e.
> Kafka Streams or Spark Streams. My choice is to use Kafka Streams.
> 
> For last 1 hours, or since the beginning, I have to calculate the
> queries per second. How can I make such an aggregation at Kafka
> Streams?
> 
> Kind Regards, Furkan KAMACI
> 


Re: Stream processing meetup at LinkedIn (Sunnyvale) on Wednesday, November 2 at 6pm

2016-10-18 Thread João Reis
Hi Joel,

Would it be possible to stream the presentations?

Cheers,
João Reis


From: Joel Koshy 
Sent: Monday, October 17, 2016 10:25:10 PM
Cc: eyakabo...@linkedin.com
Subject: Stream processing meetup at LinkedIn (Sunnyvale) on Wednesday, 
November 2 at 6pm

Hi everyone,

We would like to invite you to a Stream Processing Meetup at LinkedIn’s
Sunnyvale campus on Wednesday, November 2 at 6pm.

Please RSVP here (if you intend to attend in person):
http://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/234454163

We have the following three talks scheduled:

   - Stream Processing using Apache Samza at LinkedIn: Past, Present, and Future
     by Kartik Paramasivam (LinkedIn)
   - Kafka Cruise Control: Auto Management of the Kafka Clusters
     by Becket (Jiangjie) Qin (LinkedIn)
   - Plumber - Intuit's Samza Ecosystem
     by Shekar Tippur (Intuit)

Hope to see you there!

Joel


Re: Kafka Streams Aggregate By Date

2016-10-18 Thread Furkan KAMACI
Hi Matthias,

Thanks for your detailed answer. By the way, I couldn't find "KGroupedStream"
in version 0.10.0.1?

Kind Regards,
Furkan KAMACI

On Tue, Oct 18, 2016 at 8:41 PM, Matthias J. Sax 
wrote:

>
> Hi,
>
> You just need to read you stream and apply an (windowed) aggregation
> on it.
>
> If you use non-windowed aggregation you will get "since the
> beginning". If you use windowed aggregation you can specify the window
> size as 1 hour and get those results.
>
> One comment: it seems that you want to count *all* queries. To make
> this work, you need to make sure all records are using the same key
> (because Kafka Streams only supports aggregation over keyed streams).
> Keep in mind, that this prohibits parallelization of you aggregation!
>
> As a workaround, you could also do two consecutive aggregation, and do
> parallelize the first one, and do not parallelize the second one (ie,
> using the first one as a pre aggregation similar to a combine step)
>
> Without pre aggregation and assuming all records use the same key
> something like this (for current trunk):
>
>
> > KStreamBuilder builder = new KStreamBuilder(): KStream input =
> > builder.stream("yourTopic");
> >
> > KGroupedStream groupedInput = input.groupByKey();
> >
> > groupedInput.count("countStore").to("outputTopicCountFromBeginning");
> >
> >
> groupedInput.count(TimeWindows.of(3600 * 1000),
> "windowedCountStore").to("outputTopicHourlyCounts"):
>
>
> For more details, please see the docs and examples:
>
>  - http://docs.confluent.io/current/streams/index.html
>  -
> https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/ka
> fka-streams
>
>
> - -Matthias
>
> On 10/18/16 5:00 AM, Furkan KAMACI wrote:
> > Hi,
> >
> > I could successfully run Kafka at my environment. I want to monitor
> > Queries per Second at my search application with Kafka. Whenever a
> > search request is done I create a ProducerRecord which holds
> > current nano time of the system.
> >
> > I know that I have to use a streaming API for calculation i.e.
> > Kafka Streams or Spark Streams. My choice is to use Kafka Streams.
> >
> > For last 1 hours, or since the beginning, I have to calculate the
> > queries per second. How can I make such an aggregation at Kafka
> > Streams?
> >
> > Kind Regards, Furkan KAMACI
> >
>


Re: Kafka Streams Aggregate By Date

2016-10-18 Thread Matthias J. Sax

I see. KGroupedStream will be part of 0.10.1.0 (it should be released in the
next few weeks).

So, instead of

> .groupByKey().count()

you need to do

> .countByKey()



- -Matthias
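
For reference, against the 0.10.0.x API the two counts from the earlier example
would look roughly like this (an untested sketch; names are reused from that
example, and note that in 0.10.0.x TimeWindows.of() takes the window name as
its first argument):

KStream<String, String> input = builder.stream("yourTopic");
// count since the beginning of the stream
input.countByKey("countStore");
// hourly counts
input.countByKey(TimeWindows.of("windowedCountStore", 3600 * 1000));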

On 10/18/16 12:05 PM, Furkan KAMACI wrote:
> Hi Matthias,
> 
> Thanks for your detailed answer. By the way I couldn't find
> "KGroupedStream" at version of 0.10.0.1?
> 
> Kind Regards, Furkan KAMACI
> 
> On Tue, Oct 18, 2016 at 8:41 PM, Matthias J. Sax
>  wrote:
> 
> Hi,
> 
> You just need to read you stream and apply an (windowed)
> aggregation on it.
> 
> If you use non-windowed aggregation you will get "since the 
> beginning". If you use windowed aggregation you can specify the
> window size as 1 hour and get those results.
> 
> One comment: it seems that you want to count *all* queries. To
> make this work, you need to make sure all records are using the
> same key (because Kafka Streams only supports aggregation over
> keyed streams). Keep in mind, that this prohibits parallelization
> of you aggregation!
> 
> As a workaround, you could also do two consecutive aggregation, and
> do parallelize the first one, and do not parallelize the second one
> (ie, using the first one as a pre aggregation similar to a combine
> step)
> 
> Without pre aggregation and assuming all records use the same key 
> something like this (for current trunk):
> 
> 
>> KStreamBuilder builder = new KStreamBuilder();
>> KStream input = builder.stream("yourTopic");
>>
>> KGroupedStream groupedInput = input.groupByKey();
>>
>> groupedInput.count("countStore").to("outputTopicCountFromBeginning");
>>
>> groupedInput.count(TimeWindows.of(3600 * 1000),
>>     "windowedCountStore").to("outputTopicHourlyCounts");
> 
> For more details, please see the docs and examples:
> 
> - http://docs.confluent.io/current/streams/index.html
> - https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
> 
> 
> -Matthias
> 
> On 10/18/16 5:00 AM, Furkan KAMACI wrote:
 Hi,
 
 I could successfully run Kafka at my environment. I want to
 monitor Queries per Second at my search application with
 Kafka. Whenever a search request is done I create a
 ProducerRecord which holds current nano time of the system.
 
 I know that I have to use a streaming API for calculation
 i.e. Kafka Streams or Spark Streams. My choice is to use
 Kafka Streams.
 
 For last 1 hours, or since the beginning, I have to calculate
 the queries per second. How can I make such an aggregation at
 Kafka Streams?
 
 Kind Regards, Furkan KAMACI
 
>> 
> 


Re: Stream processing meetup at LinkedIn (Sunnyvale) on Wednesday, November 2 at 6pm

2016-10-18 Thread Joel Koshy
Yes it will be streamed and archived. The streaming link and subsequent
recording will be posted in the comments on the meetup page.

Thanks,

Joel

On Tue, Oct 18, 2016 at 11:25 AM, João Reis  wrote:

> Hi Joel,
>
> Would it be possible to stream the presentations ?
>
> Cheers,
> João Reis
>
> 
> From: Joel Koshy 
> Sent: Monday, October 17, 2016 10:25:10 PM
> Cc: eyakabo...@linkedin.com
> Subject: Stream processing meetup at LinkedIn (Sunnyvale) on Wednesday,
> November 2 at 6pm
>
> Hi everyone,
>
> We would like to invite you to a Stream Processing Meetup at LinkedIn’s
> Sunnyvale campus on Wednesday, November 2 at 6pm.
>
> Please RSVP here (if you intend to attend in person):
> http://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/234454163
>
> We have the following three talks scheduled:
>
> - Stream Processing using Apache Samza at LinkedIn: Past, Present, and Future
>   by Kartik Paramasivam (LinkedIn)
> - Kafka Cruise Control: Auto Management of the Kafka Clusters
>   by Becket (Jiangjie) Qin (LinkedIn)
> - Plumber - Intuit's Samza Ecosystem
>   by Shekar Tippur (Intuit)
>
> Hope to see you there!
>
> Joel
>


kafka streams metadata request fails on 0.10.0.1 broker/topic from 0.10.1.0 client

2016-10-18 Thread saiprasad mishra
Hi All

Was testing with 0.10.1.0 rc3 build for my new streams app

Seeing issues starting my kafka streams app (0.10.1.0) against the old version
broker 0.10.0.1. I don't know if it is supposed to work as is. Will upgrade
the broker to the same version and see whether the problem goes away.

client side issues

==

java.io.EOFException

at
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
~[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
~[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
~[kafka-clients-0.10.1.0.jar!/:?]

at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
~[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343)
[kafka-clients-0.10.1.0.jar!/:?]

at org.apache.kafka.common.network.Selector.poll(Selector.java:291)
[kafka-clients-0.10.1.0.jar!/:?]

at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:232)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:209)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:148)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:136)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:197)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:248)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
[kafka-clients-0.10.1.0.jar!/:?]

at
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407)
[kafka-streams-0.10.1.0.jar!/:?]

at
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
[kafka-streams-0.10.1.0.jar!/:?]



On the broker side the following message appears

=

kafka.network.InvalidRequestException: Error getting request for apiKey: 3
and apiVersion: 2

at
kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:95)

at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:87)

at
kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:488)

at
kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:483)

at scala.collection.Iterator$class.foreach(Iterator.scala:893)

at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)

at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)

at scala.collection.AbstractIterable.foreach(Iterable.scala:54)

at kafka.network.Processor.processCompletedReceives(SocketServer.scala:483)

at kafka.network.Processor.run(SocketServer.scala:413)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.IllegalArgumentException: Invalid version for API key
3: 2

at org.apache.kafka.common.protocol.ProtoUtils.schemaFor(ProtoUtils.java:31)

at
org.apache.kafka.common.protocol.ProtoUtils.requestSchema(ProtoUtils.java:44)

at
org.apache.kafka.common.protocol.ProtoUtils.parseRequest(ProtoUtils.java:60)

at
org.apache.kafka.common.requests.MetadataRequest.parse(MetadataRequest.java:96)

at
org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:48)

at
kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:92)

Regards

Sai


Re: Kafka Streams Aggregate By Date

2016-10-18 Thread Furkan KAMACI
Hi Matthias,

I've tried this code:

final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "myapp");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
streamsConfiguration.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

final KStreamBuilder builder = new KStreamBuilder();
final KStream input = builder.stream("myapp-test");

final KStream searchCounts = input.countByKey("SearchRequests").toStream();
searchCounts.countByKey(TimeWindows.of("Hourly", 3600 * 1000)).to("outputTopicHourlyCounts");

final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
streams.start();

Runtime.getRuntime().addShutdownHook(new Thread(streams::close));

However I get an error:


Exception in thread "StreamThread-1" java.lang.ClassCastException:
org.apache.kafka.streams.kstream.Windowed cannot be cast to java.lang.String

On the other hand when I try this code:

https://gist.github.com/timothyrenner/a99c86b2d6ed2c22c8703e8c7760af3a

I get an error too which indicates that:

Exception in thread "StreamThread-1"
org.apache.kafka.common.errors.SerializationException: Size of data
received by LongDeserializer is not 8

Here is the generated topic:

kafka-console-consumer --zookeeper localhost:2181 --topic myapp-test --from-beginning
28952314828122
28988681653726
29080089383233

I know that I'm missing something but couldn't find it.

Kind Regards,
Furkan KAMACI

On Tue, Oct 18, 2016 at 10:34 PM, Matthias J. Sax 
wrote:

>
> I see. KGroupedStream will be part of 0.10.1.0 (should be release the
> next weeks).
>
> So, instead of
>
> > .groupByKey().count()
>
> you need to do
>
> > .countByKey()
>
>
>
> - -Matthias
>
> On 10/18/16 12:05 PM, Furkan KAMACI wrote:
> > Hi Matthias,
> >
> > Thanks for your detailed answer. By the way I couldn't find
> > "KGroupedStream" at version of 0.10.0.1?
> >
> > Kind Regards, Furkan KAMACI
> >
> > On Tue, Oct 18, 2016 at 8:41 PM, Matthias J. Sax
> >  wrote:
> >
> > Hi,
> >
> > You just need to read you stream and apply an (windowed)
> > aggregation on it.
> >
> > If you use non-windowed aggregation you will get "since the
> > beginning". If you use windowed aggregation you can specify the
> > window size as 1 hour and get those results.
> >
> > One comment: it seems that you want to count *all* queries. To
> > make this work, you need to make sure all records are using the
> > same key (because Kafka Streams only supports aggregation over
> > keyed streams). Keep in mind, that this prohibits parallelization
> > of you aggregation!
> >
> > As a workaround, you could also do two consecutive aggregation, and
> > do parallelize the first one, and do not parallelize the second one
> > (ie, using the first one as a pre aggregation similar to a combine
> > step)
> >
> > Without pre aggregation and assuming all records use the same key
> > something like this (for current trunk):
> >
> >
>  KStreamBuilder builder = new KStreamBuilder(): KStream input
>  = builder.stream("yourTopic");
> 
>  KGroupedStream groupedInput = input.groupByKey();
> 
>  groupedInput.count("countStore").to("outputTopicCountFromBeginning"
> );
> 
> 
> >
> 
> groupedInput.count(TimeWindows.of(3600 * 1000),
> > "windowedCountStore").to("outputTopicHourlyCounts"):
> >
> >
> > For more details, please see the docs and examples:
> >
> > - http://docs.confluent.io/current/streams/index.html -
> > https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/
> ka
> >
> >
> fka-streams
> >
> >
> > -Matthias
> >
> > On 10/18/16 5:00 AM, Furkan KAMACI wrote:
>  Hi,
> 
>  I could successfully run Kafka at my environment. I want to
>  monitor Queries per Second at my search application with
>  Kafka. Whenever a search request is done I create a
>  ProducerRecord which holds current nano time of the system.
> 
>  I know that I have to use a streaming API for calculation
>  i.e. Kafka Streams or Spark Streams. My choice is to use
>  Kafka Streams.
> 
>  For last 1 hours, or since the beginning, I have to calculate
>  the queries per second. How can I make such an aggregation at
>  Kafka Streams?
> 
>  Kind Regards, Furkan KAMACI
> 
> >>
> >

Re: Kafka Streams Aggregate By Date

2016-10-18 Thread Matthias J. Sax

Two things:

1) you should not apply the window to the first count, but to the base
stream to get correct results.

2) your windowed aggregation does not return String-typed keys, but
Windowed-typed keys. Thus, you need to either insert a .map() to transform
your data into String type, or provide a custom serializer when
writing data to the output topic (the method .to(...) has multiple overloads).

By default, each topic read/write operation uses the Serdes from the
streams config. If your data has a different type, you need to provide
appropriate Serdes for those operators.


- -Matthias
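
Putting both points together, a rough, untested sketch against the 0.10.0.x API
(windowing the base stream directly, then mapping the Windowed key and the Long
count to Strings so the default String serdes can write the result; the topic
and window names and the String type parameters are taken from or assumed for
the snippet quoted below, and it additionally needs
org.apache.kafka.streams.KeyValue):

final KStream<String, String> input = builder.stream("myapp-test");
input.countByKey(TimeWindows.of("Hourly", 3600 * 1000))
     .toStream()
     // Windowed<String> key -> plain String key, Long count -> String value
     .map((windowedKey, count) -> KeyValue.pair(
             windowedKey.key() + "@" + windowedKey.window().start(),
             count.toString()))
     .to("outputTopicHourlyCounts");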

On 10/18/16 2:01 PM, Furkan KAMACI wrote:
> Hi Matthias,
> 
> I've tried this code:
> 
> final Properties streamsConfiguration = new Properties();
> streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "myapp");
> streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
> streamsConfiguration.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
> streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
> streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
> 
> final KStreamBuilder builder = new KStreamBuilder();
> final KStream input = builder.stream("myapp-test");
> 
> final KStream searchCounts = input.countByKey("SearchRequests").toStream();
> searchCounts.countByKey(TimeWindows.of("Hourly", 3600 * 1000)).to("outputTopicHourlyCounts");
> 
> final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
> streams.start();
> 
> Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
> 
> However I get an error:
> 
> 
> Exception in thread "StreamThread-1" java.lang.ClassCastException:
> org.apache.kafka.streams.kstream.Windowed cannot be cast to java.lang.String
> 
> On the other hand, when I try this code:
> 
> https://gist.github.com/timothyrenner/a99c86b2d6ed2c22c8703e8c7760af3a
> 
> I get an error too, which indicates that:
> 
> Exception in thread "StreamThread-1" org.apache.kafka.common.errors.SerializationException:
> Size of data received by LongDeserializer is not 8
> 
> Here is the generated topic:
> 
> kafka-console-consumer --zookeeper localhost:2181 --topic myapp-test --from-beginning
> 28952314828122
> 28988681653726
> 29080089383233
> 
> I know that I am missing something but couldn't find it.
> 
> Kind Regards, Furkan KAMACI
> 
> On Tue, Oct 18, 2016 at 10:34 PM, Matthias J. Sax
>  wrote:
> 
> I see. KGroupedStream will be part of 0.10.1.0 (should be released
> in the next weeks).
> 
> So, instead of
> 
 .groupByKey().count()
> 
> you need to do
> 
 .countByKey()
> 
> 
> 
> -Matthias
> 
> On 10/18/16 12:05 PM, Furkan KAMACI wrote:
 Hi Matthias,
 
 Thanks for your detailed answer. By the way, I couldn't find
 "KGroupedStream" in version 0.10.0.1?
 
 Kind Regards, Furkan KAMACI
 
 On Tue, Oct 18, 2016 at 8:41 PM, Matthias J. Sax 
  wrote:
 
 Hi,
 
 You just need to read your stream and apply a (windowed)
 aggregation on it.
 
 If you use non-windowed aggregation you will get "since the
 beginning". If you use windowed aggregation you can specify
 the window size as 1 hour and get those results.
 
 One comment: it seems that you want to count *all* queries.
 To make this work, you need to make sure all records are
 using the same key (because Kafka Streams only supports
 aggregation over keyed streams). Keep in mind that this
 prohibits parallelization of your aggregation!
 
 As a workaround, you could also do two consecutive
 aggregations: parallelize the first one, but not the second
 one (i.e., use the first one as a pre-aggregation, similar
 to a combine step).
 
 Without pre-aggregation, and assuming all records use the
 same key, it looks something like this (for current trunk):
 
 
>>> KStreamBuilder builder = new KStreamBuilder();
>>> KStream input = builder.stream("yourTopic");
>>> 
>>> KGroupedStream groupedInput = input.groupByKey();
>>> 
>>> groupedInput.count("countStore").to("outputTopicCountFromBeginning");
>>> 
>>> groupedInput.count(TimeWindows.of(3600 * 1000), "windowedCountStore").to("outputTopicHourlyCounts");
 
 
 For more details, please see the docs and examples:
 
 - http://docs.confluent.io/current/streams/index.html
 - https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
 
 
 -Matthias
 
 On 10/18/16 5:00 AM, Furkan KAMACI wrote:
>>> Hi,
>>> 
>>> I could successfully run Kafka at my environment. I
>>> want to monitor Q

Re: Kafka Streams Aggregate By Date

2016-10-18 Thread Furkan KAMACI
Sorry about the concurrent questions. I tried the code below and didn't get any error,
but the output topic was not created:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer",
"org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer",
"org.apache.kafka.common.serialization.StringSerializer");

Producer producer = new KafkaProducer<>(props);

for (int i = 0; i < 1000; i++) {
producer.send(new ProducerRecord<>(
"input-topic",
String.format("{\"type\":\"test\", \"t\":%.3f,
\"k\":%d}", System.nanoTime() * 1e-9, i)));


final KStreamBuilder builder = new KStreamBuilder();
final KStream qps = builder.stream(Serdes.String(),
Serdes.Long(), "input-topic");
qps.countByKey(TimeWindows.of("Hourly", 3600 *
1000)).mapValues(Object::toString).to("output-topic");

final KafkaStreams streams = new KafkaStreams(builder,
streamsConfiguration);
streams.start();

Runtime.getRuntime().addShutdownHook(new Thread(streams::close));

On Wed, Oct 19, 2016 at 12:14 AM, Matthias J. Sax 
wrote:

>
> Two things:
>
> 1) you should not apply the window to the first count, but to the base
> stream to get correct results.
>
> 2) Your windowed aggregation does not return a plain String key type, but a
> Windowed key type. Thus, you need to either insert a .map() to transform
> your data into String type, or provide a custom serializer when
> writing data to the output topic (the .to(...) method has multiple overloads).
>
> By default, each topic read/write operation uses the Serdes from the
> streams config. If your data has a different type, you need to provide
> appropriate Serdes for those operators.
>
>
> - -Matthias
>
> On 10/18/16 2:01 PM, Furkan KAMACI wrote:
> > Hi Matthias,
> >
> > I've tried this code:
> >
> > final Properties streamsConfiguration = new Properties();
> > streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "myapp");
> > streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
> > streamsConfiguration.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
> > streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
> > streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
> >
> > final KStreamBuilder builder = new KStreamBuilder();
> > final KStream input = builder.stream("myapp-test");
> >
> > final KStream searchCounts = input.countByKey("SearchRequests").toStream();
> > searchCounts.countByKey(TimeWindows.of("Hourly", 3600 * 1000)).to("outputTopicHourlyCounts");
> >
> > final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
> > streams.start();
> >
> > Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
> >
> > However I get an error:
> >
> >
> > Exception in thread "StreamThread-1" java.lang.ClassCastException:
> > org.apache.kafka.streams.kstream.Windowed cannot be cast to java.lang.String
> >
> > On the other hand, when I try this code:
> >
> > https://gist.github.com/timothyrenner/a99c86b2d6ed2c22c8703e8c7760af3a
> >
> > I get an error too, which indicates that:
> >
> > Exception in thread "StreamThread-1" org.apache.kafka.common.errors.SerializationException:
> > Size of data received by LongDeserializer is not 8
> >
> > Here is the generated topic:
> >
> > kafka-console-consumer --zookeeper localhost:2181 --topic myapp-test --from-beginning
> > 28952314828122
> > 28988681653726
> > 29080089383233
> >
> > I know that I am missing something but couldn't find it.
> >
> > Kind Regards, Furkan KAMACI
> >
> > On Tue, Oct 18, 2016 at 10:34 PM, Matthias J. Sax
> >  wrote:
> >
> > I see. KGroupedStream will be part of 0.10.1.0 (should be released
> > in the next weeks).
> >
> > So, instead of
> >
>  .groupByKey().count()
> >
> > you need to do
> >
>  .countByKey()
> >
> >
> >
> > -Matthias
> >
> > On 10/18/16 12:05 PM, Furkan KAMACI wrote:
>  Hi Matthias,
> 
>  Thanks for your detailed answer. By the way I couldn't find
>  "KGroupedStream" at version of 0.10.0.1?
> 
>  Kind Regards, Furkan KAMACI
> 
>  On Tue, Oct 18, 2016 at 8:41 PM, Matthias J. Sax
>   wrote:
> 
>  Hi,
> 
 You just need to read your stream and apply a (windowed)
 aggregation on it.
> 
>  If you use non-windowed aggregation you will get "since the
>  beginning". If you use windowed aggregation you can spec

Re: Kafka Streams Aggregate By Date

2016-10-18 Thread Matthias J. Sax

You should create input/intermediate and output topics manually before
you start your Kafka Streams application.
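
As a quick sanity check, the topics' existence can also be verified from Java
before the Streams application is started; a minimal sketch, assuming a local
broker and the topic names from the snippet quoted below:

import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;

public class TopicCheck {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // listTopics() returns all topics (and their partitions) known to the cluster.
            final Map<String, List<PartitionInfo>> topics = consumer.listTopics();
            System.out.println("input-topic exists:  " + topics.containsKey("input-topic"));
            System.out.println("output-topic exists: " + topics.containsKey("output-topic"));
        }
    }
}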


- -Matthias

On 10/18/16 3:34 PM, Furkan KAMACI wrote:
> Sorry about the concurrent questions. I tried the code below and didn't get
> any error, but the output topic was not created:
> 
> Properties props = new Properties();
> props.put("bootstrap.servers", "localhost:9092");
> props.put("acks", "all");
> props.put("retries", 0);
> props.put("batch.size", 16384);
> props.put("linger.ms", 1);
> props.put("buffer.memory", 33554432);
> props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
> props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
> 
> Producer producer = new KafkaProducer<>(props);
> 
> for (int i = 0; i < 1000; i++) {
> producer.send(new ProducerRecord<>(
> "input-topic",
> String.format("{\"type\":\"test\", \"t\":%.3f, \"k\":%d}", System.nanoTime() * 1e-9, i)));
> 
> 
> final KStreamBuilder builder = new KStreamBuilder();
> final KStream qps = builder.stream(Serdes.String(), Serdes.Long(), "input-topic");
> qps.countByKey(TimeWindows.of("Hourly", 3600 * 1000)).mapValues(Object::toString).to("output-topic");
> 
> final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
> streams.start();
> 
> Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
> 
> On Wed, Oct 19, 2016 at 12:14 AM, Matthias J. Sax
>  wrote:
> 
> Two things:
> 
> 1) you should not apply the window to the first count, but to the
> base stream to get correct results.
> 
> 2) Your windowed aggregation does not return a plain String key type,
> but a Windowed key type. Thus, you need to either insert a .map() to
> transform your data into String type, or provide a custom
> serializer when writing data to the output topic (the .to(...) method has
> multiple overloads).
> 
> By default, each topic read/write operation uses the Serdes from the
> streams config. If your data has a different type, you need to
> provide appropriate Serdes for those operators.
> 
> 
> -Matthias
> 
> On 10/18/16 2:01 PM, Furkan KAMACI wrote:
 Hi Matthias,
 
 I've tried this code:
 
 final Properties streamsConfiguration = new Properties();
 streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "myapp");
 streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
 streamsConfiguration.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
 streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
 streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
 
 final KStreamBuilder builder = new KStreamBuilder();
 final KStream input = builder.stream("myapp-test");
 
 final KStream searchCounts = input.countByKey("SearchRequests").toStream();
 searchCounts.countByKey(TimeWindows.of("Hourly", 3600 * 1000)).to("outputTopicHourlyCounts");
 
 final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
 streams.start();
 
 Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
 
 However I get an error:
 
 
 Exception in thread "StreamThread-1" java.lang.ClassCastException:
 org.apache.kafka.streams.kstream.Windowed cannot be cast to java.lang.String
 
 On the other hand, when I try this code:
 
 https://gist.github.com/timothyrenner/a99c86b2d6ed2c22c8703e8c7760af3a
 
 I get an error too, which indicates that:
 
 Exception in thread "StreamThread-1" org.apache.kafka.common.errors.SerializationException:
 Size of data received by LongDeserializer is not 8
 
 Here is the generated topic:
 
 kafka-console-consumer --zookeeper localhost:2181 --topic myapp-test --from-beginning
 28952314828122
 28988681653726
 29080089383233
 
 I know that I am missing something but couldn't find it.
 
 Kind Regards, Furkan KAMACI
 
 On Tue, Oct 18, 2016 at 10:34 PM, Matthias J. Sax 
  wrote:
 
 I see. KGroupedStream will be part of 0.10.1.0 (should be
 released in the next weeks).
 
 So, instead of
 
>>> .groupByKey().count()
 
 you need to do
 
>>> .countByKey()
 
 
 
 -Matthias
 
 On 10/18/16 12:05 PM, Furkan KAMACI wrote:
>>> Hi Matthias,
>>> 
>>> Thanks for your detailed answer. By the way I couldn't
>>> find "KGroupedStream" at version of 0.10.0.1?
>>> 
>>> Kind Regards, Furkan KAMACI
>>> 
>>> On Tue, Oct 18, 2016 at 8:41 PM, Matthias J. Sax 
>>>  wrote:
>>> 
>>> Hi,
>>> 
>>> You just need to 

Kafka(9.0.1) error : org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1164731757 larger than 104857600)

2016-10-18 Thread Arun Rai
Hello Kafka/Cassandra experts,


I am getting the error below:

org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(
NetworkReceive.java:91)

at org.apache.kafka.common.network.NetworkReceive.
readFrom(NetworkReceive.java:71)

at org.apache.kafka.common.network.KafkaChannel.receive(
KafkaChannel.java:153)

at org.apache.kafka.common.network.KafkaChannel.read(
KafkaChannel.java:134)

at org.apache.kafka.common.network.Selector.poll(Selector.java:286)

at kafka.network.Processor.run(SocketServer.scala:413)

at java.lang.Thread.run(Thread.java:745)

[2016-10-06 22:17:42,001] WARN Unexpected error from /10.61.48.28; closing connection (org.apache.kafka.common.network.Selector)

org.apache.kafka.common.network.InvalidReceiveException: Invalid receive
(size = 1969447758 larger than 104857600)

at



Below is my case.

*Cluster configuration :*



Kafka cluster : 3 vms with each one has 4 vcpu, 8 gb RAM

Cassandra : 4 vms with each one has 8 vcpu, 16 gb RAM



1.   Producer: the producer is an rdkafka client, which produces the
messages (protobuf) to a Kafka topic (with 12 partitions). Note that the
message size always varies; it is not a fixed-size message. The maximum
message size could be from 1 MB to 3 MB.

2.   Consumer client: the client is a Java program that reads the messages
and loads them into a Cassandra database.

3.   I get this error only when both producer and consumer are running
together; I see it in the Kafka log files 15-20 minutes later.

4.   If I let only the consumer or only the producer run at a time, I don't
get this error.

5.   I do not get this error if the consumer only consumes the messages and
does not insert them into Cassandra.



Below is my kafka server configuration props:





broker.id=1

log.dirs=/data/kafka



host.name=10.61.19.87

port=9092



advertised.host.name=10.61.19.87

advertised.port=9092

listeners=PLAINTEXT://0.0.0.0:9092



delete.topic.enable=true



# Replication configurations

num.replica.fetchers=4

replica.fetch.max.bytes=1048576

replica.fetch.wait.max.ms=500

replica.high.watermark.checkpoint.interval.ms=5000

replica.socket.timeout.ms=3

replica.socket.receive.buffer.bytes=65536

replica.lag.time.max.ms=1



controller.socket.timeout.ms=3

controller.message.queue.size=10



# Log configuration

num.partitions=12

message.max.bytes=100

auto.create.topics.enable=true

log.index.interval.bytes=4096

log.index.size.max.bytes=10485760

log.retention.hours=24

log.flush.interval.ms=1

log.flush.interval.messages=2

log.flush.scheduler.interval.ms=2000

log.roll.hours=24

log.retention.check.interval.ms=30

log.segment.bytes=1073741824



# ZK configuration

zookeeper.connect=10.61.19.84:2181,10.61.19.86:2181

zookeeper.connection.timeout.ms=18000

zookeeper.sync.time.ms=2000



# Socket server configuration

num.io.threads=8

num.network.threads=8

socket.request.max.bytes=104857600

socket.receive.buffer.bytes=1048576

socket.send.buffer.bytes=1048576

queued.max.requests=16

fetch.purgatory.purge.interval.requests=100

producer.purgatory.purge.interval.requests=100



Any help will be really appreciated.

Arun Rai


i have charset question about kafka

2016-10-18 Thread /kun????????
I have tested my Kafka program using kafka-console-producer.sh and
kafka-console-consumer.sh.
 step 1:
 [oracle@bd02 bin]$ ./kafka-console-producer.sh --broker-list 133.224.217.175:9092 --topic oggtopic

 step 2:
 [oracle@bd02 bin]$ ./kafka-console-consumer.sh --zookeeper 133.224.217.175:2181/kafka --from-beginning --topic oggtopic

 please tell me how to solve the charset issue

 thanks!
  
 jason

Re: Re: Manually update consumer offset stored in Kafka

2016-10-18 Thread Yuanjia
Hi Yifan,
You can try this procedure with Kafka 0.10; stop the consumer group before doing it.
consumer.subscribe(Arrays.asList(topic)); // the consumer has "enable.auto.commit" set to false
consumer.poll(1000);
consumer.commitSync(offsets); // "offsets" holds the consumer offsets to be updated
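
A slightly more complete sketch of the same idea, using assign() instead of
subscribe() so no rebalance is triggered; the broker address, group, topic,
partition, and target offset below are all placeholders:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetRewinder {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "my-group");                   // the group whose offset is updated
        props.put("enable.auto.commit", "false");            // commit only what we set explicitly
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            final TopicPartition tp = new TopicPartition("my-topic", 0);  // placeholder topic/partition
            consumer.assign(Collections.singletonList(tp));
            // Commit the desired offset for this group; the next consumer in the
            // group will start reading partition 0 from offset 42.
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(42L)));
        }
    }
}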



Yuanjia Li
 
From: Yifan Ying
Date: 2016-10-15 01:53
To: users
Subject: Re: Manually update consumer offset stored in Kafka
Hi Jeff,
 
Could you explain how you send messages to __consumer_offsets to overwrite
offsets? Thanks!
 
Yifan
 
On Fri, Oct 14, 2016 at 9:55 AM, Jeff Widman  wrote:
 
> I also would like to know this.
>
> Is the solution to just use a console producer against the internal topics
> that store the offsets?
>
> On Wed, Oct 12, 2016 at 2:26 PM, Yifan Ying  wrote:
>
> > Hi,
> >
> > In old consumers, we use the following command line tool to manually
> update
> > offsets stored in zk:
> >
> > *./kafka-run-class.sh kafka.tools.UpdateOffsetsInZK [latest | earliest]
> > [consumer.properties file path] [topic]*
> >
> > But it doesn't work with offsets stored in Kafka. How can I update the
> > Kafka offsets to latest?
> >
> > Yifan
> >
> > --
> > Yifan
> >
>
 
 
 
-- 
Yifan


i have charset question about kafka

2016-10-18 Thread /kun????????
I have tested my Kafka program using kafka-console-producer.sh and
kafka-console-consumer.sh.

RE: A question about kafka

2016-10-18 Thread ZHU Hua B
Hi,


Could anybody help answer the question below: can the compression type be
modified through the command "bin/kafka-console-producer.sh --producer.config "? Thanks!






Best Regards

Johnny

-Original Message-
From: ZHU Hua B 
Sent: 2016年10月17日 14:52
To: users@kafka.apache.org; Radoslaw Gruchalski
Subject: RE: A question about kafka

Hi,


Thanks for your reply!

OK, I got it. Also, there is a parameter named compression.type in
config/producer.properties, which I think has the same usage as "--compression-codec ".
I first modify compression.type in config/producer.properties, then run the
console producer with the option "--producer.config " and send a message,
but the compression codec does not change according to the modification. Do you
know the reason for it? Thanks!


# bin/kafka-console-producer.sh
Read data from standard input and publish it to Kafka.
Option   Description
--   ---
--producer.config   Producer config properties file. Note
   that [producer-property] takes
   precedence over this config.
# bin/kafka-console-producer.sh --producer.config config/producer.properties 
--broker-list localhost:9092 --topic test



Best Regards

Johnny


-Original Message-
From: Hans Jespersen [mailto:h...@confluent.io]
Sent: 2016年10月17日 14:29
To: users@kafka.apache.org; Radoslaw Gruchalski
Subject: RE: A question about kafka

Because the producer-property option is used to set other properties that are 
not compression type.
//h...@confluent.io
 Original message From: ZHU Hua B 
 Date: 10/16/16  11:20 PM  (GMT-08:00) To: 
Radoslaw Gruchalski , users@kafka.apache.org Subject: RE: 
A question about kafka Hi,


Thanks for your reply!

If the console producer only allows the compression codec to be set via a dedicated
argument, why can we find the option —producer-property defined in ConsoleProducer.scala?
And we can also see its usage when we run the console producer. The version we use is
Kafka 0.10.0.0. Thanks!


# ./kafka-console-producer.sh
Read data from standard input and publish it to Kafka.
Option   Description
--   --- --compression-codec 
[compression-codec]  The compression codec: either 'none',
   'gzip', 'snappy', or 'lz4'.If
   specified without value, then it
   defaults to 'gzip'
--producer-property   A mechanism to pass user-defined
   properties in the form key=value to
   the producer.
--producer.config   Producer config properties file. Note
   that [producer-property] takes
   precedence over this config.
--property     A mechanism to pass user-defined
   properties in the form key=value to
   the message reader. This allows
   custom configuration for a user-
   defined message reader.



Best Regards

Johnny

From: Radoslaw Gruchalski [mailto:ra...@gruchalski.com]
Sent: 2016年10月17日 14:02
To: ZHU Hua B; users@kafka.apache.org
Subject: RE: A question about kafka

Hi,

I believe the answer is in the code. This is where the --compression-codec is 
processed:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/ConsoleProducer.scala#L143
and this is —producer-property:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/ConsoleProducer.scala#L234

The usage is here:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/ConsoleProducer.scala#L114

The answer is: the console producer allows setting the compression codec only via the
—compression-codec argument.
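
For comparison, when the Java producer API is used directly instead of the console
tool, the codec is just an ordinary config property; a minimal sketch, assuming a
local broker and the "test" topic used earlier in this thread:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SnappyProducer {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption: local broker
        props.put("compression.type", "snappy");             // producer-side compression codec
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "hello, compressed world"));
        }
    }
}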

–
Best regards,

Radek Gruchalski

ra...@gruchalski.com


On October 17, 2016 at 7:46:41 AM, ZHU Hua B 
(hua.b@alcatel-lucent.com) wrote:
Hi,


Anybody could help to answer this question? Thanks!






Best Regards

Johnny

-Original Message-
From: ZHU Hua B
Sent: 2016年10月14日 16:41
To: users@kafka.apache.org
Subject: [COMMERCIAL] A question about kafka

Hi,


I have a question about Kafka; could you please help to take a look?

I want to send a message from the producer with the snappy compression codec. So I run
the command "bin/kafka-console-producer.sh --compression-codec snappy
--broker-list localhost:9092 --topic test". After that I checked the data log, and
compresscodec is SnappyCompressionCodec, as expected.

Then I tried another command "bin/kafka-console-producer.sh --producer-prop

If the same topic could re-mirror after delete it

2016-10-18 Thread ZHU Hua B
Hi,


If I delete a topic on the target Kafka cluster, can Kafka MirrorMaker mirror the
same topic again from the source cluster? Thanks!






Best Regards

Johnny



Re: [kafka-clients] [VOTE] 0.10.1.0 RC3

2016-10-18 Thread Neha Narkhede
+1 (binding)

Verified quick start and artifacts.
On Mon, Oct 17, 2016 at 10:39 PM Dana Powers  wrote:

> +1 -- passes kafka-python integration tests
>
> On Mon, Oct 17, 2016 at 1:28 PM, Jun Rao  wrote:
> > Thanks for preparing the release. Verified quick start on scala 2.11
> binary.
> > +1
> >
> > Jun
> >
> > On Fri, Oct 14, 2016 at 4:29 PM, Jason Gustafson 
> wrote:
> >>
> >> Hello Kafka users, developers and client-developers,
> >>
> >> One more RC for 0.10.1.0. We're hoping this is the final one so that we
> >> can meet the release target date of Oct. 17 (Monday). Please let me
> know as
> >> soon as possible if you find any major problems.
> >>
> >> Release plan:
> >> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+0.10.1.
> >>
> >> Release notes for the 0.10.1.0 release:
> >> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/RELEASE_NOTES.html
> >>
> >> *** Please download, test and vote by Monday, Oct 17, 5pm PT
> >>
> >> Kafka's KEYS file containing PGP keys we use to sign the release:
> >> http://kafka.apache.org/KEYS
> >>
> >> * Release artifacts to be voted upon (source and binary):
> >> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/
> >>
> >> * Maven artifacts to be voted upon:
> >> https://repository.apache.org/content/groups/staging/
> >>
> >> * Javadoc:
> >> http://home.apache.org/~jgus/kafka-0.10.1.0-rc3/javadoc/
> >>
> >> * Tag to be voted upon (off 0.10.1 branch) is the 0.10.1.0-rc3 tag:
> >>
> >>
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=50f30a44f31fca1bd9189d2814388d51bd56b06b
> >>
> >> * Documentation:
> >> http://kafka.apache.org/0101/documentation.html
> >>
> >> * Protocol:
> >> http://kafka.apache.org/0101/protocol.html
> >>
> >> * Tests:
> >> Unit tests: https://builds.apache.org/job/kafka-0.10.1-jdk7/71/
> >> System tests:
> >>
> http://testing.confluent.io/confluent-kafka-0-10-1-system-test-results/?prefix=2016-10-13--001.1476369986--apache--0.10.1--ee212d1/
> >>
> >> (Note that these tests do not include a couple patches merged today. I
> >> will send links to updated test builds as soon as they are available)
> >>
> >> Thanks,
> >>
> >> Jason
> >>
> >> --
> >> You received this message because you are subscribed to the Google
> Groups
> >> "kafka-clients" group.
> >> To unsubscribe from this group and stop receiving emails from it, send
> an
> >> email to kafka-clients+unsubscr...@googlegroups.com.
> >> To post to this group, send email to kafka-clie...@googlegroups.com.
> >> Visit this group at https://groups.google.com/group/kafka-clients.
> >> To view this discussion on the web visit
> >>
> https://groups.google.com/d/msgid/kafka-clients/CAJDuW%3DBm0HCOjiHiwnW3WJ_i5u_0e4J2G_mZ_KBkB_WEmo7pNg%40mail.gmail.com
> .
> >> For more options, visit https://groups.google.com/d/optout.
> >
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "kafka-clients" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to kafka-clients+unsubscr...@googlegroups.com.
> > To post to this group, send email to kafka-clie...@googlegroups.com.
> > Visit this group at https://groups.google.com/group/kafka-clients.
> > To view this discussion on the web visit
> >
> https://groups.google.com/d/msgid/kafka-clients/CAFc58G_Atqyc7-O13EGnRNibng5UPo-a_2h00N2%3D%3DMtWktm%3D1g%40mail.gmail.com
> .
> >
> > For more options, visit https://groups.google.com/d/optout.
>
-- 
Thanks,
Neha