Re: Blocked in KafkaConsumer.commitOffsets

2016-07-11 Thread Hironori Ogibayashi
Hi

After the modification, my program ran for 3 days without problems. The maximum
checkpoint duration was 6 seconds (before the modification, it took several
minutes). I think the issue has been fixed.

Regards,
Hironori


2016-06-15 22:51 GMT+09:00 Robert Metzger :
> Hi,
>
> I've already looked at this issue on the Flink list and recommended Hironori
> to post here. It seems that the consumer is not returning from the poll()
> call, which is why the commitOffsets() method cannot enter the synchronized
> block.
> The KafkaConsumer is logging the following statements:
>
> 2016-06-10 20:29:53,677 INFO
>  org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking
> the coordinator 2147482645 dead.
> 2016-06-10 20:29:53,678 INFO
>  org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking
> the coordinator 2147482645 dead.
> 2016-06-10 20:29:53,679 INFO
>  org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking
> the coordinator 2147482645 dead.
> 
> 2016-06-10 20:56:53,982 INFO
>  org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking
> the coordinator 2147482645 dead.
>
>
> I guess that the poll() call is not returning within the given timeout
> while trying to reconnect to the brokers?
>
>
> On Wed, Jun 15, 2016 at 2:41 PM, Hironori Ogibayashi 
> wrote:
>
>> Hello,
>>
>> I am running a stream processing job with Kafka and Flink.
>> Flink reads records from Kafka.
>>
>> My software versions are:
>> - Kafka broker: 0.9.0.2.4 (HDP 2.4.0.0 version)
>> - Kafka client library: 0.9.0.1
>> - Flink: 1.0.3
>>
>> Now I have a problem where the Flink job is sometimes blocked and consumer
>> lag is increasing.
>> I got a thread dump during the situation.
>>
>> This is the blocked thread. It looks like it is blocked in
>> KafkaConsumer.commitOffsets.
>>
>> 
>> "Async calls on Source: Custom Source -> Flat Map (2/3)" daemon
>> prio=10 tid=0x7f2b14010800 nid=0x1b89a waiting for monitor entry
>> [0x7f2b3ddfc000]
>>java.lang.Thread.State: BLOCKED (on object monitor)
>> at
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09.commitOffsets(FlinkKafkaConsumer09.java:392)
>> - waiting to lock <0x000659111b58> (a
>> org.apache.kafka.clients.consumer.KafkaConsumer)
>> at
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.notifyCheckpointComplete(FlinkKafkaConsumerBase.java:169)
>> at
>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyOfCompletedCheckpoint(AbstractUdfStreamOperator.java:179)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:596)
>> - locked <0x000659111cc8> (a java.lang.Object)
>> at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:945)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> ---
>>
>> And lock 0x000659111b58 is held by the following thread.
>>
>> ---
>> "Thread-9" daemon prio=10 tid=0x7f2b2440d000 nid=0x1b838 runnable
>> [0x7f2b3dbfa000]
>>java.lang.Thread.State: RUNNABLE
>> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>> - locked <0x000659457dc8> (a sun.nio.ch.Util$2)
>> - locked <0x000659457db8> (a
>> java.util.Collections$UnmodifiableSet)
>> - locked <0x000659457108> (a sun.nio.ch.EPollSelectorImpl)
>> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>> at
>> org.apache.kafka.common.network.Selector.select(Selector.java:425)
>> at org.apache.kafka.common.network.Selector.poll(Selector.java:254)
>> at
>> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
>> at
>> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>> at
>> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>> at
>> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>> at
>> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
>> at
>> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853)
>> at
>> 

Re: Kafka Consumer for Real-Time Application?

2016-07-11 Thread BYEONG-GI KIM
Thank you for the reply.

I thought that was what I was looking for, but unfortunately it wasn't.

The previous messages are still consumed when the consumer is restarted
after a short shutdown...



2016-07-12 9:54 GMT+09:00 Snehal Nagmote :

> Hello *,*
>
> If I understand your question correctly, what you are looking for is a
> setting in the consumer which will only read the latest messages.
>
> auto.offset.reset = latest
>
> This way, when you start a new consumer group, it will always start reading
> from the most recent offset.
>
>
> Thanks,
> Snehal
>
>
>
>
>
>
>
> On 11 July 2016 at 17:38, BYEONG-GI KIM  wrote:
>
> > Hello.
> >
> > Generally, a Kafka Consumer consumes stored messages from the Kafka
> > Broker(s) when the Consumer is started.
> >
> > I, however, want to create a function that only consumes messages that
> > arrive after the Consumer starts, instead of consuming the previously
> > stored messages as mentioned above, for a real-time application.
> >
> > Is it doable? If yes, how could it be implemented? Any example code or
> > reference would really be helpful for me.
> >
> > Thanks in advance!
> >
> > Best regards
> >
> > KIM
> >
>


Re: Kafka Consumer for Real-Time Application?

2016-07-11 Thread Snehal Nagmote
Hello *,*

If I understand your question correctly, what you are looking for is a
setting in the consumer which will only read the latest messages.

auto.offset.reset = latest

This way, when you start a new consumer group, it will always start reading
from the most recent offset.
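
For illustration, a minimal sketch of a new-consumer (0.9+) application along
these lines. The broker address, topic name and group id are placeholders, not
taken from this thread; auto.offset.reset only applies when the group has no
committed offsets yet, so a brand-new group id makes the consumer start from
the latest offset:

---
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LatestOnlyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        // A fresh group id has no committed offsets, so auto.offset.reset applies.
        props.put("group.id", "realtime-app-" + System.currentTimeMillis());
        props.put("auto.offset.reset", "latest");            // start from the end of the log
        props.put("enable.auto.commit", "true");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
---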


Thanks,
Snehal







On 11 July 2016 at 17:38, BYEONG-GI KIM  wrote:

> Hello.
>
> Generally, a Kafka Consumer consumes stored messages from the Kafka Broker(s)
> when the Consumer is started.
>
> I, however, want to create a function that only consumes messages that arrive
> after the Consumer starts, instead of consuming the previously stored
> messages as mentioned above, for a real-time application.
>
> Is it doable? If yes, how could it be implemented? Any example code or
> reference would really be helpful for me.
>
> Thanks in advance!
>
> Best regards
>
> KIM
>


Kafka Consumer for Real-Time Application?

2016-07-11 Thread BYEONG-GI KIM
Hello.

Generally, a Kafka Consumer consumes stored messages from the Kafka Broker(s)
when the Consumer is started.

I, however, want to create a function that only consumes messages that arrive
after the Consumer starts, instead of consuming the previously stored
messages as mentioned above, for a real-time application.

Is it doable? If yes, how could it be implemented? Any example code or
reference would really be helpful for me.

Thanks in advance!

Best regards

KIM


Re: AWS EFS

2016-07-11 Thread Pete Wright
On Wed, Jun 29, 2016 at 01:30:34PM +0100, Tom Crayford wrote:
> I think you'll be far better off using EBS and Kafka's inbuilt distribution
> than NFS mounts. Kafka's designed for distributing data natively, and not
> for NFS style mounts.
> 

I did some initial benchmarking of Amazon EFS and agree with your
assessment.  I think you'll be better off with EBS, as you have
slightly more control over the performance of the disk subsystem as
opposed to EFS, which is based on NFSv4.  And while the performance of
EFS on my FreeBSD test systems was OK, I did find that it was trivial to
saturate the network interface the EFS volume is mounted on.

I am personally evaluating EFS for use cases where I need to share data
for content delivery or as scratch space for PostgreSQL WAL logs, for
example.

Cheers,
-pete

-- 
Pete Wright
Lead Systems Architect
Rubicon Project
pwri...@rubiconproject.com
310.309.9298


Re: [DISCUSS] Java 8 as a minimum requirement

2016-07-11 Thread Pete Wright
On Fri, Jun 17, 2016 at 12:02:17AM +0200, jan.o...@wooga.net wrote:
> 
> Hi Ismael,
> 
> Unfortunately Java 8 doesn't play nice with FreeBSD. We have seen a lot of 
> JVM crashes running our 0.9 brokers on Java 8... Java 7 on the other hand is 
> totally stable. 
> 
> Until these issues have been addressed, this would cause some serious issues 
> for us.
> 
I can second this on the FreeBSD front.

-pete


-- 
Pete Wright
Lead Systems Architect
Rubicon Project
pwri...@rubiconproject.com
310.309.9298


Re: Contribution : KafkaStreams CEP library

2016-07-11 Thread Florian Hussonnois
Hi,

It would be great if you could link my repo. Thanks very much.

2016-07-11 18:26 GMT+02:00 Guozhang Wang :

> Thanks Florian!
>
>
> Do you mind if I link your repo in the Kafka Streams eco-system page?
>
>
> Guozhang
>
> On Mon, Jul 11, 2016 at 1:37 AM, Michael Noll 
> wrote:
>
> > Thanks for sharing, Florian!
> >
> > -Michael
> >
> >
> > On Fri, Jul 8, 2016 at 6:48 PM, Florian Hussonnois <
> fhussonn...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > For a few weeks I've been working for fun on a CEP library on top of
> > > KafkaStreams.
> > > There has been some progress and I think my project is starting to look
> > > like something, or at least I hope ;)
> > >
> > > https://github.com/fhussonnois/kafkastreams-cep
> > >
> > > So I'm pleased to share it with you (I already shared it with the dev
> > > mailing list, but I just realised that I had forgotten to add the user
> > > list ^^).
> > >
> > > Currently, I'm looking to test my library against real use cases. If
> > > some of you test it, I would appreciate any feedback.
> > >
> > > Any contribution is welcome. I'm sure this project can be improved in
> > > many ways.
> > >
> > > Thanks in advance,
> > >
> > > --
> > > Florian HUSSONNOIS
> > >
> >
>
>
>
> --
> -- Guozhang
>



-- 
Florian HUSSONNOIS


Re: Rolling upgrade from 0.8.2.1 to 0.9.0.1 failing with replicafetchthread OOM errors

2016-07-11 Thread Hao Xu
Oh, I am talking about another memory leak: the off-heap memory leak we had
experienced, which is about direct buffer memory. The call stack is below.
 ReplicaFetcherThread.warn - [ReplicaFetcherThread-4-1463989770], Error in
fetch kafka.server.ReplicaFetcherThread$FetchRequest@7f4c1657. Possible
cause: java.lang.OutOfMemoryError: Direct buffer memory
 ERROR kafka-network-thread-75737866-PLAINTEXT-26 Processor.error -
Processor got uncaught exception.
local3.err java.lang.OutOfMemoryError: - Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:693)

On Mon, Jul 11, 2016 at 10:54 AM, Tom Crayford  wrote:

> Hi (I'm the author of that ticket):
>
> From my understanding, limiting MaxDirectMemory won't work around this memory
> leak. The leak is inside the JVM's implementation, not in normal direct
> buffers. On one of our brokers with this issue, we're seeing the JVM report
> 1.2GB of heap, and 128MB of offheap memory, yet the actual process memory
> is more like 10GB.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>



-- 
-hao


Re: Rolling upgrade from 0.8.2.1 to 0.9.0.1 failing with replicafetchthread OOM errors

2016-07-11 Thread Tom Crayford
Hi (I'm the author of that ticket):

From my understanding, limiting MaxDirectMemory won't work around this memory
leak. The leak is inside the JVM's implementation, not in normal direct
buffers. On one of our brokers with this issue, we're seeing the JVM report
1.2GB of heap, and 128MB of offheap memory, yet the actual process memory
is more like 10GB.

Thanks

Tom Crayford
Heroku Kafka


Re: Rolling upgrade from 0.8.2.1 to 0.9.0.1 failing with replicafetchthread OOM errors

2016-07-11 Thread feifei hsu
Please refer to KAFKA-3933.
A workaround is -XX:MaxDirectMemorySize=1024m
if your call stack has direct buffer issues (effectively off-heap memory).

On Wed, May 11, 2016 at 9:50 AM, Russ Lavoie  wrote:

> Good Afternoon,
>
> I am currently trying to do a rolling upgrade from Kafka 0.8.2.1 to 0.9.0.1
> and am running into a problem when starting 0.9.0.1 with the protocol
> version 0.8.2.1 set in the server.properties.
>
> Here is my current Kafka topic setup, data retention and hardware used:
>
> 3 Zookeeper nodes
> 5 Broker nodes
> Topics have at least 2 replicas
> Topics have no more than 200 partitions
> 4,564 partitions across 61 topics
> 14 day retention
> Each Kafka node has between 2.1T - 2.9T of data
> Hardware is C4.2xlarge AWS instances
>  - 8 Core (Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz)
>  - 14G Ram
>  - 4TB EBS volume (10k IOPS [never gets maxed unless I up the
> num.io.threads])
>
> Here is my running broker configuration for 0.9.0.1:
> 
> [2016-05-11 11:43:58,172] INFO KafkaConfig values:
> advertised.host.name = server.domain
> metric.reporters = []
> quota.producer.default = 9223372036854775807
> offsets.topic.num.partitions = 150
> log.flush.interval.messages = 9223372036854775807
> auto.create.topics.enable = false
> controller.socket.timeout.ms = 3
> log.flush.interval.ms = 1000
> principal.builder.class = class
> org.apache.kafka.common.security.auth.DefaultPrincipalBuilder
> replica.socket.receive.buffer.bytes = 65536
> min.insync.replicas = 1
> replica.fetch.wait.max.ms = 500
> num.recovery.threads.per.data.dir = 1
> ssl.keystore.type = JKS
> default.replication.factor = 3
> ssl.truststore.password = null
> log.preallocate = false
> sasl.kerberos.principal.to.local.rules = [DEFAULT]
> fetch.purgatory.purge.interval.requests = 1000
> ssl.endpoint.identification.algorithm = null
> replica.socket.timeout.ms = 3
> message.max.bytes = 10485760
> num.io.threads =8
> offsets.commit.required.acks = -1
> log.flush.offset.checkpoint.interval.ms = 6
> delete.topic.enable = true
> quota.window.size.seconds = 1
> ssl.truststore.type = JKS
> offsets.commit.timeout.ms = 5000
> quota.window.num = 11
> zookeeper.connect = zkserver:2181/kafka
> authorizer.class.name =
> num.replica.fetchers = 8
> log.retention.ms = null
> log.roll.jitter.hours = 0
> log.cleaner.enable = false
> offsets.load.buffer.size = 5242880
> log.cleaner.delete.retention.ms = 8640
> ssl.client.auth = none
> controlled.shutdown.max.retries = 3
> queued.max.requests = 500
> offsets.topic.replication.factor = 3
> log.cleaner.threads = 1
> sasl.kerberos.service.name = null
> sasl.kerberos.ticket.renew.jitter = 0.05
> socket.request.max.bytes = 104857600
> ssl.trustmanager.algorithm = PKIX
> zookeeper.session.timeout.ms = 6000
> log.retention.bytes = -1
> sasl.kerberos.min.time.before.relogin = 6
> zookeeper.set.acl = false
> connections.max.idle.ms = 60
> offsets.retention.minutes = 1440
> replica.fetch.backoff.ms = 1000
> inter.broker.protocol.version = 0.8.2.1
> log.retention.hours = 168
> num.partitions = 16
> broker.id.generation.enable = false
> listeners = null
> ssl.provider = null
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> log.roll.ms = null
> log.flush.scheduler.interval.ms = 9223372036854775807
> ssl.cipher.suites = null
> log.index.size.max.bytes = 10485760
> ssl.keymanager.algorithm = SunX509
> security.inter.broker.protocol = PLAINTEXT
> replica.fetch.max.bytes = 104857600
> advertised.port = null
> log.cleaner.dedupe.buffer.size = 134217728
> replica.high.watermark.checkpoint.interval.ms = 5000
> log.cleaner.io.buffer.size = 524288
> sasl.kerberos.ticket.renew.window.factor = 0.8
> zookeeper.connection.timeout.ms = 6000
> controlled.shutdown.retry.backoff.ms = 5000
> log.roll.hours = 168
> log.cleanup.policy = delete
> host.name =
> log.roll.jitter.ms = null
> max.connections.per.ip = 2147483647
> offsets.topic.segment.bytes = 104857600
> background.threads = 10
> quota.consumer.default = 9223372036854775807
> request.timeout.ms = 3
> log.index.interval.bytes = 4096
> log.dir = /tmp/kafka-logs
> log.segment.bytes = 268435456
> log.cleaner.backoff.ms = 15000
> offset.metadata.max.bytes = 4096
> ssl.truststore.location = null
> group.max.session.timeout.ms = 3
> ssl.keystore.password = null
> zookeeper.sync.time.ms = 2000
> port = 9092
> log.retention.minutes = null
> log.segment.delete.delay.ms = 6
> log.dirs = /mnt/kafka/data
> controlled.shutdown.enable = true
> compression.type = producer
> max.connections.per.ip.overrides =
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
> log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
> auto.leader.rebalance.enable = true
> leader.imbalance.check.interval.seconds = 300
> log.cleaner.min.cleanable.ratio = 0.5
> replica.lag.time.max.ms = 1
> num.network.threads =8
> ssl.key.password = null
> reserved.broker.max.id = 1000
> metrics.num.samples = 2
> socket.send.buffer.bytes = 2097152
> ssl.protocol = TLS

Re: Kafka on AWs EMR

2016-07-11 Thread Andrew Ehrlich
EMR is a product optimized for elastic scaling of clusters using Amazon's
pre-defined AMIs. Hadoop and Spark work well on this, but not so much
Kafka, which isn't so easy to automatically add/remove nodes from. The best
bet is still to use EC2 with your own Kafka AMIs or predefined third-party
ones. Take a look at the AWS CloudFormation product for a handy way to
manage a cluster of EC2 instances.

On Sun, Jul 10, 2016 at 7:49 PM, Snehalata Nagaje <
snehalata.nag...@harbingergroup.com> wrote:

>
>
>
> Though it has nothing to do with MapReduce, I was referring to EMR machines
> vs EC2 instances on Amazon.
>
> We can install other applications on AWS EMR apart from what AWS provides,
> isn't that right?
>
> - Original Message -
> From: r...@chartbeat.com
> To: users@kafka.apache.org
> Sent: Sunday, July 10, 2016 9:25:30 PM
> Subject: Re: Kafka on AWs EMR
>
> Kafka is in no way related to MapReduce, so this doesn't really make
> sense. You could have consumers running in Hadoop listening to Kafka topics
> - see the Samza project.
>
> > On Jul 10, 2016, at 11:16 AM, Snehalata Nagaje <
> snehalata.nag...@harbingergroup.com> wrote:
> >
> >
> >
> > Hi All,
> >
> >
> > I want to set up Kafka on AWS EMR.
> >
> > But I could not find any documentation. Can you please help me with this?
> >
> > Thanks,
> > Snehalata
>


Re: Contribution : KafkaStreams CEP library

2016-07-11 Thread Guozhang Wang
Thanks Florian!


Do you mind if I link your repo in the Kafka Streams eco-system page?


Guozhang

On Mon, Jul 11, 2016 at 1:37 AM, Michael Noll  wrote:

> Thanks for sharing, Florian!
>
> -Michael
>
>
> On Fri, Jul 8, 2016 at 6:48 PM, Florian Hussonnois 
> wrote:
>
> > Hi All,
> >
> > For a few weeks I've been working for fun on a CEP library on top of
> > KafkaStreams.
> > There has been some progress and I think my project is starting to look
> > like something, or at least I hope ;)
> >
> > https://github.com/fhussonnois/kafkastreams-cep
> >
> > So I'm pleased to share it with you (I already shared it with the dev
> > mailing list, but I just realised that I had forgotten to add the user
> > list ^^).
> >
> > Currently, I'm looking to test my library against real use cases. If some
> > of you test it, I would appreciate any feedback.
> >
> > Any contribution is welcome. I'm sure this project can be improved in
> > many ways.
> >
> > Thanks in advance,
> >
> > --
> > Florian HUSSONNOIS
> >
>



-- 
-- Guozhang


Scheduling in Kafka?

2016-07-11 Thread Seongyun Ko
Hi, many queueing systems always have concerns about their scheduling policy (e.g.
FCFS, PS).
I have two questions about it.

(1) What is the scheduling policy of Kafka when serving requests from producers
and consumers?
(2) How many requests can be served simultaneously? (Is this related to
num.io.threads and num.network.threads in the configuration?)

// I understand that Kafka may not have major concerns about scheduling policy
thanks to its high-speed publish/subscribe.

Scheduling in Kafka?

2016-07-11 Thread syko-dblab
Hi, many queueing systems always have concerns about their scheduling policy (e.g.
FCFS, PS).
I have two questions about it.

(1) What is the scheduling policy of Kafka when serving requests from producers
and consumers?
(2) How many requests can be served simultaneously? (Is this related to
num.io.threads and num.network.threads in the configuration?)

// I understand that Kafka may not have major concerns about scheduling policy
thanks to its high-speed publish/subscribe.

whether compressed data is still compressed in broker log

2016-07-11 Thread xu
Hi All:
   I updated the broker version from 0.8.2 to 0.10.0 and set
"compression.type=snappy" in server.properties. The producers and consumers
are still version 0.8.2.  I expected all the new data received by the brokers
to be stored compressed in the log files, but the result is the opposite.


 My question is the same as:
http://stackoverflow.com/questions/31168085/checking-kafka-data-if-compressed.


I do not understand how compressed data is stored in the log files. Could you
help me?

Re: Kafka Streams : Old Producer

2016-07-11 Thread Vivek
Thanks a lot Michael.
I used WallclockTimestampExtractor for now.

Thanks,
Vivek
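
For reference, a minimal sketch of the custom TimestampExtractor Michael
describes below. It assumes the single-argument extract() signature of Kafka
Streams 0.10.0 and a hypothetical payload type (MyEvent) carrying the
timestamp embedded by the 0.9 producer; both are illustrative assumptions:

---
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Hypothetical payload type; the field and accessor names are assumptions.
class MyEvent {
    private final long eventTimeMs;
    MyEvent(long eventTimeMs) { this.eventTimeMs = eventTimeMs; }
    long getEventTimeMs() { return eventTimeMs; }
}

public class PayloadTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        Object value = record.value();
        if (value instanceof MyEvent) {
            // Use the event time the producer embedded in the message value.
            return ((MyEvent) value).getEventTimeMs();
        }
        // Fall back to wall-clock time when the payload has no usable timestamp.
        return System.currentTimeMillis();
    }

    // Registering the extractor with Kafka Streams (application id and broker
    // address are placeholders):
    static Properties streamsConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
                  PayloadTimestampExtractor.class.getName());
        return props;
    }
}
---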

> On Jul 8, 2016, at 1:25 AM, Michael Noll  wrote:
> 
> Vivek,
> 
> in this case you should manually embed a timestamp within the payload of
> the produced messages (e.g. as a Long field in an Avro-encoded message
> value).  This would need to be done by the producer.
> 
> Then, in Kafka Streams, you'd need to implement a custom
> TimestampExtractor that can retrieve this timestamp from the message
> payload. And you need to configure your StreamsConfig to use that custom
> timestamp extractor.
> 
> Hope this helps,
> Michael
> 
> 
> 
>> On Thursday, July 7, 2016, vivek thakre  wrote:
>> 
>> That's right Ismael, I am looking for workarounds either on the 0.9.0.1
>> producer side or on the Kafka Streams side so that I can process messages
>> produced by a 0.9.0.1 producer using the Kafka Streams library.
>> 
>> Thanks,
>> Vivek
>> 
>> On Thu, Jul 7, 2016 at 9:05 AM, Ismael Juma > > wrote:
>> 
>>> Hi,
>>> 
>>> Matthias, I think Vivek's question is not whether Kafka Streams can be
>> used
>>> with a Kafka 0.9 broker (which it cannot). The question is whether Kafka
>>> Streams can process messages produced with a 0.9.0.1 producer into a
>>> 0.10.0.0 broker. Is that right? If so, would a custom TimestampExtractor
>>> work?
>>> 
>>> Ismael
>>> 
>>> On Thu, Jul 7, 2016 at 12:29 PM, Matthias J. Sax > >
>>> wrote:
>>> 
 Hi Vivek,
 
 Kafka Streams works only with Kafka 0.10 (but not with 0.9).
 
 I am not aware of any work around to allow for 0.9 usage.
 
 
 -Matthias
 
> On 07/07/2016 05:37 AM, vivek thakre wrote:
> Can the Kafka Streams library work with messages produced by a 0.9.0.1
> producer?
> I guess not, since the old producer would not add a timestamp (I am
> getting an invalid timestamp exception).
> 
> As I cannot change our producer application setup, I have to use
>>> 0.9.0.1
> producer.
> Is there a workaround that I can try to use Kafka Streams?
> 
> Thanks,
> Vivek
> 
> 
> -- 
> Best regards,
> Michael Noll
> 
> 
> 
> Michael G. Noll | Product Manager | Confluent | +1 650.453.5860
> Download Apache Kafka and Confluent Platform: www.confluent.io/download


Re: Duplicates consumed on rebalance. No compression, autocommit enabled.

2016-07-11 Thread Gerard Klijs
You could set auto.commit.interval.ms to a lower value; in your example
it is 10 seconds, which can be a lot of messages. I don't really see how it
could be prevented any further, since offsets can only be committed by a
consumer for the partitions it is assigned. I do believe there is some
work in progress in which the assignment of partitions to consumers is
somewhat sticky.
In that case, when a consumer is assigned the same partitions after
the rebalance as it had before, it should not be necessary to
consume the same data again from those partitions.
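
As a rough sketch of the first suggestion, assuming the 0.8.2 high-level
consumer used in this thread (the ZooKeeper address and group name are
placeholders):

---
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class ShortCommitIntervalConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");  // placeholder ZooKeeper
        props.put("group.id", "my-group");                 // placeholder group
        props.put("auto.commit.enable", "true");
        // Commit every second instead of every 10 seconds, so a rebalance can
        // replay at most roughly one second's worth of messages per partition.
        props.put("auto.commit.interval.ms", "1000");

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        // ... create message streams and consume as usual ...
        connector.shutdown();
    }
}
---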

On Mon, Jul 11, 2016 at 3:18 PM Michael Luban  wrote:

> Using the 0.8.2.1 client.
>
> Is it possible to statistically minimize the possibility of duplication in
> this scenario or has this behavior been corrected in a later client
> version?  Or is the test flawed?
>
> https://gist.github.com/mluban/03a5c0d9221182e6ddbc37189c4d3eb0
>


Duplicates consumed on rebalance. No compression, autocommit enabled.

2016-07-11 Thread Michael Luban
Using the 0.8.2.1 client.

Is it possible to statistically minimize the possibility of duplication in
this scenario or has this behavior been corrected in a later client
version?  Or is the test flawed?

https://gist.github.com/mluban/03a5c0d9221182e6ddbc37189c4d3eb0


Re: Offset Tools

2016-07-11 Thread Jörg Wagner

Hello Jan,

Thanks for the information and also the offer. I might come back to that
some day, but for now we seem to have located the issue:
the consumer was not consuming as expected. Hence the consumer group is
not yet known to Kafka and cannot be seen.


Cheers
Jörg

On 11.07.2016 14:07, Jan Omar wrote:

Hi Joerg,

We recently built a tool for fetching the current offsets (per partition) for a
given consumer group, and also for setting the group to a specific offset.
It even allows resetting a consumer group to a given timestamp by running bisect
(binary search) on the messages.

Unfortunately we're using a proprietary message format, which is why we don't
have any plans (or capacity) to open source it at the moment.

However, building that tool was straightforward; it shouldn't take you more
than a day or two to build something similar. Ping me if you need some help.

Regards

Jan


On 11 Jul 2016, at 13:00, Jörg Wagner  wrote:

Hello!

We recently switched to Kafka 0.9.0.1 and currently I don't seem to be able to
figure out how to read the consumer offsets via the CLI. We are using the
0.9.0.1 new consumer and are storing the offsets in Kafka.

Status:
kafka-consumer-offset-checker.sh is old and deprecated, and points to
kafka-consumer-groups.sh
kafka-consumer-groups.sh in old consumer mode shows lots of data, but "unknown"
for offsets and lag
kafka-consumer-groups.sh in new consumer mode shows only a single consumer
group, not the one(s) expected. The one available, though, shows all the
correct data I would like to see for the groups that are not shown.

Short: since 0.9.0.1 I can't see the consumer offsets anymore.

Anyone else experiencing this?

Cheers
Jörg


--
Kind regards

Jörg Wagner

System Administrator
Search & Account Security

1&1 Mail & Media Development  & Technology GmbH | Sapporobogen 6-8 | 80637 
München | Germany
Phone: +49 89 14339 324
E-Mail: joerg.wagn...@1und1.de | Web: www.1und1.de

Amtsgericht Montabaur, HRB 5452

Managing Directors: Frank Einhellinger, Thomas Ludwig, Jan Oetjen


Member of United Internet



Re: Offset Tools

2016-07-11 Thread Jan Omar
Hi Joerg,

We recently built a tool for fetching the current offsets (per partition) for a
given consumer group, and also for setting the group to a specific offset.
It even allows resetting a consumer group to a given timestamp by running bisect
(binary search) on the messages.

Unfortunately we're using a proprietary message format, which is why we don't
have any plans (or capacity) to open source it at the moment.

However, building that tool was straightforward; it shouldn't take you more
than a day or two to build something similar. Ping me if you need some help.

Regards

Jan 
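
For anyone building something similar, a minimal sketch of the "set a group to
a specific offset" part using the 0.9 new-consumer API; the broker address,
topic, partition, offset and group name are placeholders:

---
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SetGroupOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("group.id", "my-group");                 // group whose offset we move
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition tp = new TopicPartition("my-topic", 0); // placeholder topic/partition
        long targetOffset = 12345L;                            // placeholder offset

        // Run this while the group's own consumers are stopped, so the committed
        // offset is not immediately overwritten by their next auto-commit.
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            consumer.commitSync(
                    Collections.singletonMap(tp, new OffsetAndMetadata(targetOffset)));
        }
    }
}
---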

> On 11 Jul 2016, at 13:00, Jörg Wagner  wrote:
> 
> Hello!
> 
> We recently switched to Kafka 0.9.0.1 and currently I don't seem to be able
> to figure out how to read the consumer offsets via the CLI. We are using the
> 0.9.0.1 new consumer and are storing the offsets in Kafka.
> 
> Status:
> kafka-consumer-offset-checker.sh is old and deprecated, and points to
> kafka-consumer-groups.sh
> kafka-consumer-groups.sh in old consumer mode shows lots of data, but
> "unknown" for offsets and lag
> kafka-consumer-groups.sh in new consumer mode shows only a single consumer
> group, not the one(s) expected. The one available, though, shows all the
> correct data I would like to see for the groups that are not shown.
> 
> Short: since 0.9.0.1 I can't see the consumer offsets anymore.
> 
> Anyone else experiencing this?
> 
> Cheers
> Jörg



Offset Tools

2016-07-11 Thread Jörg Wagner

Hello!

We recently switched to Kafka 0.9.0.1 and currently I don't seem to be
able to figure out how to read the consumer offsets via the CLI. We are
using the 0.9.0.1 new consumer and are storing the offsets in Kafka.


Status:
kafka-consumer-offset-checker.sh is old and deprecated, and points to
kafka-consumer-groups.sh
kafka-consumer-groups.sh in old consumer mode shows lots of data, but
"unknown" for offsets and lag
kafka-consumer-groups.sh in new consumer mode shows only a single
consumer group, not the one(s) expected. The one available, though, shows
all the correct data I would like to see for the groups that are not shown.


Short: since 0.9.0.1 I can't see the consumer offsets anymore.

Anyone else experiencing this?

Cheers
Jörg


Kafka 0.9 API connection refused exception not raised to client code

2016-07-11 Thread Hari Sekhon
Hi kafka folks,

I've written some code to test Kafka brokers using the Kafka 0.9 API (I
need Kerberos support).

It appears the 0.9 API is trapping the broker connection-refused exception at a
lower level and retrying in a loop (visible only at debug log level), when
I actually want it to fail and raise the exception to my client code so I
can report the result.

Is there any way to get the Kafka 0.9 API to (preferably) raise the broker
connection-refused exception straight away up to my client code, or to set a
short timeout for how long it retries the connection, in order to force the
tester to fail fast?

Thanks

Hari


Re: Contribution : KafkaStreams CEP library

2016-07-11 Thread Michael Noll
Thanks for sharing, Florian!

-Michael


On Fri, Jul 8, 2016 at 6:48 PM, Florian Hussonnois 
wrote:

> Hi All,
>
> For a few weeks I've been working for fun on a CEP library on top of
> KafkaStreams.
> There has been some progress and I think my project is starting to look
> like something, or at least I hope ;)
>
> https://github.com/fhussonnois/kafkastreams-cep
>
> So I'm pleased to share it with you (I already shared it with the dev
> mailing list, but I just realised that I had forgotten to add the user
> list ^^).
>
> Currently, I'm looking to test my library against real use cases. If some
> of you test it, I would appreciate any feedback.
>
> Any contribution is welcome. I'm sure this project can be improved in many
> ways.
>
> Thanks in advance,
>
> --
> Florian HUSSONNOIS
>


Re: Kafka - offset preservation

2016-07-11 Thread R Krishna
Which version is this? I am a new user myself; are you using the same
consumer group name? In 0.9, once connected, the broker stores consumer
offsets per consumer group and the consumer continues from where it left off;
the earliest/latest setting applies only the first time, when no committed
offset exists yet.
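
A minimal sketch of a 0.9 consumer configuration that resumes from committed
offsets across restarts; the broker address, topic and group name are
placeholders. The key point is a stable group.id combined with committed
offsets, with auto.offset.reset only taking effect when no committed offset
exists yet:

---
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ResumingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-stable-group"); // keep constant across restarts
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
        // Only used the first time, when the group has no committed offsets yet.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
        // poll() in a loop; on the next start with the same group.id the consumer
        // resumes from the last committed offset instead of re-reading everything.
    }
}
---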

On Wed, Jul 6, 2016 at 6:19 AM, Pawel Huszcza 
wrote:

> Hello,
>
> I tried every different property I can think of. I have set
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG = true and
> ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG = 1000, but I have also tried
> many different values.
> Still, every time I start my consumer I am receiving all the messages
> ever sent to the given topic - it is the same consumer group every time.
> ConsumerConfig.AUTO_OFFSET_RESET_CONFIG = "latest", but I also tried
> "earliest" (just for luck). What is the proper configuration to get it to
> work in such a way that every time I start my consumer it will process
> only the new messages, not all of them?
>
> Kind Regards,
> Pawel Jan Huszcza
>



-- 
Radha Krishna, Proddaturi
253-234-5657