Re: how to manage if zookeeper goes down

2017-01-12 Thread Prahalad kothwal
You should configure a ZooKeeper quorum for high availability: run multiple
ZooKeeper servers on separate nodes, so that when one goes down the others
take over.
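For example (hostnames and paths are illustrative), each of the three nodes
would run ZooKeeper with a zoo.cfg along these lines, plus a matching myid
file under dataDir:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

and the Kafka brokers would list all three in server.properties, e.g.
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181.
With three nodes the ensemble tolerates losing one; a majority must stay up
for ZooKeeper to keep serving requests.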

Thanks,
Prahalad

On Fri, Jan 13, 2017 at 9:56 AM, Laxmi Narayan NIT DGP  wrote:

> Hi,
>
> I understand that ZooKeeper is responsible for running Kafka in cluster
> mode.
>
> But how do we ensure that zookeeper never goes down?
>
>
> *Regards,*
> *Laxmi Narayan Patel*
> *MCA NIT Durgapur (2011-2014)*
> *Mob:-9741292048,8345847473*
>


how to manage if zookeeper goes down

2017-01-12 Thread Laxmi Narayan NIT DGP
Hi,

I understand that ZooKeeper is responsible for running Kafka in cluster
mode.

But how do we ensure that zookeeper never goes down?


*Regards,*
*Laxmi Narayan Patel*
*MCA NIT Durgapur (2011-2014)*
*Mob:-9741292048,8345847473*


Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephane Maarek
Hi Stephen

Out of curiosity, why did you pick ZFS over XFS or ext4 and what options
are you using when formatting and mounting?

Regards,
Stephane

On 13 January 2017 at 6:40:18 am, Stephen Powis (spo...@salesforce.com)
wrote:

Running Centos 6.7 3.10.95-1.el6.elrepo.x86_64. 4 SATA disks in RAID10
with ZFS

On Thu, Jan 12, 2017 at 2:27 PM, Tauzell, Dave  wrote:

> You have a local filesystem? Linux?
>
> -Dave
>
> -Original Message-
> From: Stephen Powis [mailto:spo...@salesforce.com]
> Sent: Thursday, January 12, 2017 1:22 PM
> To: users@kafka.apache.org
> Subject: Re: Taking a long time to roll a new log segment (~1 min)
>
> I've further narrowed it down to this particular line:
> https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/
> OffsetIndex.scala#L294
>
> But I'm still at a loss to why this would be slow sometimes and not
others.
>
> On Thu, Jan 12, 2017 at 10:56 AM, Stephen Powis 
> wrote:
>
> > Using the little bash script in that JIRA ticket to go thru the GC log
> > and sum up the total pause times I come up with the following. I
> > don't see anything that would indicate a ~28 second pause.
> >
> > 2017-01-12T07:00 = 72.2961
> >> 2017-01-12T07:01 = 59.8112
> >> 2017-01-12T07:02 = 69.6915
> >> 2017-01-12T07:03 = 74.4093
> >> 2017-01-12T07:04 = 73.507
> >> 2017-01-12T07:05 = 73.7028
> >> 2017-01-12T07:06 = 72.3392
> >> 2017-01-12T07:07 = 70.502
> >> 2017-01-12T07:08 = 71.7471
> >> 2017-01-12T07:09 = 76.8371
> >> 2017-01-12T07:10 = 72.3836
> >> 2017-01-12T07:11 = 74.7462
> >> 2017-01-12T07:12 = 57.1328
> >> 2017-01-12T07:13 = 80.005
> >> 2017-01-12T07:14 = 76.0012
> >> 2017-01-12T07:15 = 79.2221
> >> 2017-01-12T07:16 = 57.5385
> >> 2017-01-12T07:17 = 58.1185
> >> 2017-01-12T07:18 = 72.0524
> >> 2017-01-12T07:19 = 110.6736
> >> 2017-01-12T07:20 = 64.9808
> >> 2017-01-12T07:21 = 66.8767
> >> 2017-01-12T07:22 = 32.409
> >> 2017-01-12T07:23 = 85.3671
> >> 2017-01-12T07:24 = 123.9212
> >> 2017-01-12T07:25 = 70.3219
> >>
> >
> >
> >
> >
> >
> > On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis
> > 
> > wrote:
> >
> >> Just realized that GCEasy doesn't keep reports around for very long
> >> anymore, here is a screencap of the report: http://imgur.com/a/MEubD
> >>
> >> The longest reported GC pause was 30ms, tho they happen somewhat
> >> frequently at an average of once per 12 seconds. KAFKA-4616
> >> certainly sounds just like my scenario, but I'm having trouble
> >> finding the long GC pauses described in that ticket in my own logs.
> >>
> >> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao 
> wrote:
> >>
> >>> You may run into this bug https://issues.apache.org/jira
> >>> /browse/KAFKA-4614
> >>>
> >>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis 
> >>> wrote:
> >>>
> >>> > Per my email to the list in Sept, when I reviewed GC logs then, I
> >>> didn't
> >>> > see anything out of the ordinary. (
> >>> >
> >>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
> >>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
> >>> 40mail.gmail.com%3E
> >>> > )
> >>> >
> >>> > Reviewing the GC logs from this morning around the timestamp in my
> >>> previous
> >>> > email, I see GC running roughly every 10-12 seconds, with total
> >>> > times similar to the following:
> >>> >
> >>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
> >>> application
> >>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
> >>> 0.0002171
> >>> > > seconds
> >>> > >
> >>> > >
> >>> > Here is a link to a GCEasy report:
> >>> >
> >>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
> >>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
> >>> >
> >>> >
> >>> > Currently using G1 gc with the following settings:
> >>> >
> >>> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
> >>> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps
> >>> -XX:+PrintGCDetails
> >>> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> >>> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
> >>> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
> >>> > -XX:+UseCompressedOops -XX:+AlwaysPreTouch -XX:+UseG1GC
> >>> > -XX:MaxGCPauseMillis=20 -XX:+HeapDumpOnOutOfMemoryError
> >>> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
> >>> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log
> >>> > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >>> > -XX:+PrintGCTimeStamps
> >>> >
> >>> >
> >>> >
> >>> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
> >>> > dave.tauz...@surescripts.com
> >>> > > wrote:
> >>> >
> >>> > > Can you collect garbage collection stats and verify there isn't
> >>> > > a
> >>> long GC
> >>> > > happening at the same time?
> >>> > >
> >>> > > -Dave
> >>> > >
> >>> > > -Original Message-
> >>> > > From: Stephen Powis [mailto:spo...@salesforce.com]
> >>> > > Sent: 

Re: java.lang.OutOfMemoryError: Java heap space while running kafka-consumer-perf-test.sh

2017-01-12 Thread Check Peck
Can anyone shed some light on this?

On Wed, Jan 11, 2017 at 2:59 PM, Check Peck  wrote:

> I am trying to run kafka performance script on my linux box. Whenever I
> run "kafka-consumer-perf-test.sh", I always get an error. In the same box,
> I am running "kafka-producer-perf-test.sh" as well and that is not failing
> at all. Looks like something is wrong with "kafka-consumer-perf-test.sh".
>
> I am running Kafka version 0.10.1.0.
>
> Command I ran:
> ./bin/kafka-consumer-perf-test.sh --zookeeper 110.27.14.10:2181
> --messages 50 --topic test-topic --threads 1
>
> Error I got:
> [2017-01-11 22:34:09,785] WARN [ConsumerFetcherThread-perf-co
> nsumer-14195_kafka-cluster-3098529006
> -zeidk-1484174043509-46a51434-2-0], Error in fetch
> kafka.consumer.ConsumerFetcherThread$FetchRequest@54fb48b6
> (kafka.consumer.ConsumerFetcherThread)
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at org.apache.kafka.common.network.NetworkReceive.readFromReada
> bleChannel(NetworkReceive.java:93)
> at kafka.network.BlockingChannel.readCompletely(BlockingChannel
> .scala:129)
> at kafka.network.BlockingChannel.receive(BlockingChannel.scala:
> 120)
> at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.s
> cala:99)
> at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$
> $sendRequest(SimpleConsumer.scala:83)
> at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$appl
> y$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:132)
> at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$appl
> y$mcV$sp$1.apply(SimpleConsumer.scala:132)
> at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$appl
> y$mcV$sp$1.apply(SimpleConsumer.scala:132)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(
> SimpleConsumer.scala:131)
> at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleC
> onsumer.scala:131)
> at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleC
> onsumer.scala:131)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:130)
> at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherTh
> read.scala:109)
> at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherTh
> read.scala:29)
> at kafka.server.AbstractFetcherThread.processFetchRequest(Abstr
> actFetcherThread.scala:118)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThr
> ead.scala:103)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>
>


Kafka Streams: from a KStream, aggregating records with the same key and updated metrics ?

2017-01-12 Thread Nicolas Fouché
Hi,

long long technical story, sorry for that.

I'm dealing with a special case. My input topic receives records containing
an id in the key (and another field for partitioning), and a version number
in the value, amongst other metrics. Records with the same id are sent
every 5 seconds, and the version number increments.

These metrics in the record value are used in aggregations to compute
`sums` and `counts` (then stored in a DB to compute averages), and to
compute a few other data structures like cumulative time buckets. If the
aggregation receives the same record with updated metrics, I have to
decrement `sum` by the metric value of the previous record, and increment
`sum` by the new metric value. Also, the `count` would be incremented by 1
only if the record is seen for the first time (which is not the same as
"version number = 1").

To implement this, we would write a processor which computes the diff of
metrics by storing the last version of each record in its state. This diff
is sent to the aggregation, and it also tells whether the record was the
first (so `count` is incremented). I think this can only be written with
the low-level API.
That could work well, except we have a dozen types of records, with a few
metrics each, and quite a few fields to compute in aggregations. Each time
we deal with this type of "duplicate" records, we would have to write all
the code to compute the diffs again, and the aggregation algorithm becomes
way less trivial (we deal with cumulative time buckets, if one knows what I
mean).

So we got another idea, which does not seem to feel right in a *streaming*
environment, and quite inefficient:


The goal is to "buffer" records until we're quite sure no new version will
be received. And if a new version is actually received, it's ignored.
A generic low level processor would be used in topologies which receive the
same records with updated metrics and an incremented version.

One state store: contains the records, used to know if a record was already
received and when, and if the record was already transferred.

Algorithm:

On each new record:
- GET the previous record in the store by Key
- ignore the new record if:
-- the record version is lower than the one in the store
-- the record timestamp is at least 5 minutes newer than the one in store
- PUT (and thus replace) the record in the store

Every 1 minute:
- for each record in the store
-- if the record has the field "forwarded == true"
--- DELETE it from the store if the record is one hour old
-- else
--- if the timestamp is more than 5 minutes old
---- PUT the record in the store with the field "forwarded" set to true
---- forward the record
===
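For what it's worth, here is a minimal sketch of that algorithm with the
low-level Processor API. It assumes a String key, a hypothetical
BufferedRecord value class carrying the version, an ingestion timestamp and
the "forwarded" flag (see the first caveat below), and a key-value store
named "record-buffer" registered on the topology; serdes and error handling
are omitted:

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class BufferingProcessor implements Processor<String, BufferedRecord> {

    private static final long FIVE_MINUTES = 5 * 60 * 1000L;
    private static final long ONE_HOUR = 60 * 60 * 1000L;

    private ProcessorContext context;
    private KeyValueStore<String, BufferedRecord> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        this.store = (KeyValueStore<String, BufferedRecord>) context.getStateStore("record-buffer");
        context.schedule(60 * 1000L); // punctuate roughly every minute (stream-time based)
    }

    @Override
    public void process(String key, BufferedRecord record) {
        BufferedRecord previous = store.get(key);
        if (previous != null
                && (record.version < previous.version
                    || record.timestamp > previous.timestamp + FIVE_MINUTES)) {
            return; // ignore: older version, or arrived too long after the buffered one
        }
        store.put(key, record); // insert or replace the buffered record
    }

    @Override
    public void punctuate(long timestamp) {
        KeyValueIterator<String, BufferedRecord> it = store.all();
        while (it.hasNext()) {
            KeyValue<String, BufferedRecord> entry = it.next();
            BufferedRecord r = entry.value;
            if (r.forwarded) {
                if (timestamp - r.timestamp > ONE_HOUR) {
                    store.delete(entry.key); // already forwarded and one hour old: drop it
                }
            } else if (timestamp - r.timestamp > FIVE_MINUTES) {
                r.forwarded = true;
                store.put(entry.key, r);     // remember that it was sent
                context.forward(entry.key, r);
            }
        }
        it.close();
    }

    @Override
    public void close() { }
}

Here BufferedRecord stands for whatever the deserialized value type is, with
version, timestamp (the ingestion timestamp added on the producer side) and
forwarded fields.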

Caveats:
- low-level processors don't have access to the record's ingestion
timestamp. So we would have to add it to the record value before producing
the record.
- no secondary indexes, so we do complete iterations on each `punctuate`
- it feels so wrong

Any suggestions ? It feels like a KStream of KTable records...

Thanks.


Re: Kafka Streams: consume 6 months old data VS windows maintain durations

2017-01-12 Thread Nicolas Fouché
Thanks Eno !

My intention is to reprocess all the data from the beginning. And we'll
reset the application as documented in the Confluent blog.
We don't want to keep the previous results; in fact, we want to overwrite
them. Kafka Connect will happily replace all records in our sink database.

So I'll reset the streams app, then change the window duration times to 6
months until the application processes fresh messages, and then I'll
restart the application with the original window duration time (without a
reset this time). Let's hope Kafka Streams will detect this window duration
change and drop the old windows immediately.
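For reference, the only change in the topology would be the window
retention, something along these lines (the 6-month constant is
illustrative):

// illustrative constant for ~6 months of retention
static final long SIX_MONTHS_MILLIS = 183L * 24 * 60 * 60 * 1000L;

// temporary window definition for the reprocessing run
TimeWindows.of(ONE_HOUR_MILLIS).until(SIX_MONTHS_MILLIS)

// original definition, restored once fresh messages are being processed
TimeWindows.of(ONE_HOUR_MILLIS).until(THREE_DAYS_MILLIS)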


2017-01-12 17:06 GMT+01:00 Eno Thereska :

> Hi Nicolas,
>
> I've seen your previous message thread too. I think your best bet for now
> is to increase the window duration time, to 6 months.
>
> If you change your application logic, e.g., by changing the duration time,
> the semantics of the change wouldn't immediately be clear and it's worth
> clarifying those. For example, would the intention be to reprocess all the
> data from the beginning? Or start where you left off (in which case the
> fact that the original processing went over data that is 6 month old would
> not be relevant, since you'd start from where you left off the second
> time)? Right now we support a limited way to reprocess the data by
> effectively resetting a streams application (https://www.confluent.io/
> blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/
> ). I wouldn't recommend
> using that if you want to keep the results of the previous run though.
>
> Eno
>
> > On 12 Jan 2017, at 09:15, Nicolas Fouché  wrote:
> >
> > Hi.
> >
> >
> > I'd like to re-consume 6 months old data with Kafka Streams.
> >
> > My current topology can't because it defines aggregations with windows
> maintain durations of 3 days.
> > TimeWindows.of(ONE_HOUR_MILLIS).until(THREE_DAYS_MILLIS)
> >
> >
> >
> > As discovered (and shared [1]) a few months ago, consuming a record
> older than 3 days will mess up my aggregates. How do you deal with this ?
> Do you temporarily raise the windows maintain durations until all records
> are consumed ? Do you always run your topologies with long durations, like
> a year ? I have no idea what would be the impact on the RAM and disk, but I
> guess RocksDB would cry a little.
> >
> >
> > Final question: if I raise the duration to 6 months, consume my records,
> and then set the duration back to 3 days, would the old aggregates be
> automatically destroyed?
> >
> >
> > [1] http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%
> 3ccabqkjkj42n7z4bxjdkrdyz_kmpunh738uxvm7gy24dnkx+r...@mail.gmail.com%3e
> >
> >
> > Thanks
> > Nicolas
> >
> >
>
>


UNSUBSCRIBE PLEASE

2017-01-12 Thread williamtellme123


-Original Message-
From: Gerard Klijs [mailto:gerard.kl...@dizzit.com] 
Sent: Wednesday, May 11, 2016 3:00 AM
To: users@kafka.apache.org
Subject: Re: Backing up Kafka data and using it later?

You could create a Docker image with a Kafka installation and start a mirror
maker in it; you could set the retention time to infinite and mount the data
volume. With that data you could always restart the container and mirror it to
somewhere else. Not sure that's what you want, but it's an option to save data
for use some other place/time.
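A rough sketch of the MirrorMaker side (config file names and the whitelist
are illustrative):

bin/kafka-mirror-maker.sh --consumer.config source-consumer.properties \
    --producer.config backup-producer.properties --whitelist '.*'

and on the backup cluster, set retention.ms=-1 on the mirrored topics (no
time-based deletion), or an equivalently long broker-level log retention, so
the copies are never deleted.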

On Wed, May 11, 2016 at 12:33 AM Alex Loddengaard  wrote:

> You may find this interesting, although I don't believe it's exactly 
> what you're looking for:
>
> https://github.com/pinterest/secor
>
> I'm not sure how stable and commonly used it is.
>
> Additionally, I see a lot of users use MirrorMaker for a "backup," 
> where MirrorMaker copies all topics from one Kafka cluster to another "backup"
> cluster. I put "backup" in quotes because this architecture doesn't 
> support snapshotting like a traditional backup would. I realize this 
> doesn't address your specific use case, but thought you may find it 
> interesting regardless.
>
> Sorry I'm a little late to the thread, too.
>
> Alex
>
> On Thu, May 5, 2016 at 7:05 AM, Rad Gruchalski 
> wrote:
>
> > John,
> >
> > I'm not an expert in Kafka, but I would assume so.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Best regards,
> > Radek Gruchalski
> > ra...@gruchalski.com (mailto:ra...@gruchalski.com) (mailto:
> > ra...@gruchalski.com)
> > de.linkedin.com/in/radgruchalski/ (
> > http://de.linkedin.com/in/radgruchalski/)
> >
> > Confidentiality:
> > This communication is intended for the above-named person and may be 
> > confidential and/or legally privileged.
> > If it has come to you in error you must take no action based on it, 
> > nor must you copy or show it to anyone; please delete/destroy and 
> > inform the sender immediately.
> >
> >
> >
> > On Thursday, 5 May 2016 at 01:46, John Bickerstaff wrote:
> >
> > > Thanks - does that mean that the only way to safely back up Kafka 
> > > is to have replication?
> > >
> > > (I have done this partially - I can get the entire topic on the 
> > > command line, after completely recreating the server, but my code 
> > > that is
> > intended
> > > to do the same thing just hangs)
> > >
> > > On Wed, May 4, 2016 at 3:18 PM, Rad Gruchalski 
> > >  > (mailto:ra...@gruchalski.com)> wrote:
> > >
> > > > John,
> > > >
> > > > I believe you mean something along the lines of:
> > > > http://markmail.org/message/f7xb5okr3ujkplk4
> > > > I don’t think something like this has been done.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Best regards,
> > > > Radek Gruchalski
> > > > ra...@gruchalski.com (mailto:ra...@gruchalski.com) (mailto:
> > > > ra...@gruchalski.com (mailto:ra...@gruchalski.com)) 
> > > > de.linkedin.com/in/radgruchalski/ (
> > http://de.linkedin.com/in/radgruchalski/) (
> > > > http://de.linkedin.com/in/radgruchalski/)
> > > >
> > > > Confidentiality:
> > > > This communication is intended for the above-named person and 
> > > > may be confidential and/or legally privileged.
> > > > If it has come to you in error you must take no action based on 
> > > > it,
> nor
> > > > must you copy or show it to anyone; please delete/destroy and 
> > > > inform
> > the
> > > > sender immediately.
> > > >
> > > >
> > > >
> > > > On Wednesday, 4 May 2016 at 23:04, John Bickerstaff wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have what is probably an edge use case. I'd like to back up 
> > > > > a
> > single
> > > > > Kafka instance such that I can recreate a new server, drop 
> > > > > Kafka
> in,
> > drop
> > > > > the data in, start Kafka -- and have all my data ready to go 
> > > > > again
> > for
> > > > > consumers.
> > > > >
> > > > > Is such a thing done? Does anyone have any experience trying this?
> > > > >
> > > > > I have, and I've run into some problems which suggest there's 
> > > > > a
> > setting
> > > > or
> > > > > some other thing I'm unaware of...
> > > > >
> > > > > If you like, don't think of it as a backup problem so much as 
> > > > > a
> > "cloning"
> > > > > problem. I want to clone a new Kafka machine without actually
> > cloning it
> > > > >
> > > >
> > > > -
> > > > > I.E. the data is somewhere else (log and index files) although
> > Zookeeper
> > > >
> > > > is
> > > > > up and running just fine.
> > > > >
> > > > > Thanks
> >
> >
>



Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Running Centos 6.7 3.10.95-1.el6.elrepo.x86_64.  4 SATA disks in RAID10
with ZFS

On Thu, Jan 12, 2017 at 2:27 PM, Tauzell, Dave  wrote:

> You have a local filesystem? Linux?
>
> -Dave
>
> -Original Message-
> From: Stephen Powis [mailto:spo...@salesforce.com]
> Sent: Thursday, January 12, 2017 1:22 PM
> To: users@kafka.apache.org
> Subject: Re: Taking a long time to roll a new log segment (~1 min)
>
> I've further narrowed it down to this particular line:
> https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/
> OffsetIndex.scala#L294
>
> But I'm still at a loss to why this would be slow sometimes and not others.
>
> On Thu, Jan 12, 2017 at 10:56 AM, Stephen Powis 
> wrote:
>
> > Using the little bash script in that JIRA ticket to go thru the GC log
> > and sum up the total pause times I come up with the following.  I
> > don't see anything that would indicate a ~28 second pause.
> >
> > 2017-01-12T07:00 = 72.2961
> >> 2017-01-12T07:01 = 59.8112
> >> 2017-01-12T07:02 = 69.6915
> >> 2017-01-12T07:03 = 74.4093
> >> 2017-01-12T07:04 = 73.507
> >> 2017-01-12T07:05 = 73.7028
> >> 2017-01-12T07:06 = 72.3392
> >> 2017-01-12T07:07 = 70.502
> >> 2017-01-12T07:08 = 71.7471
> >> 2017-01-12T07:09 = 76.8371
> >> 2017-01-12T07:10 = 72.3836
> >> 2017-01-12T07:11 = 74.7462
> >> 2017-01-12T07:12 = 57.1328
> >> 2017-01-12T07:13 = 80.005
> >> 2017-01-12T07:14 = 76.0012
> >> 2017-01-12T07:15 = 79.2221
> >> 2017-01-12T07:16 = 57.5385
> >> 2017-01-12T07:17 = 58.1185
> >> 2017-01-12T07:18 = 72.0524
> >> 2017-01-12T07:19 = 110.6736
> >> 2017-01-12T07:20 = 64.9808
> >> 2017-01-12T07:21 = 66.8767
> >> 2017-01-12T07:22 = 32.409
> >> 2017-01-12T07:23 = 85.3671
> >> 2017-01-12T07:24 = 123.9212
> >> 2017-01-12T07:25 = 70.3219
> >>
> >
> >
> >
> >
> >
> > On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis
> > 
> > wrote:
> >
> >> Just realized that GCEasy doesn't keep reports around for very long
> >> anymore, here is a screencap of the report:  http://imgur.com/a/MEubD
> >>
> >> The longest reported GC pause was 30ms, tho they happen somewhat
> >> frequently at an average of once per 12 seconds.  KAFKA-4616
> >> certainly sounds just like my scenario, but I'm having trouble
> >> finding the long GC pauses described in that ticket in my own logs.
> >>
> >> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao 
> wrote:
> >>
> >>> You may run into this bug https://issues.apache.org/jira
> >>> /browse/KAFKA-4614
> >>>
> >>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis 
> >>> wrote:
> >>>
> >>> > Per my email to the list in Sept, when I reviewed GC logs then, I
> >>> didn't
> >>> > see anything out of the ordinary. (
> >>> >
> >>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
> >>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
> >>> 40mail.gmail.com%3E
> >>> > )
> >>> >
> >>> > Reviewing the GC logs from this morning around the timestamp in my
> >>> previous
> >>> > email, I see GC running roughly every 10-12 seconds, with total
> >>> > times similar to the following:
> >>> >
> >>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
> >>> application
> >>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
> >>> 0.0002171
> >>> > > seconds
> >>> > >
> >>> > >
> >>> > Here is a link to a GCEasy report:
> >>> >
> >>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
> >>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
> >>> >
> >>> >
> >>> > Currently using G1 gc with the following settings:
> >>> >
> >>> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
> >>> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps
> >>> -XX:+PrintGCDetails
> >>> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> >>> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
> >>> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
> >>> > -XX:+UseCompressedOops -XX:+AlwaysPreTouch -XX:+UseG1GC
> >>> > -XX:MaxGCPauseMillis=20 -XX:+HeapDumpOnOutOfMemoryError
> >>> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
> >>> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log
> >>> > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >>> > -XX:+PrintGCTimeStamps
> >>> >
> >>> >
> >>> >
> >>> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
> >>> > dave.tauz...@surescripts.com
> >>> > > wrote:
> >>> >
> >>> > > Can you collect garbage collection stats and verify there isn't
> >>> > > a
> >>> long GC
> >>> > > happening at the same time?
> >>> > >
> >>> > > -Dave
> >>> > >
> >>> > > -Original Message-
> >>> > > From: Stephen Powis [mailto:spo...@salesforce.com]
> >>> > > Sent: Thursday, January 12, 2017 8:34 AM
> >>> > > To: users@kafka.apache.org
> >>> > > Subject: Re: Taking a long time to roll a new log segment (~1
> >>> > > min)
> >>> > >
> >>> > > So per the kafka docs I up'd our FD limit to 100k, 

Error finding consumer coordinators after restart

2017-01-12 Thread Dave Hamilton
Hello, we ran into a memory issue on a Kafka 0.10.0.1 broker we are running 
that required a system restart. Since bringing Kafka back up it seems the 
consumers are having issues finding their coordinators. Here are some errors 
we’ve seen in our server logs after restarting:

[2017-01-12 19:02:10,178] ERROR [Group Metadata Manager on Broker 0]: Error in 
loading offsets from [__consumer_offsets,40] 
(kafka.coordinator.GroupMetadataManager)
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:678)
at kafka.log.FileMessageSet.searchFor(FileMessageSet.scala:135)
at kafka.log.LogSegment.translateOffset(LogSegment.scala:106)
at kafka.log.LogSegment.read(LogSegment.scala:127)
at kafka.log.Log.read(Log.scala:532)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply$mcV$sp(GroupMetadataManager.scala:380)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply(GroupMetadataManager.scala:374)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1$1.apply(GroupMetadataManager.scala:374)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at kafka.utils.CoreUtils$.inWriteLock(CoreUtils.scala:239)
at 
kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$loadGroupsAndOffsets$1(GroupMetadataManager.scala:374)
at 
kafka.coordinator.GroupMetadataManager$$anonfun$loadGroupsForPartition$1.apply$mcV$sp(GroupMetadataManager.scala:353)
at 
kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:110)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:56)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[2017-01-12 19:03:56,468] ERROR [KafkaApi-0] Error when handling request 
{topics=[__consumer_offsets]} (kafka.server.KafkaApis)
kafka.admin.AdminOperationException: replication factor: 1 larger than 
available brokers: 0
at 
kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:117)
at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:403)
at 
kafka.server.KafkaApis.kafka$server$KafkaApis$$createTopic(KafkaApis.scala:629)
at 
kafka.server.KafkaApis.kafka$server$KafkaApis$$createGroupMetadataTopic(KafkaApis.scala:651)
at kafka.server.KafkaApis$$anonfun$29.apply(KafkaApis.scala:668)
at kafka.server.KafkaApis$$anonfun$29.apply(KafkaApis.scala:666)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at 
scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
at scala.collection.SetLike$class.map(SetLike.scala:92)
at scala.collection.AbstractSet.map(Set.scala:47)
at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:666)
at 
kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:727)
at kafka.server.KafkaApis.handle(KafkaApis.scala:79)
at 
kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:744)

Also running the kafka-consumer-groups.sh on a consumer group returns the 
following:

Error while executing consumer group command This is not the correct 
coordinator for this group.
org.apache.kafka.common.errors.NotCoordinatorForGroupException: This is not the 
correct coordinator for this group.

We also see the following logs when trying to restart a Kafka connector:

[2017-01-12 17:44:07,941] INFO Discovered coordinator 
lxskfkdal501.nanigans.com:9092 (id: 2147483647 rack: null) for 

RE: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Tauzell, Dave
You have a local filesystem? Linux?

-Dave

-Original Message-
From: Stephen Powis [mailto:spo...@salesforce.com]
Sent: Thursday, January 12, 2017 1:22 PM
To: users@kafka.apache.org
Subject: Re: Taking a long time to roll a new log segment (~1 min)

I've further narrowed it down to this particular line:
https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/OffsetIndex.scala#L294

But I'm still at a loss to why this would be slow sometimes and not others.

On Thu, Jan 12, 2017 at 10:56 AM, Stephen Powis 
wrote:

> Using the little bash script in that JIRA ticket to go thru the GC log
> and sum up the total pause times I come up with the following.  I
> don't see anything that would indicate a ~28 second pause.
>
> 2017-01-12T07:00 = 72.2961
>> 2017-01-12T07:01 = 59.8112
>> 2017-01-12T07:02 = 69.6915
>> 2017-01-12T07:03 = 74.4093
>> 2017-01-12T07:04 = 73.507
>> 2017-01-12T07:05 = 73.7028
>> 2017-01-12T07:06 = 72.3392
>> 2017-01-12T07:07 = 70.502
>> 2017-01-12T07:08 = 71.7471
>> 2017-01-12T07:09 = 76.8371
>> 2017-01-12T07:10 = 72.3836
>> 2017-01-12T07:11 = 74.7462
>> 2017-01-12T07:12 = 57.1328
>> 2017-01-12T07:13 = 80.005
>> 2017-01-12T07:14 = 76.0012
>> 2017-01-12T07:15 = 79.2221
>> 2017-01-12T07:16 = 57.5385
>> 2017-01-12T07:17 = 58.1185
>> 2017-01-12T07:18 = 72.0524
>> 2017-01-12T07:19 = 110.6736
>> 2017-01-12T07:20 = 64.9808
>> 2017-01-12T07:21 = 66.8767
>> 2017-01-12T07:22 = 32.409
>> 2017-01-12T07:23 = 85.3671
>> 2017-01-12T07:24 = 123.9212
>> 2017-01-12T07:25 = 70.3219
>>
>
>
>
>
>
> On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis
> 
> wrote:
>
>> Just realized that GCEasy doesn't keep reports around for very long
>> anymore, here is a screencap of the report:  http://imgur.com/a/MEubD
>>
>> The longest reported GC pause was 30ms, tho they happen somewhat
>> frequently at an average of once per 12 seconds.  KAFKA-4616
>> certainly sounds just like my scenario, but I'm having trouble
>> finding the long GC pauses described in that ticket in my own logs.
>>
>> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao  wrote:
>>
>>> You may run into this bug https://issues.apache.org/jira
>>> /browse/KAFKA-4614
>>>
>>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis 
>>> wrote:
>>>
>>> > Per my email to the list in Sept, when I reviewed GC logs then, I
>>> didn't
>>> > see anything out of the ordinary. (
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
>>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
>>> 40mail.gmail.com%3E
>>> > )
>>> >
>>> > Reviewing the GC logs from this morning around the timestamp in my
>>> previous
>>> > email, I see GC running roughly every 10-12 seconds, with total
>>> > times similar to the following:
>>> >
>>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
>>> application
>>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
>>> 0.0002171
>>> > > seconds
>>> > >
>>> > >
>>> > Here is a link to a GCEasy report:
>>> >
>>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
>>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
>>> >
>>> >
>>> > Currently using G1 gc with the following settings:
>>> >
>>> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
>>> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps
>>> -XX:+PrintGCDetails
>>> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
>>> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
>>> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>> > -XX:+UseCompressedOops -XX:+AlwaysPreTouch -XX:+UseG1GC
>>> > -XX:MaxGCPauseMillis=20 -XX:+HeapDumpOnOutOfMemoryError
>>> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
>>> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log
>>> > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>>> > -XX:+PrintGCTimeStamps
>>> >
>>> >
>>> >
>>> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
>>> > dave.tauz...@surescripts.com
>>> > > wrote:
>>> >
>>> > > Can you collect garbage collection stats and verify there isn't
>>> > > a
>>> long GC
>>> > > happening at the same time?
>>> > >
>>> > > -Dave
>>> > >
>>> > > -Original Message-
>>> > > From: Stephen Powis [mailto:spo...@salesforce.com]
>>> > > Sent: Thursday, January 12, 2017 8:34 AM
>>> > > To: users@kafka.apache.org
>>> > > Subject: Re: Taking a long time to roll a new log segment (~1
>>> > > min)
>>> > >
>>> > > So per the kafka docs I up'd our FD limit to 100k, and we are no
>>> longer
>>> > > seeing the process die, which is good.
>>> > >
>>> > > Unfortunately we're still seeing very high log segment roll
>>> > > times,
>>> and
>>> > I'm
>>> > > unsure if this is considered 'normal', as it tends to block
>>> > > producers during this period.
>>> > >
>>> > > We are running kafka 0.10.0.1, but I patched in some
>>> > > additionally
>>> timing
>>> > > statements into the kafka.log.log roll() method 

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
I've further narrowed it down to this particular line:
https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/OffsetIndex.scala#L294

But I'm still at a loss to why this would be slow sometimes and not others.

On Thu, Jan 12, 2017 at 10:56 AM, Stephen Powis 
wrote:

> Using the little bash script in that JIRA ticket to go thru the GC log and
> sum up the total pause times I come up with the following.  I don't see
> anything that would indicate a ~28 second pause.
>
> 2017-01-12T07:00 = 72.2961
>> 2017-01-12T07:01 = 59.8112
>> 2017-01-12T07:02 = 69.6915
>> 2017-01-12T07:03 = 74.4093
>> 2017-01-12T07:04 = 73.507
>> 2017-01-12T07:05 = 73.7028
>> 2017-01-12T07:06 = 72.3392
>> 2017-01-12T07:07 = 70.502
>> 2017-01-12T07:08 = 71.7471
>> 2017-01-12T07:09 = 76.8371
>> 2017-01-12T07:10 = 72.3836
>> 2017-01-12T07:11 = 74.7462
>> 2017-01-12T07:12 = 57.1328
>> 2017-01-12T07:13 = 80.005
>> 2017-01-12T07:14 = 76.0012
>> 2017-01-12T07:15 = 79.2221
>> 2017-01-12T07:16 = 57.5385
>> 2017-01-12T07:17 = 58.1185
>> 2017-01-12T07:18 = 72.0524
>> 2017-01-12T07:19 = 110.6736
>> 2017-01-12T07:20 = 64.9808
>> 2017-01-12T07:21 = 66.8767
>> 2017-01-12T07:22 = 32.409
>> 2017-01-12T07:23 = 85.3671
>> 2017-01-12T07:24 = 123.9212
>> 2017-01-12T07:25 = 70.3219
>>
>
>
>
>
>
> On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis 
> wrote:
>
>> Just realized that GCEasy doesn't keep reports around for very long
>> anymore, here is a screencap of the report:  http://imgur.com/a/MEubD
>>
>> The longest reported GC pause was 30ms, tho they happen somewhat
>> frequently at an average of once per 12 seconds.  KAFKA-4616 certainly
>> sounds just like my scenario, but I'm having trouble finding the long GC
>> pauses described in that ticket in my own logs.
>>
>> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao  wrote:
>>
>>> You may run into this bug https://issues.apache.org/jira
>>> /browse/KAFKA-4614
>>>
>>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis 
>>> wrote:
>>>
>>> > Per my email to the list in Sept, when I reviewed GC logs then, I
>>> didn't
>>> > see anything out of the ordinary. (
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
>>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
>>> 40mail.gmail.com%3E
>>> > )
>>> >
>>> > Reviewing the GC logs from this morning around the timestamp in my
>>> previous
>>> > email, I see GC running roughly every 10-12 seconds, with total times
>>> > similar to the following:
>>> >
>>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
>>> application
>>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
>>> 0.0002171
>>> > > seconds
>>> > >
>>> > >
>>> > Here is a link to a GCEasy report:
>>> >
>>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
>>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
>>> >
>>> >
>>> > Currently using G1 gc with the following settings:
>>> >
>>> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
>>> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps
>>> -XX:+PrintGCDetails
>>> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
>>> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
>>> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
>>> > -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
>>> > -XX:+HeapDumpOnOutOfMemoryError
>>> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
>>> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
>>> > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>>> >
>>> >
>>> >
>>> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
>>> > dave.tauz...@surescripts.com
>>> > > wrote:
>>> >
>>> > > Can you collect garbage collection stats and verify there isn't a
>>> long GC
>>> > > happening at the same time?
>>> > >
>>> > > -Dave
>>> > >
>>> > > -Original Message-
>>> > > From: Stephen Powis [mailto:spo...@salesforce.com]
>>> > > Sent: Thursday, January 12, 2017 8:34 AM
>>> > > To: users@kafka.apache.org
>>> > > Subject: Re: Taking a long time to roll a new log segment (~1 min)
>>> > >
>>> > > So per the kafka docs I up'd our FD limit to 100k, and we are no
>>> longer
>>> > > seeing the process die, which is good.
>>> > >
>>> > > Unfortunately we're still seeing very high log segment roll times,
>>> and
>>> > I'm
>>> > > unsure if this is considered 'normal', as it tends to block producers
>>> > > during this period.
>>> > >
>>> > > We are running kafka 0.10.0.1, but I patched in some additionally
>>> timing
>>> > > statements into the kafka.log.log roll() method to narrow down
>>> exactly
>>> > > which part of that method is taking so long.
>>> > >
>>> > > Again, typically the process to roll a new log file takes only 1-2ms
>>> > tops,
>>> > > but several times a day it takes 30-60+ seconds, across all of our
>>> > > brokers.  I've narrowed it down to 

RE: can i delete topic like this ?

2017-01-12 Thread Tauzell, Dave
You can set the retention for the topic to a small time and then wait for Kafka 
to delete the messages before setting it back:

bin/kafka-topics.sh --zookeeper zk.prod.yoursite.com --alter --topic TOPIC_NAME 
--config retention.ms=1000
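
Once the messages have been deleted, restore the previous retention, e.g. back
to the 7-day broker default (adjust to whatever the topic used before):

bin/kafka-topics.sh --zookeeper zk.prod.yoursite.com --alter --topic TOPIC_NAME
--config retention.ms=604800000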

-Original Message-
From: Laxmi Narayan NIT DGP [mailto:nit.dgp...@gmail.com]
Sent: Thursday, January 12, 2017 12:52 PM
To: users@kafka.apache.org
Subject: Re: can i delete topic like this ?

Hi,
If my topic is not enabled for deletion, is there any other way to purge
messages from the topic?



*Regards,*
*Laxmi Narayan Patel*
*MCA NIT Durgapur (2011-2014)*
*Mob:-9741292048,8345847473*

On Fri, Jan 13, 2017 at 12:16 AM, Kaufman Ng  wrote:

> Your zookeeper url doesn't look right.  Port 9092 is kafka broker's
> default listening port.  Zookeeper's default is 2181.
>
>
> On Thu, Jan 12, 2017 at 1:33 PM, Laxmi Narayan NIT DGP <
> nit.dgp...@gmail.com
> > wrote:
>
> > /bin/kafka-topics.sh --zookeeper localhost:9092 --delete --topic
> topicName
> >
> >
> > I am getting exception saying :
> >
> > [2017-01-13 00:01:45,101] WARN Client session timed out, have not
> > heard from server in 15016ms for sessionid 0x0 (org.apache.zookeeper.
> ClientCnxn)
> > [2017-01-13 00:02:00,902] WARN Client session timed out, have not
> > heard from server in 15001ms for sessionid 0x0 (org.apache.zookeeper.
> ClientCnxn)
> > Exception in thread "main"
> > org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect
> > to zookeeper server within timeout: 3
> > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1232)
> > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:156)
> > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:130)
> > at kafka.utils.ZkUtils$.createZkClientAndConnection(
> ZkUtils.scala:76)
> > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:58)
> > at kafka.admin.TopicCommand$.main(TopicCommand.scala:53)
> > at kafka.admin.TopicCommand.main(TopicCommand.scala)
> >
> >
> >
> >
> >
> > **
> >
> > *Regards,*
> > *Laxmi Narayan Patel*
> > *MCA NIT Durgapur (2011-2014)*
> > *Mob:-9741292048,8345847473*
> >
>
>
>
> --
> Kaufman Ng
> +1 646 961 8063
> Solutions Architect | Confluent | www.confluent.io
>
This e-mail and any files transmitted with it are confidential, may contain 
sensitive information, and are intended solely for the use of the individual or 
entity to whom they are addressed. If you have received this e-mail in error, 
please notify the sender by reply e-mail immediately and destroy all copies of 
the e-mail and any attachments.


UNSUBSCRIBE PLEASE

2017-01-12 Thread williamtellme123
users@kafka.apache.org;users-unsubscr...@kafka.apache.org;
users_unsubscr...@kafka.apache.org;
d...@kafka.apache.org; dev-unsubscr...@kafka.apache.org;
dev_unsubscr...@kafka.apache.org
-Original Message-
From: Raj Tanneru [mailto:rtann...@fanatics.com] 
Sent: Saturday, May 7, 2016 6:46 PM
To: users@kafka.apache.org
Cc: d...@kafka.apache.org
Subject: Re: KAFKA-3112

Thanks Ismael and Tao. Appreciate it.

Sent from my iPhone

> On May 7, 2016, at 1:14 AM, Ismael Juma  wrote:
>
> Hi Raj and Tao,
>
> I just merged the KAFKA-3112 PR, so this issue will be fixed in 0.10.0.0.
>
> Thanks,
> Ismael
>
>> On Fri, May 6, 2016 at 7:47 PM, tao xiao  wrote:
>>
>> KAFKA-2657 is unresolved so you can safely assume it hasn't been fixed
yet.
>>
>>> On Fri, 6 May 2016 at 07:38 Raj Tanneru  wrote:
>>>
>>> Yeah it is a duplicate of KAFKA-2657. The question is how to check / 
>>> know if it is merged to 0.9.0.1 release. What are the options that I 
>>> have if I need this fix. How can I get patch for this on 0.8.2.1?
>>>
>>> Sent from my iPhone
>>>
 On May 6, 2016, at 12:06 AM, tao xiao  wrote:

 It said this is a duplication. This is the
 https://issues.apache.org/jira/browse/KAFKA-2657 that KAFKA-3112
>>> duplicates
 to.

> On Thu, 5 May 2016 at 22:13 Raj Tanneru 
>> wrote:
>
>
> Hi All,
> Does anyone know if KAFKA-3112 is merged to 0.9.0.1? Is there a 
> place
>> to
> check which version has this fix? Jira doesn't show fix versions.
>
> https://issues.apache.org/jira/browse/KAFKA-3112
>
>
> Thanks,
> Raj Tanneru
> Information contained in this e-mail message is confidential. This
>>> e-mail
> message is intended only for the personal use of the recipient(s)
>> named
> above. If you are not an intended recipient, do not read, 
> distribute
>> or
> reproduce this transmission (including any attachments). If you 
> have received this email in error, please immediately notify the 
> sender by
>>> email
> reply and delete the original message.
>>> Information contained in this e-mail message is confidential. This 
>>> e-mail message is intended only for the personal use of the 
>>> recipient(s) named above. If you are not an intended recipient, do 
>>> not read, distribute or reproduce this transmission (including any 
>>> attachments). If you have received this email in error, please 
>>> immediately notify the sender by
>> email
>>> reply and delete the original message.
>>
Information contained in this e-mail message is confidential. This e-mail
message is intended only for the personal use of the recipient(s) named
above. If you are not an intended recipient, do not read, distribute or
reproduce this transmission (including any attachments). If you have
received this email in error, please immediately notify the sender by email
reply and delete the original message.



Re: can i delete topic like this ?

2017-01-12 Thread Laxmi Narayan NIT DGP
Hi,
If my topic is not enabled for deletion, is there any other way to purge
messages from the topic?



*Regards,*
*Laxmi Narayan Patel*
*MCA NIT Durgapur (2011-2014)*
*Mob:-9741292048,8345847473*

On Fri, Jan 13, 2017 at 12:16 AM, Kaufman Ng  wrote:

> Your zookeeper url doesn't look right.  Port 9092 is kafka broker's default
> listening port.  Zookeeper's default is 2181.
>
>
> On Thu, Jan 12, 2017 at 1:33 PM, Laxmi Narayan NIT DGP <
> nit.dgp...@gmail.com
> > wrote:
>
> > /bin/kafka-topics.sh --zookeeper localhost:9092 --delete --topic
> topicName
> >
> >
> > I am getting exception saying :
> >
> > [2017-01-13 00:01:45,101] WARN Client session timed out, have not heard
> > from server in 15016ms for sessionid 0x0 (org.apache.zookeeper.
> ClientCnxn)
> > [2017-01-13 00:02:00,902] WARN Client session timed out, have not heard
> > from server in 15001ms for sessionid 0x0 (org.apache.zookeeper.
> ClientCnxn)
> > Exception in thread "main"
> > org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to
> > zookeeper server within timeout: 3
> > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1232)
> > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:156)
> > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:130)
> > at kafka.utils.ZkUtils$.createZkClientAndConnection(
> ZkUtils.scala:76)
> > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:58)
> > at kafka.admin.TopicCommand$.main(TopicCommand.scala:53)
> > at kafka.admin.TopicCommand.main(TopicCommand.scala)
> >
> >
> >
> >
> >
> > **
> >
> > *Regards,*
> > *Laxmi Narayan Patel*
> > *MCA NIT Durgapur (2011-2014)*
> > *Mob:-9741292048,8345847473*
> >
>
>
>
> --
> Kaufman Ng
> +1 646 961 8063
> Solutions Architect | Confluent | www.confluent.io
>


Re: can i delete topic like this ?

2017-01-12 Thread Kaufman Ng
Your zookeeper url doesn't look right.  Port 9092 is kafka broker's default
listening port.  Zookeeper's default is 2181.
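
With the default ports the delete command would look like this (and note that
delete.topic.enable=true must be set on the brokers, otherwise the topic is
only marked for deletion):

/bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic topicName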


On Thu, Jan 12, 2017 at 1:33 PM, Laxmi Narayan NIT DGP  wrote:

> /bin/kafka-topics.sh --zookeeper localhost:9092 --delete --topic topicName
>
>
> I am getting exception saying :
>
> [2017-01-13 00:01:45,101] WARN Client session timed out, have not heard
> from server in 15016ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
> [2017-01-13 00:02:00,902] WARN Client session timed out, have not heard
> from server in 15001ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
> Exception in thread "main"
> org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to
> zookeeper server within timeout: 3
> at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1232)
> at org.I0Itec.zkclient.ZkClient.(ZkClient.java:156)
> at org.I0Itec.zkclient.ZkClient.(ZkClient.java:130)
> at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:76)
> at kafka.utils.ZkUtils$.apply(ZkUtils.scala:58)
> at kafka.admin.TopicCommand$.main(TopicCommand.scala:53)
> at kafka.admin.TopicCommand.main(TopicCommand.scala)
>
>
>
>
>
> **
>
> *Regards,*
> *Laxmi Narayan Patel*
> *MCA NIT Durgapur (2011-2014)*
> *Mob:-9741292048,8345847473*
>



-- 
Kaufman Ng
+1 646 961 8063
Solutions Architect | Confluent | www.confluent.io


can i delete topic like this ?

2017-01-12 Thread Laxmi Narayan NIT DGP
/bin/kafka-topics.sh --zookeeper localhost:9092 --delete --topic topicName


I am getting exception saying :

[2017-01-13 00:01:45,101] WARN Client session timed out, have not heard
from server in 15016ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
[2017-01-13 00:02:00,902] WARN Client session timed out, have not heard
from server in 15001ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
Exception in thread "main"
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to
zookeeper server within timeout: 3
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1232)
at org.I0Itec.zkclient.ZkClient.(ZkClient.java:156)
at org.I0Itec.zkclient.ZkClient.(ZkClient.java:130)
at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:76)
at kafka.utils.ZkUtils$.apply(ZkUtils.scala:58)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:53)
at kafka.admin.TopicCommand.main(TopicCommand.scala)





**

*Regards,*
*Laxmi Narayan Patel*
*MCA NIT Durgapur (2011-2014)*
*Mob:-9741292048,8345847473*


Re: [ANNOUNCE] New committer: Grant Henke

2017-01-12 Thread Joel Koshy
Hey Grant - congrats!

On Thu, Jan 12, 2017 at 10:00 AM, Neha Narkhede  wrote:

> Congratulations, Grant. Well deserved!
>
> On Thu, Jan 12, 2017 at 7:51 AM Grant Henke  wrote:
>
> > Thanks everyone!
> >
> > On Thu, Jan 12, 2017 at 2:58 AM, Damian Guy 
> wrote:
> >
> > > Congratulations!
> > >
> > > On Thu, 12 Jan 2017 at 03:35 Jun Rao  wrote:
> > >
> > > > Grant,
> > > >
> > > > Thanks for all your contribution! Congratulations!
> > > >
> > > > Jun
> > > >
> > > > On Wed, Jan 11, 2017 at 2:51 PM, Gwen Shapira 
> > wrote:
> > > >
> > > > > The PMC for Apache Kafka has invited Grant Henke to join as a
> > > > > committer and we are pleased to announce that he has accepted!
> > > > >
> > > > > Grant contributed 88 patches, 90 code reviews, countless great
> > > > > comments on discussions, a much-needed cleanup to our protocol and
> > the
> > > > > on-going and critical work on the Admin protocol. Throughout this,
> he
> > > > > displayed great technical judgment, high-quality work and
> willingness
> > > > > to contribute where needed to make Apache Kafka awesome.
> > > > >
> > > > > Thank you for your contributions, Grant :)
> > > > >
> > > > > --
> > > > > Gwen Shapira
> > > > > Product Manager | Confluent
> > > > > 650.450.2760 | @gwenshap
> > > > > Follow us: Twitter | blog
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >
> --
> Thanks,
> Neha
>


UNSUBSCRIBE PLEASE

2017-01-12 Thread williamtellme123

-Original Message-
From: Kuldeep Kamboj [mailto:kuldeep.kam...@osscube.com] 
Sent: Monday, May 2, 2016 11:29 PM
To: users@kafka.apache.org
Subject: Getting Timed out reading socket error for kafka cluster setup

Hi,

I want to set up a Kafka cluster for three similar applications having the
same queues, like AppA -> {TopicX, TopicY, TopicZ}, AppB -> {TopicX, TopicZ},
AppC -> {TopicX, TopicY}. Producers and consumers will be app-specific.
I set up the Kafka cluster with three brokers having partitions 1, 2, 3, in
three different config files with different ports, then started the Kafka server (cluster).

I am using kafka php wrapper by http://github.com/nmred/kafka-php

So I used Producer code for App A like

   $producer->setRequireAck(-1);
   $producer->setMessages("TopicX", 0, array(json_encode($this->data)));
   $producer->send();

AND used Producer code for App B like

   $producer->setRequireAck(-1);
   $producer->setMessages("TopicX", 1, array(json_encode($this->data)));
   $producer->send();

And So On.

Then I made my Consumer scripts for three apps like

$queues = array("TopicX", "TopicY", "TopicZ");
while (true) {
    foreach ($queues as $queue) {
        $consumer = \Kafka\Consumer::getInstance('localhost:2181');
        $consumer->setGroup('testgroup');
        $consumer->setPartition($queue, 0);
        $result = $consumer->fetch();
    }
}

But when I try to execute consumer script for any App I get error like

**"Timed out reading socket while reading 750437 bytes with 750323 bytes to
go"**

I just don't know how I can fix this issue. I tried to modify some Kafka
config parameters like

 zookeeper.connection.timeout.ms=24000   # initially 6000
 replica.socket.timeout.ms=15000         # not in the default file

but that did not work.


--
Kuldeep Kamboj



Re: [ANNOUNCE] New committer: Grant Henke

2017-01-12 Thread Neha Narkhede
Congratulations, Grant. Well deserved!

On Thu, Jan 12, 2017 at 7:51 AM Grant Henke  wrote:

> Thanks everyone!
>
> On Thu, Jan 12, 2017 at 2:58 AM, Damian Guy  wrote:
>
> > Congratulations!
> >
> > On Thu, 12 Jan 2017 at 03:35 Jun Rao  wrote:
> >
> > > Grant,
> > >
> > > Thanks for all your contribution! Congratulations!
> > >
> > > Jun
> > >
> > > On Wed, Jan 11, 2017 at 2:51 PM, Gwen Shapira 
> wrote:
> > >
> > > > The PMC for Apache Kafka has invited Grant Henke to join as a
> > > > committer and we are pleased to announce that he has accepted!
> > > >
> > > > Grant contributed 88 patches, 90 code reviews, countless great
> > > > comments on discussions, a much-needed cleanup to our protocol and
> the
> > > > on-going and critical work on the Admin protocol. Throughout this, he
> > > > displayed great technical judgment, high-quality work and willingness
> > > > to contribute where needed to make Apache Kafka awesome.
> > > >
> > > > Thank you for your contributions, Grant :)
> > > >
> > > > --
> > > > Gwen Shapira
> > > > Product Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter | blog
> > > >
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>
-- 
Thanks,
Neha


Re: Kafka Streams: consume 6 months old data VS windows maintain durations

2017-01-12 Thread Eno Thereska
Hi Nicolas,

I've seen your previous message thread too. I think your best bet for now is to 
increase the window duration time, to 6 months.

If you change your application logic, e.g., by changing the duration time, the 
semantics of the change wouldn't immediately be clear and it's worth clarifying
those. For example, would the intention be to reprocess all the data from the 
beginning? Or start where you left off (in which case the fact that the 
original processing went over data that is 6 month old would not be relevant, 
since you'd start from where you left off the second time)? Right now we 
support a limited way to reprocess the data by effectively resetting a streams 
application
(https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/).
I wouldn't recommend using that if you want to keep the results of the
previous run though.
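
For reference, the reset tool described in that post is invoked roughly like
this (application id and topic names are illustrative):

bin/kafka-streams-application-reset.sh --application-id my-streams-app \
    --input-topics my-input-topic --intermediate-topics my-rekeyed-topic

followed by cleaning up the application's local state directory before
restarting.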

Eno

> On 12 Jan 2017, at 09:15, Nicolas Fouché  wrote:
> 
> Hi.
> 
> 
> I'd like to re-consume 6 months old data with Kafka Streams.
> 
> My current topology can't because it defines aggregations with windows 
> maintain durations of 3 days.
> TimeWindows.of(ONE_HOUR_MILLIS).until(THREE_DAYS_MILLIS)
> 
> 
> 
> As discovered (and shared [1]) a few months ago, consuming a record older 
> than 3 days will mess up my aggregates. How do you deal with this ? Do you 
> temporarily raise the windows maintain durations until all records are 
> consumed ? Do you always run your topologies with long durations, like a year 
> ? I have no idea what would be the impact on the RAM and disk, but I guess 
> RocksDB would cry a little.
> 
> 
> Final question: if I raise the duration to 6 months, consume my records, and
> then set the duration back to 3 days, would the old aggregates be
> automatically destroyed?
> 
> 
> [1] 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%3ccabqkjkj42n7z4bxjdkrdyz_kmpunh738uxvm7gy24dnkx+r...@mail.gmail.com%3e
> 
> 
> Thanks
> Nicolas
> 
> 



Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Using the little bash script in that JIRA ticket to go thru the GC log and
sum up the total pause times I come up with the following.  I don't see
anything that would indicate a ~28 second pause.

2017-01-12T07:00 = 72.2961
> 2017-01-12T07:01 = 59.8112
> 2017-01-12T07:02 = 69.6915
> 2017-01-12T07:03 = 74.4093
> 2017-01-12T07:04 = 73.507
> 2017-01-12T07:05 = 73.7028
> 2017-01-12T07:06 = 72.3392
> 2017-01-12T07:07 = 70.502
> 2017-01-12T07:08 = 71.7471
> 2017-01-12T07:09 = 76.8371
> 2017-01-12T07:10 = 72.3836
> 2017-01-12T07:11 = 74.7462
> 2017-01-12T07:12 = 57.1328
> 2017-01-12T07:13 = 80.005
> 2017-01-12T07:14 = 76.0012
> 2017-01-12T07:15 = 79.2221
> 2017-01-12T07:16 = 57.5385
> 2017-01-12T07:17 = 58.1185
> 2017-01-12T07:18 = 72.0524
> 2017-01-12T07:19 = 110.6736
> 2017-01-12T07:20 = 64.9808
> 2017-01-12T07:21 = 66.8767
> 2017-01-12T07:22 = 32.409
> 2017-01-12T07:23 = 85.3671
> 2017-01-12T07:24 = 123.9212
> 2017-01-12T07:25 = 70.3219
>





On Thu, Jan 12, 2017 at 10:49 AM, Stephen Powis 
wrote:

> Just realized that GCEasy doesn't keep reports around for very long
> anymore, here is a screencap of the report:  http://imgur.com/a/MEubD
>
> The longest reported GC pause was 30ms, tho they happen somewhat
> frequently at an average of once per 12 seconds.  KAFKA-4616 certainly
> sounds just like my scenario, but I'm having trouble finding the long GC
> pauses described in that ticket in my own logs.
>
> On Thu, Jan 12, 2017 at 10:42 AM, tao xiao  wrote:
>
>> You may run into this bug https://issues.apache.org/jira
>> /browse/KAFKA-4614
>>
>> On Thu, 12 Jan 2017 at 23:38 Stephen Powis  wrote:
>>
>> > Per my email to the list in Sept, when I reviewed GC logs then, I didn't
>> > see anything out of the ordinary. (
>> >
>> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.
>> mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%
>> 40mail.gmail.com%3E
>> > )
>> >
>> > Reviewing the GC logs from this morning around the timestamp in my
>> previous
>> > email, I see GC running roughly every 10-12 seconds, with total times
>> > similar to the following:
>> >
>> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which
>> application
>> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
>> 0.0002171
>> > > seconds
>> > >
>> > >
>> > Here is a link to a GCEasy report:
>> >
>> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIv
>> LS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
>> >
>> >
>> > Currently using G1 gc with the following settings:
>> >
>> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
>> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails
>> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
>> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
>> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
>> > -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
>> > -XX:+HeapDumpOnOutOfMemoryError
>> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
>> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
>> > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>> >
>> >
>> >
>> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
>> > dave.tauz...@surescripts.com
>> > > wrote:
>> >
>> > > Can you collect garbage collection stats and verify there isn't a
>> long GC
>> > > happening at the same time?
>> > >
>> > > -Dave
>> > >
>> > > -Original Message-
>> > > From: Stephen Powis [mailto:spo...@salesforce.com]
>> > > Sent: Thursday, January 12, 2017 8:34 AM
>> > > To: users@kafka.apache.org
>> > > Subject: Re: Taking a long time to roll a new log segment (~1 min)
>> > >
>> > > So per the kafka docs I up'd our FD limit to 100k, and we are no
>> longer
>> > > seeing the process die, which is good.
>> > >
>> > > Unfortunately we're still seeing very high log segment roll times, and
>> > I'm
>> > > unsure if this is considered 'normal', as it tends to block producers
>> > > during this period.
>> > >
> > We are running kafka 0.10.0.1, but I patched in some additional timing
> > statements into the kafka.log.Log roll() method to narrow down exactly
> > which part of that method is taking so long.
>> > >
>> > > Again, typically the process to roll a new log file takes only 1-2ms
>> > tops,
>> > > but several times a day it takes 30-60+ seconds, across all of our
>> > > brokers.  I've narrowed it down to this bit of code causing the issue:
>> > > https://github.com/apache/kafka/blob/0.10.0/core/src/
>> > > main/scala/kafka/log/Log.scala#L652-L658
>> > >
>> > > Here's an example of output w/ my additional timing log statements:
>> > >
>> > > [2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4'
>> in
>> > > > 28028 ms. (kafka.log.Log)
>> > >
>> > > [2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
>> > > > 28027 ms. 

Re: [ANNOUNCE] New committer: Grant Henke

2017-01-12 Thread Grant Henke
Thanks everyone!

On Thu, Jan 12, 2017 at 2:58 AM, Damian Guy  wrote:

> Congratulations!
>
> On Thu, 12 Jan 2017 at 03:35 Jun Rao  wrote:
>
> > Grant,
> >
> > Thanks for all your contribution! Congratulations!
> >
> > Jun
> >
> > On Wed, Jan 11, 2017 at 2:51 PM, Gwen Shapira  wrote:
> >
> > > The PMC for Apache Kafka has invited Grant Henke to join as a
> > > committer and we are pleased to announce that he has accepted!
> > >
> > > Grant contributed 88 patches, 90 code reviews, countless great
> > > comments on discussions, a much-needed cleanup to our protocol and the
> > > on-going and critical work on the Admin protocol. Throughout this, he
> > > displayed great technical judgment, high-quality work and willingness
> > > to contribute where needed to make Apache Kafka awesome.
> > >
> > > Thank you for your contributions, Grant :)
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Just realized that GCEasy doesn't keep reports around for very long
anymore, here is a screencap of the report:  http://imgur.com/a/MEubD

The longest reported GC pause was 30ms, tho they happen somewhat frequently
at an average of once per 12 seconds.  KAFKA-4616 certainly sounds just
like my scenario, but I'm having trouble finding the long GC pauses
described in that ticket in my own logs.

On Thu, Jan 12, 2017 at 10:42 AM, tao xiao  wrote:

> You may run into this bug https://issues.apache.org/jira/browse/KAFKA-4614
>
> On Thu, 12 Jan 2017 at 23:38 Stephen Powis  wrote:
>
> > Per my email to the list in Sept, when I reviewed GC logs then, I didn't
> > see anything out of the ordinary. (
> >
> > http://mail-archives.apache.org/mod_mbox/kafka-users/
> 201609.mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz
> 0kCjmdB9c0SDcLQ%40mail.gmail.com%3E
> > )
> >
> > Reviewing the GC logs from this morning around the timestamp in my
> previous
> > email, I see GC running roughly every 10-12 seconds, with total times
> > similar to the following:
> >
> > 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which application
> > > threads were stopped: 0.0141281 seconds, Stopping threads took:
> 0.0002171
> > > seconds
> > >
> > >
> > Here is a link to a GCEasy report:
> >
> > http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIvLS10b3
> RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
> >
> >
> > Currently using G1 gc with the following settings:
> >
> > -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
> > -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails
> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> > -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
> > -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
> > -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> > -XX:+HeapDumpOnOutOfMemoryError
> > -XX:HeapDumpPath=/var/log/kafka/heapDump.log
> > -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
> > -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> >
> >
> >
> > On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
> > dave.tauz...@surescripts.com
> > > wrote:
> >
> > > Can you collect garbage collection stats and verify there isn't a long
> GC
> > > happening at the same time?
> > >
> > > -Dave
> > >
> > > -Original Message-
> > > From: Stephen Powis [mailto:spo...@salesforce.com]
> > > Sent: Thursday, January 12, 2017 8:34 AM
> > > To: users@kafka.apache.org
> > > Subject: Re: Taking a long time to roll a new log segment (~1 min)
> > >
> > > So per the kafka docs I up'd our FD limit to 100k, and we are no longer
> > > seeing the process die, which is good.
> > >
> > > Unfortunately we're still seeing very high log segment roll times, and
> > I'm
> > > unsure if this is considered 'normal', as it tends to block producers
> > > during this period.
> > >
> > > We are running kafka 0.10.0.1, but I patched in some additional timing
> > > statements into the kafka.log.Log roll() method to narrow down exactly
> > > which part of that method is taking so long.
> > >
> > > Again, typically the process to roll a new log file takes only 1-2ms
> > tops,
> > > but several times a day it takes 30-60+ seconds, across all of our
> > > brokers.  I've narrowed it down to this bit of code causing the issue:
> > > https://github.com/apache/kafka/blob/0.10.0/core/src/
> > > main/scala/kafka/log/Log.scala#L652-L658
> > >
> > > Here's an example of output w/ my additional timing log statements:
> > >
> > > [2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4'
> in
> > > > 28028 ms. (kafka.log.Log)
> > >
> > > [2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
> > > > 28027 ms. (kafka.log.Log)
> > > >
> > >
> > > Any thoughts?  Help?  This considered normal?
> > >
> > > Thanks
> > > Stephen
> > >
> > >
> > > On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > > > I can't speak to the exact details of why fds would be kept open
> > > > longer in that specific case, but are you aware that the
> > > > recommendation for production clusters for open fd limits is much
> > > > higher? It's been suggested to be 100,000 as a starting point for
> quite
> > > awhile:
> > > > http://kafka.apache.org/documentation.html#os
> > > >
> > > > -Ewen
> > > >
> > > > On Mon, Jan 9, 2017 at 12:45 PM, Stephen Powis <
> spo...@salesforce.com>
> > > > wrote:
> > > >
> > > > > Hey!
> > > > >
> > > > > I've run into something concerning in our production clusterI
> > > > > believe I've posted this question to the mailing list previously (
> > > > > http://mail-archives.apache.org/mod_mbox/kafka-users/
> 201609.mbox/bro
> > > > > wser
> > > > )
> > > > > but the problem has become considerably more serious.
> > > > >
> > > > > We've been fighting issues where Kafka 0.10.0.1 hits its max 

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread tao xiao
You may run into this bug https://issues.apache.org/jira/browse/KAFKA-4614

On Thu, 12 Jan 2017 at 23:38 Stephen Powis  wrote:

> Per my email to the list in Sept, when I reviewed GC logs then, I didn't
> see anything out of the ordinary. (
>
> http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%40mail.gmail.com%3E
> )
>
> Reviewing the GC logs from this morning around the timestamp in my previous
> email, I see GC running roughly every 10-12 seconds, with total times
> similar to the following:
>
> 2017-01-12T07:16:46.867-0500: 46891.844: Total time for which application
> > threads were stopped: 0.0141281 seconds, Stopping threads took: 0.0002171
> > seconds
> >
> >
> Here is a link to a GCEasy report:
>
> http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIvLS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=
>
>
> Currently using G1 gc with the following settings:
>
> -Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
> -Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
> -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/log/kafka/heapDump.log
> -Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>
>
>
> On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave <
> dave.tauz...@surescripts.com
> > wrote:
>
> > Can you collect garbage collection stats and verify there isn't a long GC
> > happening at the same time?
> >
> > -Dave
> >
> > -Original Message-
> > From: Stephen Powis [mailto:spo...@salesforce.com]
> > Sent: Thursday, January 12, 2017 8:34 AM
> > To: users@kafka.apache.org
> > Subject: Re: Taking a long time to roll a new log segment (~1 min)
> >
> > So per the kafka docs I up'd our FD limit to 100k, and we are no longer
> > seeing the process die, which is good.
> >
> > Unfortunately we're still seeing very high log segment roll times, and
> I'm
> > unsure if this is considered 'normal', as it tends to block producers
> > during this period.
> >
> > We are running kafka 0.10.0.1, but I patched in some additional timing
> > statements into the kafka.log.Log roll() method to narrow down exactly
> > which part of that method is taking so long.
> >
> > Again, typically the process to roll a new log file takes only 1-2ms
> tops,
> > but several times a day it takes 30-60+ seconds, across all of our
> > brokers.  I've narrowed it down to this bit of code causing the issue:
> > https://github.com/apache/kafka/blob/0.10.0/core/src/
> > main/scala/kafka/log/Log.scala#L652-L658
> >
> > Here's an example of output w/ my additional timing log statements:
> >
> > [2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4' in
> > > 28028 ms. (kafka.log.Log)
> >
> > [2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
> > > 28027 ms. (kafka.log.Log)
> > >
> >
> > Any thoughts?  Help?  This considered normal?
> >
> > Thanks
> > Stephen
> >
> >
> > On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > I can't speak to the exact details of why fds would be kept open
> > > longer in that specific case, but are you aware that the
> > > recommendation for production clusters for open fd limits is much
> > > higher? It's been suggested to be 100,000 as a starting point for quite
> > awhile:
> > > http://kafka.apache.org/documentation.html#os
> > >
> > > -Ewen
> > >
> > > On Mon, Jan 9, 2017 at 12:45 PM, Stephen Powis 
> > > wrote:
> > >
> > > > Hey!
> > > >
> > > > I've run into something concerning in our production clusterI
> > > > believe I've posted this question to the mailing list previously (
> > > > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/bro
> > > > wser
> > > )
> > > > but the problem has become considerably more serious.
> > > >
> > > > We've been fighting issues where Kafka 0.10.0.1 hits its max file
> > > > descriptor limit.  Our limit is set to ~16k, and under normal
> > > > operation
> > > it
> > > > holds steady around 4k open files.
> > > >
> > > > But occasionally Kafka will roll a new log segment, which typically
> > > > takes on the order of magnitude of a few milliseconds.
> > > > However...sometimes it will take a considerable amount of time, any
> > > > where from 40 seconds up to over a minute.  When this happens, it
> > > > seems like connections are not released by kafka, and we end up with
> > > > thousands of client connections
> > > stuck
> > > > in CLOSE_WAIT, which pile up and exceed our max file descriptor
> limit.
> > > > This happens all in the span of about a minute.
> > > >
> > > > 

Re: Consumer Balancing multi topic, single partition

2017-01-12 Thread Tobias Adamson
Thanks, works well.
For anyone searching for this, here is an example:

props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, 
"org.apache.kafka.clients.consumer.RoundRobinAssignor");


> On 12 Jan 2017, at 11:37 PM, tao xiao  wrote:
> 
> The default partition assignor is the range assignor, which assigns work on a
> per-topic basis. If your topics have one partition only, they will all be
> assigned to the same consumer. You can change the assignor to
> org.apache.kafka.clients.consumer.RoundRobinAssignor
> 
> On Thu, 12 Jan 2017 at 22:33 Tobias Adamson  wrote:
> 
> Hi
> We have a scenario where we have a lot of single partition topics for
> ordering purposes.
> We then want to use multiple consumer processes listening to many topics.
> During testing it seems like one consumer process will always end up with
> all topics/partitions assigned to it and there is no real distribution
> going on.
> 
> Does the assignment process take into account the total number of
> partitions assigned to each consumer process or does it only
> check assignment per topic?
> 
> Regards
> Toby



Re: Consumer Balancing multi topic, single partition

2017-01-12 Thread tao xiao
The default partition assignor is the range assignor, which assigns work on a
per-topic basis. If your topics have one partition only, they will all be
assigned to the same consumer. You can change the assignor to
org.apache.kafka.clients.consumer.RoundRobinAssignor

On Thu, 12 Jan 2017 at 22:33 Tobias Adamson  wrote:

Hi
We have a scenario where we have a lot of single partition topics for
ordering purposes.
We then want to use multiple consumer processes listening to many topics.
During testing it seems like one consumer process will always end up with
all topics/partitions assigned to it and there is no real distribution
going on.

Does the assignment process take into account the total number of
partitions assigned to each consumer process or does it only
check assignment per topic?

Regards
Toby


Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Per my email to the list in Sept, when I reviewed GC logs then, I didn't
see anything out of the ordinary. (
http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%40mail.gmail.com%3E
)

Reviewing the GC logs from this morning around the timestamp in my previous
email, I see GC running roughly every 10-12 seconds, with total times
similar to the following:

2017-01-12T07:16:46.867-0500: 46891.844: Total time for which application
> threads were stopped: 0.0141281 seconds, Stopping threads took: 0.0002171
> seconds
>
>
Here is a link to a GCEasy report:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMDEvMTIvLS10b3RhbEdDLWthZmthMS00LmxvZy5nei0tMTUtMzQtNTk=


Currently using G1 gc with the following settings:

-Xmx12G -Xms12G -server -XX:MaxPermSize=48M -verbose:gc
-Xloggc:/var/log/kafka/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-XX:+PrintTLAB -XX:+DisableExplicitGC -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M -XX:+UseCompressedOops
-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/kafka/heapDump.log
-Xloggc:/opt/kafka/current/bin/../logs/kafkaServer-gc.log -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps



On Thu, Jan 12, 2017 at 9:55 AM, Tauzell, Dave  wrote:

> Can you collect garbage collection stats and verify there isn't a long GC
> happening at the same time?
>
> -Dave
>
> -Original Message-
> From: Stephen Powis [mailto:spo...@salesforce.com]
> Sent: Thursday, January 12, 2017 8:34 AM
> To: users@kafka.apache.org
> Subject: Re: Taking a long time to roll a new log segment (~1 min)
>
> So per the kafka docs I up'd our FD limit to 100k, and we are no longer
> seeing the process die, which is good.
>
> Unfortunately we're still seeing very high log segment roll times, and I'm
> unsure if this is considered 'normal', as it tends to block producers
> during this period.
>
> We are running kafka 0.10.0.1, but I patched in some additional timing
> statements into the kafka.log.Log roll() method to narrow down exactly
> which part of that method is taking so long.
>
> Again, typically the process to roll a new log file takes only 1-2ms tops,
> but several times a day it takes 30-60+ seconds, across all of our
> brokers.  I've narrowed it down to this bit of code causing the issue:
> https://github.com/apache/kafka/blob/0.10.0/core/src/
> main/scala/kafka/log/Log.scala#L652-L658
>
> Here's an example of output w/ my additional timing log statements:
>
> [2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4' in
> > 28028 ms. (kafka.log.Log)
>
> [2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
> > 28027 ms. (kafka.log.Log)
> >
>
> Any thoughts?  Help?  This considered normal?
>
> Thanks
> Stephen
>
>
> On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava 
> wrote:
>
> > I can't speak to the exact details of why fds would be kept open
> > longer in that specific case, but are you aware that the
> > recommendation for production clusters for open fd limits is much
> > higher? It's been suggested to be 100,000 as a starting point for quite
> awhile:
> > http://kafka.apache.org/documentation.html#os
> >
> > -Ewen
> >
> > On Mon, Jan 9, 2017 at 12:45 PM, Stephen Powis 
> > wrote:
> >
> > > Hey!
> > >
> > > I've run into something concerning in our production clusterI
> > > believe I've posted this question to the mailing list previously (
> > > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/bro
> > > wser
> > )
> > > but the problem has become considerably more serious.
> > >
> > > We've been fighting issues where Kafka 0.10.0.1 hits its max file
> > > descriptor limit.  Our limit is set to ~16k, and under normal
> > > operation
> > it
> > > holds steady around 4k open files.
> > >
> > > But occasionally Kafka will roll a new log segment, which typically
> > > takes on the order of magnitude of a few milliseconds.
> > > However...sometimes it will take a considerable amount of time, any
> > > where from 40 seconds up to over a minute.  When this happens, it
> > > seems like connections are not released by kafka, and we end up with
> > > thousands of client connections
> > stuck
> > > in CLOSE_WAIT, which pile up and exceed our max file descriptor limit.
> > > This happens all in the span of about a minute.
> > >
> > > Our logs look like this:
> > >
> > > [2017-01-08 01:10:17,117] INFO Rolled new log segment for
> > > 'MyTopic-8' in
> > > > 41122 ms. (kafka.log.Log)
> > > > [2017-01-08 01:10:32,550] INFO Rolled new log segment for 'MyTopic-4'
> > in
> > > 1
> > > > ms. (kafka.log.Log)
> > > > [2017-01-08 01:11:10,039] INFO [Group Metadata Manager on Broker 4]:
> > > > Removed 0 expired offsets in 0 milliseconds.

RE: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Tauzell, Dave
Can you collect garbage collection stats and verify there isn't a long GC 
happening at the same time?

-Dave

-Original Message-
From: Stephen Powis [mailto:spo...@salesforce.com]
Sent: Thursday, January 12, 2017 8:34 AM
To: users@kafka.apache.org
Subject: Re: Taking a long time to roll a new log segment (~1 min)

So per the kafka docs I up'd our FD limit to 100k, and we are no longer seeing 
the process die, which is good.

Unfortunately we're still seeing very high log segment roll times, and I'm 
unsure if this is considered 'normal', as it tends to block producers during 
this period.

We are running kafka 0.10.0.1, but I patched in some additional timing 
statements into the kafka.log.Log roll() method to narrow down exactly which 
part of that method is taking so long.

Again, typically the process to roll a new log file takes only 1-2ms tops, but 
several times a day it takes 30-60+ seconds, across all of our brokers.  I've 
narrowed it down to this bit of code causing the issue:
https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/Log.scala#L652-L658

Here's an example of output w/ my additional timing log statements:

[2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4' in
> 28028 ms. (kafka.log.Log)

[2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
> 28027 ms. (kafka.log.Log)
>

Any thoughts?  Help?  This considered normal?

Thanks
Stephen


On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava 
wrote:

> I can't speak to the exact details of why fds would be kept open
> longer in that specific case, but are you aware that the
> recommendation for production clusters for open fd limits is much
> higher? It's been suggested to be 100,000 as a starting point for quite 
> awhile:
> http://kafka.apache.org/documentation.html#os
>
> -Ewen
>
> On Mon, Jan 9, 2017 at 12:45 PM, Stephen Powis 
> wrote:
>
> > Hey!
> >
> > I've run into something concerning in our production clusterI
> > believe I've posted this question to the mailing list previously (
> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/bro
> > wser
> )
> > but the problem has become considerably more serious.
> >
> > We've been fighting issues where Kafka 0.10.0.1 hits its max file
> > descriptor limit.  Our limit is set to ~16k, and under normal
> > operation
> it
> > holds steady around 4k open files.
> >
> > But occasionally Kafka will roll a new log segment, which typically
> > takes on the order of magnitude of a few milliseconds.
> > However...sometimes it will take a considerable amount of time, any
> > where from 40 seconds up to over a minute.  When this happens, it
> > seems like connections are not released by kafka, and we end up with
> > thousands of client connections
> stuck
> > in CLOSE_WAIT, which pile up and exceed our max file descriptor limit.
> > This happens all in the span of about a minute.
> >
> > Our logs look like this:
> >
> > [2017-01-08 01:10:17,117] INFO Rolled new log segment for
> > 'MyTopic-8' in
> > > 41122 ms. (kafka.log.Log)
> > > [2017-01-08 01:10:32,550] INFO Rolled new log segment for 'MyTopic-4'
> in
> > 1
> > > ms. (kafka.log.Log)
> > > [2017-01-08 01:11:10,039] INFO [Group Metadata Manager on Broker 4]:
> > > Removed 0 expired offsets in 0 milliseconds.
> > > (kafka.coordinator.GroupMetadataManager)
> > > [2017-01-08 01:19:02,877] ERROR Error while accepting connection
> > > (kafka.network.Acceptor)
> > > java.io.IOException: Too many open files   at
> > > sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> > >
> > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 422)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 250)
> > > at kafka.network.Acceptor.accept(SocketServer.scala:323)
> > > at kafka.network.Acceptor.run(SocketServer.scala:268)
> > > at java.lang.Thread.run(Thread.java:745)
> > > [2017-01-08 01:19:02,877] ERROR Error while accepting connection
> > > (kafka.network.Acceptor)
> > > java.io.IOException: Too many open files
> > > at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 422)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 250)
> > > at kafka.network.Acceptor.accept(SocketServer.scala:323)
> > > at kafka.network.Acceptor.run(SocketServer.scala:268)
> > > at java.lang.Thread.run(Thread.java:745)
> > > .
> > >
> >
> >
> > And then kafka crashes.
> >
> > Has anyone seen this behavior of slow log segmented being rolled?
> > Any ideas of how to track down what could be causing this?
> >
> > Thanks!
> > Stephen
> >
>

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
So per the kafka docs I up'd our FD limit to 100k, and we are no longer
seeing the process die, which is good.

Unfortunately we're still seeing very high log segment roll times, and I'm
unsure if this is considered 'normal', as it tends to block producers
during this period.

We are running kafka 0.10.0.1, but I patched in some additional timing
statements into the kafka.log.Log roll() method to narrow down exactly
which part of that method is taking so long.

Again, typically the process to roll a new log file takes only 1-2ms tops,
but several times a day it takes 30-60+ seconds, across all of our
brokers.  I've narrowed it down to this bit of code causing the issue:
https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/Log.scala#L652-L658

Here's an example of output w/ my additional timing log statements:

[2017-01-12 07:17:58,199] INFO Rolled new log segment for 'MyTopic-4' in
> 28028 ms. (kafka.log.Log)

[2017-01-12 07:17:58,200] INFO LastEntry match time 'EventStream-4' in
> 28027 ms. (kafka.log.Log)
>

Any thoughts?  Help?  This considered normal?

Thanks
Stephen


On Tue, Jan 10, 2017 at 2:37 AM, Ewen Cheslack-Postava 
wrote:

> I can't speak to the exact details of why fds would be kept open longer in
> that specific case, but are you aware that the recommendation for
> production clusters for open fd limits is much higher? It's been suggested
> to be 100,000 as a starting point for quite awhile:
> http://kafka.apache.org/documentation.html#os
>
> -Ewen
>
> On Mon, Jan 9, 2017 at 12:45 PM, Stephen Powis 
> wrote:
>
> > Hey!
> >
> > I've run into something concerning in our production clusterI believe
> > I've posted this question to the mailing list previously (
> > http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/browser
> )
> > but the problem has become considerably more serious.
> >
> > We've been fighting issues where Kafka 0.10.0.1 hits its max file
> > descriptor limit.  Our limit is set to ~16k, and under normal operation
> it
> > holds steady around 4k open files.
> >
> > But occasionally Kafka will roll a new log segment, which typically takes
> > on the order of magnitude of a few milliseconds.  However...sometimes it
> > will take a considerable amount of time, any where from 40 seconds up to
> > over a minute.  When this happens, it seems like connections are not
> > released by kafka, and we end up with thousands of client connections
> stuck
> > in CLOSE_WAIT, which pile up and exceed our max file descriptor limit.
> > This happens all in the span of about a minute.
> >
> > Our logs look like this:
> >
> > [2017-01-08 01:10:17,117] INFO Rolled new log segment for 'MyTopic-8' in
> > > 41122 ms. (kafka.log.Log)
> > > [2017-01-08 01:10:32,550] INFO Rolled new log segment for 'MyTopic-4'
> in
> > 1
> > > ms. (kafka.log.Log)
> > > [2017-01-08 01:11:10,039] INFO [Group Metadata Manager on Broker 4]:
> > > Removed 0 expired offsets in 0 milliseconds.
> > > (kafka.coordinator.GroupMetadataManager)
> > > [2017-01-08 01:19:02,877] ERROR Error while accepting connection
> > > (kafka.network.Acceptor)
> > > java.io.IOException: Too many open files   at
> > > sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> > >
> > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 422)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 250)
> > > at kafka.network.Acceptor.accept(SocketServer.scala:323)
> > > at kafka.network.Acceptor.run(SocketServer.scala:268)
> > > at java.lang.Thread.run(Thread.java:745)
> > > [2017-01-08 01:19:02,877] ERROR Error while accepting connection
> > > (kafka.network.Acceptor)
> > > java.io.IOException: Too many open files
> > > at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 422)
> > > at
> > > sun.nio.ch.ServerSocketChannelImpl.accept(
> ServerSocketChannelImpl.java:
> > 250)
> > > at kafka.network.Acceptor.accept(SocketServer.scala:323)
> > > at kafka.network.Acceptor.run(SocketServer.scala:268)
> > > at java.lang.Thread.run(Thread.java:745)
> > > .
> > >
> >
> >
> > And then kafka crashes.
> >
> > Has anyone seen this behavior of slow log segmented being rolled?  Any
> > ideas of how to track down what could be causing this?
> >
> > Thanks!
> > Stephen
> >
>


Consumer Balancing multi topic, single partition

2017-01-12 Thread Tobias Adamson
Hi
We have a scenario where we have a lot of single partition topics for ordering 
purposes.
We then want to use multiple consumer processes listening to many topics. 
During testing it seems like one consumer process will always end up with all 
topics/partitions assigned to it and there is no real distribution going on.

Does the assignment process take into account the total number of partitions 
assigned to each consumer process or does it only
check assignment per topic?

Regards
Toby

Re: changelog topic with in-memory state store

2017-01-12 Thread Daisuke Moriya
Thanks for your response Damian.

> However the in-memory store will write each update to the changelog 
> (regardless of context.commit), which seems to be the issue you have?
Yes, that is the issue I have.
Although I can't give a specific number, it is an issue for me when, for example,
Kafka Streams reads source records at hundreds of thousands of events per second 
(EPS) and writes changelog records to the broker at a similar rate.

> Have you tested it and observed that it puts load on the broker?
Yes. 
I checked the behavior for each state store type in a small test.
I tried the following in the test code with 3 patterns:
  P1: persistent state store without cache
  P2: persistent state store with cache
  P3: in-memory state store
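
For reference, the three variants were declared roughly as in the sketch below 
(assuming the 0.10.1 Stores factory API; the store name and serdes are 
illustrative):

import org.apache.kafka.streams.processor.StateStoreSupplier;
import org.apache.kafka.streams.state.Stores;

public class StoreVariants {
    public static void main(String[] args) {
        // P1: persistent state store without cache
        StateStoreSupplier p1 = Stores.create("Counts")
                .withStringKeys()
                .withLongValues()
                .persistent()
                .build();

        // P2: persistent state store with cache (only the latest value per key
        // is forwarded to the changelog on commit)
        StateStoreSupplier p2 = Stores.create("Counts")
                .withStringKeys()
                .withLongValues()
                .persistent()
                .enableCaching()
                .build();

        // P3: in-memory state store
        StateStoreSupplier p3 = Stores.create("Counts")
                .withStringKeys()
                .withLongValues()
                .inMemory()
                .build();
    }
}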

1. Set up a local broker
2. Create a source topic for 1 partition
3. Produce 10 records with the same key and changing values to the source topic
4. Execute the Kafka Streams application that includes WordCountProcessor 
(the changelog topic is left to automatic creation by Kafka Streams)
5. Consume the changelog topic from the first offset. 

Since I cleaned the environment before each test, the 3 test patterns were 
independent.
I confirmed that no compaction had been performed by the broker.
As a result, I observed 10 records in the changelog topic when the state store 
was P1 (persistent without cache) or P3 (in-memory).
I observed only 1 record, the latest one, when the state store was 
P2 (persistent with cache).
I expected the records written to the changelog to be reduced for the in-memory 
state store as well, but in my local test they were not.

My Question:
Is this a normal behavior of Kafka Streams?

Thank you.
--
Daisuke




Hi,

There is no way to enable caching on in-memory-store - by definition it is
already cached. However the in-memory store will write each update to the
changelog (regardless of context.commit), which seems to be the issue you
have?

When you say large, how large? Have you tested it and observed that it puts
load on the broker?

Thanks,
Damian

On Wed, 11 Jan 2017 at 06:10 Daisuke Moriya  wrote:

Hi.

I am developing a simple log counting application using Kafka Streams
0.10.1.1.
Its implementation is almost the same as the WordCountProcessor in the
confluent document [
http://docs.confluent.io/3.1.1/streams/developer-guide.html#processor-api].
I am using in-memory state store,
its key is the ID of log category, value is count.
All the changelogs are written to a broker by context.commit() for fault
tolerance,
but since the data I handle is large and the size of key is large, it takes
a long time to process.
Even if it is compacted with a broker, this will put a load on the broker.
I would like to write only the latest records for each key on the broker
instead of all changelogs at context.commit().
This will reduce the load on the broker and I do not think
there will be any negative impact on fault tolerance.
If I use the persistent state store, I can do this by enabling caching,
but I couldn't find how to accomplish this with the in-memory state store.
Can I do this?

Thank you,
--
Daisuke





Re: Kafka consumer offset info lost

2017-01-12 Thread Mahendra Kariya
Producers were publishing data to the topic, and consumers were also
connected, sending heartbeat pings every 100 ms.



On Thu, 12 Jan 2017 at 17:15 Michael Freeman  wrote:

> If the topic has not seen traffic for a while then Kafka will remove the
> stored offset. When your consumer reconnects Kafka no longer has the offset
> so it will reprocess from earliest.
>
> Michael
>
> > On 12 Jan 2017, at 11:13, Mahendra Kariya 
> wrote:
> >
> > Hey All,
> >
> > We have a Kafka cluster hosted on Google Cloud. There was some network
> > issue on the cloud and suddenly, the offset for a particular consumer
> group
> > got reset to earliest and all of a sudden the lag was in millions. We
> > aren't able to figure out what went wrong. Has anybody faced the
> > same/similar issue? Does anybody have any debugging tips?
> >
> > Some relevant info:
> >
> >   - The auto.offset.reset config for the consumer is set to earliest
> >   - The offsets are stored on Kafka
> >   - Total nodes on cluster: 4
> >   - Replication factor: 3
> >   - Partitions: 50
>


Re: Kafka consumer offset info lost

2017-01-12 Thread Michael Freeman
If the topic has not seen traffic for a while then Kafka will remove the stored 
offset. When your consumer reconnects Kafka no longer has the offset so it will 
reprocess from earliest.

Michael

> On 12 Jan 2017, at 11:13, Mahendra Kariya  wrote:
> 
> Hey All,
> 
> We have a Kafka cluster hosted on Google Cloud. There was some network
> issue on the cloud and suddenly, the offset for a particular consumer group
> got reset to earliest and all of a sudden the lag was in millions. We
> aren't able to figure out what went wrong. Has anybody faced the
> same/similar issue? Does anybody have any debugging tips?
> 
> Some relevant info:
> 
>   - The auto.offset.reset config for the consumer is set to earliest
>   - The offsets are stored on Kafka
>   - Total nodes on cluster: 4
>   - Replication factor: 3
>   - Partitions: 50


Kafka consumer offset info lost

2017-01-12 Thread Mahendra Kariya
Hey All,

We have a Kafka cluster hosted on Google Cloud. There was some network
issue on the cloud and suddenly, the offset for a particular consumer group
got reset to earliest and all of a sudden the lag was in millions. We
aren't able to figure out what went wrong. Has anybody faced the
same/similar issue? Does anybody have any debugging tips?

Some relevant info:

   - The auto.offset.reset config for the consumer is set to earliest
   - The offsets are stored on Kafka
   - Total nodes on cluster: 4
   - Replication factor: 3
   - Partitions: 50
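
For reference, the consumer side is configured roughly like this (a sketch; the 
bootstrap servers, group id, topic and deserializers are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSetup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");        // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Fall back to the earliest offset when no committed offset is found for the group.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Offsets are committed back to Kafka (the __consumer_offsets topic).
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));             // placeholder topic
    }
}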


Kafka Streams: consume 6 months old data VS windows maintain durations

2017-01-12 Thread Nicolas Fouché
Hi.


I'd like to re-consume 6 months old data with Kafka Streams.

My current topology can't because it defines aggregations with windows maintain 
durations of 3 days.
TimeWindows.of(ONE_HOUR_MILLIS).until(THREE_DAYS_MILLIS)



As discovered (and shared [1]) a few months ago, consuming a record older than 
3 days will mess up my aggregates. How do you deal with this? Do you 
temporarily raise the windows maintain durations until all records are 
consumed? Do you always run your topologies with long durations, like a year? 
I have no idea what would be the impact on RAM and disk, but I guess RocksDB 
would cry a little.


Final question: if I raise the duration to 6 months, consume my records, and 
then set the duration back to 3 days, would the old aggregates automatically 
be destroyed?


[1] 
http://mail-archives.apache.org/mod_mbox/kafka-users/201610.mbox/%3ccabqkjkj42n7z4bxjdkrdyz_kmpunh738uxvm7gy24dnkx+r...@mail.gmail.com%3e
  

Thanks
Nicolas




Re: [ANNOUNCE] New committer: Grant Henke

2017-01-12 Thread Damian Guy
Congratulations!

On Thu, 12 Jan 2017 at 03:35 Jun Rao  wrote:

> Grant,
>
> Thanks for all your contribution! Congratulations!
>
> Jun
>
> On Wed, Jan 11, 2017 at 2:51 PM, Gwen Shapira  wrote:
>
> > The PMC for Apache Kafka has invited Grant Henke to join as a
> > committer and we are pleased to announce that he has accepted!
> >
> > Grant contributed 88 patches, 90 code reviews, countless great
> > comments on discussions, a much-needed cleanup to our protocol and the
> > on-going and critical work on the Admin protocol. Throughout this, he
> > displayed great technical judgment, high-quality work and willingness
> > to contribute where needed to make Apache Kafka awesome.
> >
> > Thank you for your contributions, Grant :)
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>