Re: Taking a long time to roll a new log segment (~1 min)

2017-01-13 Thread Stephen Powis
Yep, that's exactly what the output is showing. On Fri, Jan 13, 2017 at 11:25 AM, Tauzell, Dave <dave.tauz...@surescripts.com> wrote: > I haven't used dtrace, but is it possible to have it running and recording the ftruncate64 times? Then when you see one of these long roll times look at …

RE: Taking a long time to roll a new log segment (~1 min)

2017-01-13 Thread Tauzell, Dave
I haven't used dtrace, but is it possible to have it running and recording the ftruncate64 times? Then when you see one of these long roll times look at the dtrace log to see if it was that call? -Dave
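
For illustration only: Dave's suggestion is syscall-level tracing, but the same "record the durations, then correlate them with a slow roll" idea can also be sketched in-process. The Scala below is a hypothetical helper, not code from the thread or from Kafka; the 1-second threshold is arbitrary.

    import java.io.{File, RandomAccessFile}

    // Hypothetical in-process variant of the "record and correlate" idea:
    // time each index-file resize and log any call that takes longer than a second.
    object TimedResize {
      val SlowThresholdNs: Long = 1000000000L  // 1 second in nanoseconds

      def resize(file: File, newSizeBytes: Long): Unit = {
        val raf = new RandomAccessFile(file, "rw")
        try {
          val start = System.nanoTime()
          raf.setLength(newSizeBytes)  // on Linux this ends up as ftruncate(2)
          val elapsedNs = System.nanoTime() - start
          if (elapsedNs > SlowThresholdNs)
            println(s"SLOW resize: ${file.getPath} to $newSizeBytes bytes took ${elapsedNs / 1000000} ms")
        } finally raf.close()
      }
    }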

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-13 Thread Stephen Powis
So the underlying system call is ftruncate64, logged using dtrace:

# BEGIN stack trace for ftruncate, call took 34160541200ns:
args==
0x7f5f9a1134d7 : ftruncate64+0x7/0x30 [/lib64/libc-2.12.so]
0x7f5f987b80c3 : 0x7f5f987b80c3
…

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-13 Thread Ismael Juma
Hi Stephen, I think it's worth capturing this in a JIRA so that it doesn't get lost. Ismael

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-13 Thread Ewen Cheslack-Postava
That line is truncating a file. If the underlying implementation is synchronous, it could potentially require multiple disk writes to remove blocks from the file, update the list of free blocks, etc. That could potentially cause the latency. I'm not sure that's actually where the cost is coming …
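
For context, here is a much-simplified Scala sketch of what the index trim at roll time boils down to, paraphrased from the 0.10.0 OffsetIndex source linked elsewhere in the thread; it is not the actual Kafka code. The index file is preallocated larger than needed, and on roll it is shrunk to the last written entry, which is the truncation Ewen describes.

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel

    // Simplified illustration: shrink a preallocated index file to its used size
    // and re-map it. The setLength call is what shows up as ftruncate64 in the traces.
    def trimIndex(path: String, entries: Int, entrySizeBytes: Int = 8): Unit = {
      val raf = new RandomAccessFile(path, "rw")
      try {
        val newSize = entries * entrySizeBytes
        raf.setLength(newSize)  // truncate (or extend) the file to exactly the used size
        val mmap = raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, newSize)
        mmap.position(newSize)  // keep the write position at the end of the used region
      } finally raf.close()
    }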

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephane Maarek
Hi Stephen, Out of curiosity, why did you pick ZFS over XFS or ext4, and what options are you using when formatting and mounting? Regards, Stephane

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Running CentOS 6.7 (kernel 3.10.95-1.el6.elrepo.x86_64), 4 SATA disks in RAID10 with ZFS.

RE: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Tauzell, Dave
You have a local filesystem? Linux? -Dave

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
I've further narrowed it down to this particular line: https://github.com/apache/kafka/blob/0.10.0/core/src/main/scala/kafka/log/OffsetIndex.scala#L294 But I'm still at a loss as to why this would be slow sometimes and not others.

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Using the little bash script in that JIRA ticket to go through the GC log and sum up the total pause times, I come up with the following. I don't see anything that would indicate a ~28 second pause.

2017-01-12T07:00 = 72.2961
2017-01-12T07:01 = 59.8112
2017-01-12T07:02 = 69.6915
…
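
The actual bash script lives in the JIRA ticket; purely as an illustration of the same per-minute summing, here is a rough Scala equivalent. It assumes the GC log was written with -XX:+PrintGCDateStamps and -XX:+PrintGCApplicationStoppedTime, i.e. it contains lines such as "2017-01-12T07:00:03.123+0000: ... Total time for which application threads were stopped: 0.0123 seconds"; printing milliseconds is this sketch's choice, not necessarily what the original script did.

    import scala.io.Source

    // Sum JVM stop-the-world pause times per minute from a HotSpot GC log.
    object GcPausePerMinute {
      private val Stopped =
        """^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}).*Total time for which application threads were stopped: ([\d.]+) seconds.*""".r

      def main(args: Array[String]): Unit = {
        val perMinute = Source.fromFile(args(0)).getLines()
          .collect { case Stopped(minute, secs) => (minute, secs.toDouble) }
          .toSeq
          .groupBy(_._1)
          .map { case (minute, pauses) => minute -> pauses.map(_._2).sum }

        perMinute.toSeq.sortBy(_._1).foreach { case (minute, totalSecs) =>
          println(f"$minute = ${totalSecs * 1000}%.4f")  // per-minute total, in milliseconds
        }
      }
    }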

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Just realized that GCEasy doesn't keep reports around for very long anymore; here is a screencap of the report: http://imgur.com/a/MEubD The longest reported GC pause was 30ms, though they happen somewhat frequently, averaging about one every 12 seconds. KAFKA-4616 certainly sounds just like my …

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread tao xiao
You may be running into this bug: https://issues.apache.org/jira/browse/KAFKA-4614

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
Per my email to the list in Sept, when I reviewed GC logs then, I didn't see anything out of the ordinary (http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/%3CCABQB-gS7h4Nuq3TKgHoAVeRHPWnBNs2B0Tz0kCjmdB9c0SDcLQ%40mail.gmail.com%3E). Reviewing the GC logs from this morning around …

RE: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Tauzell, Dave
Can you collect garbage collection stats and verify there isn't a long GC happening at the same time? -Dave
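
For reference, stop-the-world pause data of the kind discussed in this thread is typically collected by starting the broker JVM with HotSpot GC-logging flags along these lines (Java 7/8-era flag names; the log path is only an example, and Kafka's start scripts already set a similar baseline via KAFKA_GC_LOG_OPTS):

    -verbose:gc
    -Xloggc:/var/log/kafka/kafkaServer-gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime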

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-12 Thread Stephen Powis
So per the Kafka docs I upped our FD limit to 100k, and we are no longer seeing the process die, which is good. Unfortunately we're still seeing very high log segment roll times, and I'm unsure if this is considered 'normal', as it tends to block producers during this period. We are running Kafka …

Re: Taking a long time to roll a new log segment (~1 min)

2017-01-09 Thread Ewen Cheslack-Postava
I can't speak to the exact details of why fds would be kept open longer in that specific case, but are you aware that the recommended open fd limit for production clusters is much higher? It's been suggested to be 100,000 as a starting point for quite a while: …
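
The 100,000 figure matches the long-standing recommendation referenced here. As a rough sketch of checking and raising the limit on Linux (the paths and the "kafka" user name are assumptions about a typical install, not details from the thread):

    # See the limit the running broker actually got (PID is illustrative):
    grep 'open files' /proc/<kafka-pid>/limits

    # Raise it persistently for the user the broker runs as, e.g. in /etc/security/limits.conf:
    kafka  soft  nofile  100000
    kafka  hard  nofile  100000

    # Or just for the current shell before starting the broker:
    ulimit -n 100000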

Taking a long time to roll a new log segment (~1 min)

2017-01-09 Thread Stephen Powis
Hey! I've run into something concerning in our production cluster. I believe I've posted this question to the mailing list previously (http://mail-archives.apache.org/mod_mbox/kafka-users/201609.mbox/browser), but the problem has become considerably more serious. We've been fighting issues …