Re: Please help! ZooKeeper 3.7.2 fsync-ing latency issue
Hello Patrick,

Thank you for your replies.
I have read that archived mail and tried tracing the fsync calls made by
ZooKeeper to see whether anything looks strange with fsync.
I will also try GC tracking later. (I'm embarrassed to say I don't yet know
how to do this, so I need to do some searching first.)

Best regards,
Bill

From: Patrick Hunt
Sent: April 19, 2024 0:38
To: user@zookeeper.apache.org
Subject: Re: Please help! ZooKeeper 3.7.2 fsync-ing latency issue

On Thu, Apr 18, 2024 at 9:15 AM Patrick Hunt wrote:

> My experience with slow fsyncs is that it's almost always due to
> contention for disk IO. I see that you tuned the snap* sizes down, which is
> reasonable. Have you checked what ZK activity is happening during this
> period? Perhaps some client is hammering the cluster; have you ruled
> that out?
>

Actually, one other thing (sorry - it's been a while since I have seen this)
could be GC activity. If something (e.g. my point about client activity due
to some periodic event...) causes lots of memory pressure, perhaps the GC is
somehow impacting the fsync (or the activity around the fsync). Have you
tried running with GC tracking to see whether that's related to the event?

Patrick

> I searched the mail archives; other folks have reported this issue, so
> you might take a look. I found this thread in particular that you might
> check out:
> https://lists.apache.org/thread/qjrlprmt7pdy63ztvjtvkd0f5zgw5dgk
>
> Patrick
>
> On Thu, Apr 18, 2024 at 3:31 AM Xu Bill wrote:
>
>> Hello,
>>
>> I have a pretty weird issue with ZooKeeper.
>> Every day around 17:30, my ZooKeeper logs a warning message that says
>> "fsync-ing the write ahead log in SyncThread:0 took 36919ms which will adversely effect operation latency.File size is 16777232 bytes.".
>> This causes the clients connected to ZooKeeper to time out. I have to
>> restart my clients every day.
>>
>> Though I don't think the txn log file is too big to be handled quickly,
>> I still tried changing parameters to suppress the size of the txn log.
>> Below is my configuration.
>> preAllocSize=16M
>> snapCount=3
>> snapSizeLimitInKb=32M
>>
>> Even with this configuration, I still get the warnings.
>>
>> I also tried monitoring the IO stats on the data disk that holds the
>> ZooKeeper data dir, but the stats were the same as usual.
>>
>> Can anybody give suggestions on how to solve or investigate this issue?
>> I am using ZooKeeper 3.7.2.
>> The IO stats were tps=122, reading=20.1k/s, writing=2M/s when the
>> warning was happening.
>>
>> Best regards,
>> Bill
>>
Re: Please help! ZooKeeper 3.7.2 fsync-ing latency issue
On Thu, Apr 18, 2024 at 9:15 AM Patrick Hunt wrote:

> My experience with slow fsyncs is that it's almost always due to
> contention for disk IO. I see that you tuned the snap* sizes down, which is
> reasonable. Have you checked what ZK activity is happening during this
> period? Perhaps some client is hammering the cluster; have you ruled
> that out?
>

Actually, one other thing (sorry - it's been a while since I have seen this)
could be GC activity. If something (e.g. my point about client activity due
to some periodic event...) causes lots of memory pressure, perhaps the GC is
somehow impacting the fsync (or the activity around the fsync). Have you
tried running with GC tracking to see whether that's related to the event?

Patrick

> I searched the mail archives; other folks have reported this issue, so
> you might take a look. I found this thread in particular that you might
> check out:
> https://lists.apache.org/thread/qjrlprmt7pdy63ztvjtvkd0f5zgw5dgk
>
> Patrick
>
> On Thu, Apr 18, 2024 at 3:31 AM Xu Bill wrote:
>
>> Hello,
>>
>> I have a pretty weird issue with ZooKeeper.
>> Every day around 17:30, my ZooKeeper logs a warning message that says
>> "fsync-ing the write ahead log in SyncThread:0 took 36919ms which will adversely effect operation latency.File size is 16777232 bytes.".
>> This causes the clients connected to ZooKeeper to time out. I have to
>> restart my clients every day.
>>
>> Though I don't think the txn log file is too big to be handled quickly,
>> I still tried changing parameters to suppress the size of the txn log.
>> Below is my configuration.
>> preAllocSize=16M
>> snapCount=3
>> snapSizeLimitInKb=32M
>>
>> Even with this configuration, I still get the warnings.
>>
>> I also tried monitoring the IO stats on the data disk that holds the
>> ZooKeeper data dir, but the stats were the same as usual.
>>
>> Can anybody give suggestions on how to solve or investigate this issue?
>> I am using ZooKeeper 3.7.2.
>> The IO stats were tps=122, reading=20.1k/s, writing=2M/s when the
>> warning was happening.
>>
>> Best regards,
>> Bill
>>
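For the GC tracking Patrick suggests, one possible approach (a rough sketch, not a verified procedure for this cluster) is to start the ZooKeeper JVM with GC logging enabled and then scan the log for long pauses around 17:30. The sketch assumes JDK 9+ unified logging, for example `-Xlog:gc*:file=gc.log:time,level,tags`, passed through whatever mechanism your start script uses for JVM options (commonly the SERVER_JVMFLAGS environment variable, but check your own zkServer.sh). The file name, the threshold, the `long_pauses` helper, and the sample line in the comment are illustrative and do not come from the thread.

```python
import re

# Assumed line shape with the "time" decorator enabled (illustrative, not a
# line from Bill's server):
# [2024-04-18T17:30:02.118+0800][info][gc] GC(41) Pause Young (Normal) (G1 Evacuation Pause) 512M->96M(1024M) 35210.442ms
PAUSE_LINE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\].*\bPause\b.*?(?P<ms>\d+(?:\.\d+)?)ms\s*$"
)


def long_pauses(lines, threshold_ms=1000.0):
    """Return (stamp, pause_ms) for each GC pause line at or above threshold_ms."""
    found = []
    for line in lines:
        match = PAUSE_LINE.match(line)
        if match and float(match["ms"]) >= threshold_ms:
            found.append((match["stamp"], float(match["ms"])))
    return found


if __name__ == "__main__":
    # Hypothetical path: use whatever file the -Xlog option points at.
    with open("gc.log", encoding="utf-8", errors="replace") as handle:
        for stamp, pause_ms in long_pauses(handle):
            print(stamp, pause_ms)
```

A pause of tens of seconds that lines up with the fsync warning would point toward the JVM rather than the disk; the absence of such a pause would point back toward IO contention.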
Re: Please help! ZooKeeper 3.7.2 fsync-ing latency issue
My experience with slow fsyncs is that it's almost always due to
contention for disk IO. I see that you tuned the snap* sizes down, which is
reasonable. Have you checked what ZK activity is happening during this
period? Perhaps some client is hammering the cluster; have you ruled
that out?

I searched the mail archives; other folks have reported this issue, so
you might take a look. I found this thread in particular that you might
check out:
https://lists.apache.org/thread/qjrlprmt7pdy63ztvjtvkd0f5zgw5dgk

Patrick

On Thu, Apr 18, 2024 at 3:31 AM Xu Bill wrote:

> Hello,
>
> I have a pretty weird issue with ZooKeeper.
> Every day around 17:30, my ZooKeeper logs a warning message that says
> "fsync-ing the write ahead log in SyncThread:0 took 36919ms which will adversely effect operation latency.File size is 16777232 bytes.".
> This causes the clients connected to ZooKeeper to time out. I have to
> restart my clients every day.
>
> Though I don't think the txn log file is too big to be handled quickly,
> I still tried changing parameters to suppress the size of the txn log.
> Below is my configuration.
> preAllocSize=16M
> snapCount=3
> snapSizeLimitInKb=32M
>
> Even with this configuration, I still get the warnings.
>
> I also tried monitoring the IO stats on the data disk that holds the
> ZooKeeper data dir, but the stats were the same as usual.
>
> Can anybody give suggestions on how to solve or investigate this issue?
> I am using ZooKeeper 3.7.2.
> The IO stats were tps=122, reading=20.1k/s, writing=2M/s when the
> warning was happening.
>
> Best regards,
> Bill
>
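One way to check for the disk contention Patrick describes, independently of ZooKeeper, is to time small fsync calls on the same disk around 17:30. This is a minimal sketch: `probe_fsync`, the 4 KiB payload, and the round count are assumptions chosen for illustration, and it only approximates how ZooKeeper writes its txn log.

```python
import os
import tempfile
import time


def probe_fsync(directory, rounds=5, payload=b"x" * 4096):
    """Append a small payload and fsync it `rounds` times; return latencies in ms."""
    latencies_ms = []
    # The probe file is created inside `directory` so the fsync hits that disk.
    fd, path = tempfile.mkstemp(prefix="fsync-probe-", dir=directory)
    try:
        for _ in range(rounds):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies_ms


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the disk that holds
    # the ZooKeeper txn log.
    target = tempfile.gettempdir()
    samples = probe_fsync(target)
    print("max fsync latency: %.2f ms" % max(samples))
```

If the probe also stalls for seconds at the time of the warning, the problem is likely below ZooKeeper (another process, the filesystem, or the storage layer); if it stays fast, the JVM side deserves a closer look.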
Please help! ZooKeeper 3.7.2 fsync-ing latency issue
Hello,

I have a pretty weird issue with ZooKeeper.
Every day around 17:30, my ZooKeeper logs a warning message that says
"fsync-ing the write ahead log in SyncThread:0 took 36919ms which will adversely effect operation latency.File size is 16777232 bytes.".
This causes the clients connected to ZooKeeper to time out. I have to
restart my clients every day.

Though I don't think the txn log file is too big to be handled quickly,
I still tried changing parameters to suppress the size of the txn log.
Below is my configuration.
preAllocSize=16M
snapCount=3
snapSizeLimitInKb=32M

Even with this configuration, I still get the warnings.

I also tried monitoring the IO stats on the data disk that holds the
ZooKeeper data dir, but the stats were the same as usual.

Can anybody give suggestions on how to solve or investigate this issue?
I am using ZooKeeper 3.7.2.
The IO stats were tps=122, reading=20.1k/s, writing=2M/s when the
warning was happening.

Best regards,
Bill
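To see exactly when the stalls happen and how long they last, the warnings can be pulled out of the server log. This is a small sketch, assuming the warning text always has the shape quoted in the message; the `parse_fsync_warning` and `slow_fsyncs` names and the one-second threshold are illustrative.

```python
import re

# Pattern built from the warning text quoted in the message.
FSYNC_WARNING = re.compile(
    r"fsync-ing the write ahead log in (?P<thread>\S+) took (?P<ms>\d+)ms"
    r".*?File size is (?P<size>\d+) bytes"
)


def parse_fsync_warning(line):
    """Return (thread, seconds, size_bytes) for a matching line, else None."""
    match = FSYNC_WARNING.search(line)
    if match is None:
        return None
    return match["thread"], int(match["ms"]) / 1000.0, int(match["size"])


def slow_fsyncs(lines, threshold_seconds=1.0):
    """Yield parsed warnings whose fsync time is at or above the threshold."""
    for line in lines:
        parsed = parse_fsync_warning(line)
        if parsed is not None and parsed[1] >= threshold_seconds:
            yield parsed
```

For the quoted warning this gives a 36.919-second fsync on a 16777232-byte file, which is 16 MiB plus 16 bytes.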