Re: Frequent Consumer and Producer Disconnects
Topic creation should only cause a rebalance for wildcard consumers (and I believe that is regardless of whether or not the wildcard covers the topic: once the ZK watch fires, a rebalance is going to happen).

Back to the original concern: it would be helpful to see more of that log. When a rebalance is triggered, there will be a log message that indicates why. It is going to be caused by either a change in the group membership (which has a number of causes, but at least it narrows it down) or a topic change. Figuring out why the consumers are rebalancing is the first step to trying to reduce it.

-Todd

On Saturday, September 26, 2015, noah wrote:

> Thanks, that gives us some more to look at.
>
> That is unfortunately a small section of the log file. When we hit this
> problem (which is not every time), it will continue like that for hours.
>
> We also still have developers creating topics semi-regularly, which it
> seems like can cause the high level consumer to disconnect?
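One way to tell a short startup burst of rebalances from a cycle that runs for hours is to count rebalance mentions per minute in the consumer log. The sketch below is only illustrative: it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp and that rebalance messages contain the substring "rebalanc". The sample lines are hypothetical placeholders, not real Kafka output, so adjust both assumptions to match your own logs.

```python
from collections import Counter


def rebalances_per_minute(lines):
    """Count lines mentioning a rebalance, bucketed by minute.

    Assumption: every line begins with 'YYYY-MM-DD HH:MM:SS', so the
    first 16 characters identify the minute.
    """
    counts = Counter()
    for line in lines:
        if "rebalanc" in line.lower():
            counts[line[:16]] += 1
    return counts


# Hypothetical placeholder lines, only to exercise the function.
sample = [
    "2015-09-25 15:27:01 placeholder: rebalance started",
    "2015-09-25 15:27:02 placeholder: rebalance finished",
    "2015-09-25 15:27:30 placeholder: unrelated message",
    "2015-09-25 15:41:10 placeholder: rebalance started",
]

for minute, count in sorted(rebalances_per_minute(sample).items()):
    print(minute, count)
```

A cluster of counts in a single minute after startup would match the normal case described in this thread, while counts that keep appearing minute after minute would point at a repeating trigger worth tracing.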
Re: Frequent Consumer and Producer Disconnects
Thanks, that gives us some more to look at.

That is unfortunately a small section of the log file. When we hit this problem (which is not every time), it will continue like that for hours.

We also still have developers creating topics semi-regularly, which it seems like can cause the high level consumer to disconnect?

On Fri, Sep 25, 2015 at 6:16 PM Todd Palino wrote:

> That rebalance cycle doesn't look endless. I see that you started 23
> consumers, and I see 23 rebalances finishing successfully, which is
> correct. You will see rebalance messages from all of the consumers you
> started. It all happens within about 2 seconds, which is fine. I agree
> that there are a lot of log messages, but I'm not seeing anything that
> is particularly a problem here. After the segment of log you provided,
> your consumers will be running properly. Now, given you have a topic
> with 16 partitions, and you're running 23 consumers, 7 of those consumer
> threads are going to be idle because they do not own partitions.
>
> -Todd
Re: Frequent Consumer and Producer Disconnects
That rebalance cycle doesn't look endless. I see that you started 23 consumers, and I see 23 rebalances finishing successfully, which is correct. You will see rebalance messages from all of the consumers you started. It all happens within about 2 seconds, which is fine. I agree that there are a lot of log messages, but I'm not seeing anything that is particularly a problem here. After the segment of log you provided, your consumers will be running properly.

Now, given you have a topic with 16 partitions, and you're running 23 consumers, 7 of those consumer threads are going to be idle because they do not own partitions.

-Todd

On Fri, Sep 25, 2015 at 3:27 PM, noah wrote:

> We're seeing this the most on developer machines that are starting up
> multiple high level consumers on the same topic+group as part of service
> startup. The consumers do not seem to get a chance to consume anything
> before they disconnect.
>
> These are developer topics, so it is possible/likely that there isn't
> anything for them to consume in the topic, but the same service will
> start producing, so I would expect them to not be idle for long.
>
> Could it be that the way we are bringing up multiple consumers at the
> same time is hitting some sort of endless rebalance cycle? And/or the
> resulting thrashing is causing them to time out, rebalance, etc.?
>
> I've tried attaching the logs again. Thanks!
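The arithmetic behind the 7 idle threads can be sketched as follows. This is a simplified, range-style split of one topic's partitions over the sorted consumer threads in a group, assumed here to resemble what the high level consumer does. The thread names are hypothetical and the function is an illustration, not Kafka's actual code.

```python
def range_assign(num_partitions, consumer_threads):
    """Give each thread a contiguous range of partition ids.

    The first (num_partitions % thread_count) threads get one extra
    partition; a thread left with an empty list is idle.
    """
    threads = sorted(consumer_threads)
    per_thread, extra = divmod(num_partitions, len(threads))
    assignment = {}
    start = 0
    for index, thread in enumerate(threads):
        count = per_thread + (1 if index < extra else 0)
        assignment[thread] = list(range(start, start + count))
        start += count
    return assignment


# Hypothetical thread names; 16 partitions and 23 consumers as in the thread.
threads = [f"consumer-{i:02d}" for i in range(23)]
assignment = range_assign(16, threads)
idle = [t for t, parts in assignment.items() if not parts]
print(len(idle))  # 7
```

Because a partition is owned by at most one thread in a group, running more threads than partitions can never put the extra threads to work on that topic.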
Re: Frequent Consumer and Producer Disconnects
I don't see the logs attached, but what does the GC look like in your applications? A lot of times this is caused (at least on the consumer side) by the ZooKeeper session expiring due to excessive GC activity, which causes the consumers to go into a rebalance and change up their connections.

-Todd

On Fri, Sep 25, 2015 at 1:25 PM, Gwen Shapira wrote:

> How busy are the clients?
>
> The brokers occasionally close idle connections; this is normal and
> typically not something to worry about.
> However, this shouldn't happen to consumers that are actively reading
> data.
>
> I'm wondering if the "consumers not making any progress" could be due to
> a different issue, and because they are idle, the connection closes (vs
> the other way around).
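A rough way to check the GC theory is to compare observed GC pause lengths against the consumer's ZooKeeper session timeout. The sketch below assumes you have already extracted pause durations in milliseconds from your GC logs (the sample values are made up), and it assumes a 6000 ms timeout, which to my knowledge is the default for the high level consumer's `zookeeper.session.timeout.ms`; check your own configuration rather than relying on that number.

```python
# Assumed default for zookeeper.session.timeout.ms; verify against your config.
ASSUMED_SESSION_TIMEOUT_MS = 6000


def risky_pauses(pauses_ms, session_timeout_ms=ASSUMED_SESSION_TIMEOUT_MS):
    """Return the pauses long enough to outlast the session timeout."""
    return [pause for pause in pauses_ms if pause >= session_timeout_ms]


# Made-up pause durations in milliseconds, for illustration only.
sample_pauses = [120, 450, 7300, 90, 6100]
print(risky_pauses(sample_pauses))  # [7300, 6100]
```

This is a conservative filter: pauses somewhat shorter than the timeout may also cause an expiry if they land right after the last heartbeat, so an empty result does not rule GC out.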
Re: Frequent Consumer and Producer Disconnects
How busy are the clients?

The brokers occasionally close idle connections; this is normal and typically not something to worry about. However, this shouldn't happen to consumers that are actively reading data.

I'm wondering if the "consumers not making any progress" could be due to a different issue, and because they are idle, the connection closes (vs the other way around).

On Thu, Sep 24, 2015 at 2:32 PM, noah wrote:

> We are having issues with producers and consumers frequently fully
> disconnecting (from both the brokers and ZK) and reconnecting without
> any apparent cause. On our production systems it can happen anywhere
> from every 10-15 seconds to 15-20 minutes. On our less beefy test
> systems and developer laptops, it can happen almost constantly.
>
> We see no errors in the logs (sample attached), just a message for each
> of our consumers and producers disconnecting, then reconnecting. On the
> systems where it happens constantly, the consumers are not making any
> progress.
>
> The logs on the brokers are equally unhelpful; they show only frequent
> connects and reconnects, without any apparent cause.
>
> What could be causing this behavior?