Re: Cassandra Bootstrap Sequence
Just did some more debugging it looks like the "nodetool compactionstats" which is hung/taking time during this period causing the delay in metrics. I still puzzled why the nodetool compactionstats commands takes longer on all the nodes at the same time, when one node is being restarted $ time nodetool compactionstats > pending tasks: 0 > > real 1m17.559s > user 0m2.340s > sys 0m0.248s On Tue, Jun 2, 2020 at 10:25 AM Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Also during this time, I am losing metrics for all the nodes in the > cluster (metrics agent is timing out collecting within 10s) and recovers > once the node starts the CQL port. Is there any known issue which could > cause this? In my case the delay between Gossip settle and CQL port open is > 3 minutes, metrics were lost for all the nodes during the 3 minute period. > > On Tue, Jun 2, 2020 at 7:55 AM Jai Bheemsen Rao Dhanwada < > jaibheem...@gmail.com> wrote: > >> Thank you, >> >> Does that mean there is no way to improve this delay? And i have to live >> with it since i have more tables? >> >> On Tuesday, June 2, 2020, Durity, Sean R >> wrote: >> >>> As I understand it, Cassandra clusters should be limited to a number of >>> tables in the low hundreds (under 200), at most. What you are seeing is the >>> carving up of memtables for each of those 3,000. I try to limit my clusters >>> to roughly 100 tables. >>> >>> >>> >>> >>> >>> Sean Durity >>> >>> >>> >>> *From:* Jai Bheemsen Rao Dhanwada >>> *Sent:* Tuesday, June 2, 2020 10:48 AM >>> *To:* user@cassandra.apache.org >>> *Subject:* [EXTERNAL] Re: Cassandra Bootstrap Sequence >>> >>> >>> >>> 3000 tables >>> >>> On Tuesday, June 2, 2020, Durity, Sean R >>> wrote: >>> >>> How many total tables in the cluster? >>> >>> >>> >>> >>> >>> Sean Durity >>> >>> >>> >>> *From:* Jai Bheemsen Rao Dhanwada >>> *Sent:* Monday, June 1, 2020 8:36 PM >>> *To:* user@cassandra.apache.org >>> *Subject:* [EXTERNAL] Re: Cassandra Bootstrap Sequence >>> >>> >>> >>> Thanks Erick, >>> >>> >>> >>> I see below tasks are being run mostly. I didn't quite understand what >>> exactly these scheduled tasks are for? Is there a way to reduce the boot-up >>> time or do I have to live with this delay? >>> >>> >>> >>> $ zgrep "CompactionStrategyManager.java:380 - Recreating compaction >>> strategy" debug.log* | wc -l >>> 3249 >>> $ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache >>> for" debug.log* | wc -l >>> 6293 >>> $ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | >>> wc -l >>> 6308 >>> $ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from >>> DiskBoundaries" debug.log* | wc -l >>> 3249 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez >>> wrote: >>> >>> There's quite a lot of steps that takes place during the startup >>> sequence between these 2 lines: >>> >>> >>> >>> >>> *INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip >>> backlog; proceeding *INFO [main] 2020-05-31 23:54:06,867 >>> NativeTransportService.java:70 - Netty using native Epoll event loop >>> >>> >>> >>> For the most part, it's taken up by CompactionStrategyManager and >>> DiskBoundaryManager. If you check debug.log, you'll see that it's >>> mostly updating disk boundaries. The length of time it takes is >>> proportional to the number of tables in the cluster. >>> >>> >>> >>> Have a look at this section [1] of CassandraDaemon if you're interested >>> in the details of the startup sequence. Cheers! >>> >>> >>> >>> [1] >>> https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435 >>> [github.com] >>> <https://urldefense.com/v3/__https:/github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java*L399-L435__;Iw!!M-nmYVHPH
Re: Cassandra Bootstrap Sequence
Also during this time, I am losing metrics for all the nodes in the cluster (metrics agent is timing out collecting within 10s) and recovers once the node starts the CQL port. Is there any known issue which could cause this? In my case the delay between Gossip settle and CQL port open is 3 minutes, metrics were lost for all the nodes during the 3 minute period. On Tue, Jun 2, 2020 at 7:55 AM Jai Bheemsen Rao Dhanwada < jaibheem...@gmail.com> wrote: > Thank you, > > Does that mean there is no way to improve this delay? And i have to live > with it since i have more tables? > > On Tuesday, June 2, 2020, Durity, Sean R > wrote: > >> As I understand it, Cassandra clusters should be limited to a number of >> tables in the low hundreds (under 200), at most. What you are seeing is the >> carving up of memtables for each of those 3,000. I try to limit my clusters >> to roughly 100 tables. >> >> >> >> >> >> Sean Durity >> >> >> >> *From:* Jai Bheemsen Rao Dhanwada >> *Sent:* Tuesday, June 2, 2020 10:48 AM >> *To:* user@cassandra.apache.org >> *Subject:* [EXTERNAL] Re: Cassandra Bootstrap Sequence >> >> >> >> 3000 tables >> >> On Tuesday, June 2, 2020, Durity, Sean R >> wrote: >> >> How many total tables in the cluster? >> >> >> >> >> >> Sean Durity >> >> >> >> *From:* Jai Bheemsen Rao Dhanwada >> *Sent:* Monday, June 1, 2020 8:36 PM >> *To:* user@cassandra.apache.org >> *Subject:* [EXTERNAL] Re: Cassandra Bootstrap Sequence >> >> >> >> Thanks Erick, >> >> >> >> I see below tasks are being run mostly. I didn't quite understand what >> exactly these scheduled tasks are for? Is there a way to reduce the boot-up >> time or do I have to live with this delay? >> >> >> >> $ zgrep "CompactionStrategyManager.java:380 - Recreating compaction >> strategy" debug.log* | wc -l >> 3249 >> $ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache >> for" debug.log* | wc -l >> 6293 >> $ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc >> -l >> 6308 >> $ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from >> DiskBoundaries" debug.log* | wc -l >> 3249 >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez >> wrote: >> >> There's quite a lot of steps that takes place during the startup sequence >> between these 2 lines: >> >> >> >> >> *INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip >> backlog; proceeding *INFO [main] 2020-05-31 23:54:06,867 >> NativeTransportService.java:70 - Netty using native Epoll event loop >> >> >> >> For the most part, it's taken up by CompactionStrategyManager and >> DiskBoundaryManager. If you check debug.log, you'll see that it's mostly >> updating disk boundaries. The length of time it takes is proportional to >> the number of tables in the cluster. >> >> >> >> Have a look at this section [1] of CassandraDaemon if you're interested >> in the details of the startup sequence. Cheers! >> >> >> >> [1] >> https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435 >> [github.com] >> <https://urldefense.com/v3/__https:/github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java*L399-L435__;Iw!!M-nmYVHPHQ!dt_R3xGLIK4vc3FdekacgZnl6PDJVAqW_c-yBaIAmQsoVKp7SoW7VeM3gc7VSLx2KgcKBSE$> >> >> >> -- >> >> >> The information in this Internet Email is confidential and may be legally >> privileged. It is intended solely for the addressee. Access to this Email >> by anyone else is unauthorized. If you are not the intended recipient, any >> disclosure, copying, distribution or any action taken or omitted to be >> taken in reliance on it, is prohibited and may be unlawful. When addressed >> to our clients any opinions or advice contained in this Email are subject >> to the terms and conditions expressed in any applicable governing The Home >> Depot terms of business or client engagement letter. The Home Depot >> disclaims all responsibility and liability for the accuracy and content of >> this attachment and for any damages or losses arising from any >> inaccuracies, errors, viruses, e.g.,
Re: Cassandra Bootstrap Sequence
Thank you, Does that mean there is no way to improve this delay? And i have to live with it since i have more tables? On Tuesday, June 2, 2020, Durity, Sean R wrote: > As I understand it, Cassandra clusters should be limited to a number of > tables in the low hundreds (under 200), at most. What you are seeing is the > carving up of memtables for each of those 3,000. I try to limit my clusters > to roughly 100 tables. > > > > > > Sean Durity > > > > *From:* Jai Bheemsen Rao Dhanwada > *Sent:* Tuesday, June 2, 2020 10:48 AM > *To:* user@cassandra.apache.org > *Subject:* [EXTERNAL] Re: Cassandra Bootstrap Sequence > > > > 3000 tables > > On Tuesday, June 2, 2020, Durity, Sean R > wrote: > > How many total tables in the cluster? > > > > > > Sean Durity > > > > *From:* Jai Bheemsen Rao Dhanwada > *Sent:* Monday, June 1, 2020 8:36 PM > *To:* user@cassandra.apache.org > *Subject:* [EXTERNAL] Re: Cassandra Bootstrap Sequence > > > > Thanks Erick, > > > > I see below tasks are being run mostly. I didn't quite understand what > exactly these scheduled tasks are for? Is there a way to reduce the boot-up > time or do I have to live with this delay? > > > > $ zgrep "CompactionStrategyManager.java:380 - Recreating compaction > strategy" debug.log* | wc -l > 3249 > $ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" > debug.log* | wc -l > 6293 > $ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc > -l > 6308 > $ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from > DiskBoundaries" debug.log* | wc -l > 3249 > > > > > > > > > > > > On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez > wrote: > > There's quite a lot of steps that takes place during the startup sequence > between these 2 lines: > > > > > *INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip > backlog; proceeding *INFO [main] 2020-05-31 23:54:06,867 > NativeTransportService.java:70 - Netty using native Epoll event loop > > > > For the most part, it's taken up by CompactionStrategyManager and > DiskBoundaryManager. If you check debug.log, you'll see that it's mostly > updating disk boundaries. The length of time it takes is proportional to > the number of tables in the cluster. > > > > Have a look at this section [1] of CassandraDaemon if you're interested > in the details of the startup sequence. Cheers! > > > > [1] https://github.com/apache/cassandra/blob/cassandra-3.11. > 3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435 > [github.com] > <https://urldefense.com/v3/__https:/github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java*L399-L435__;Iw!!M-nmYVHPHQ!dt_R3xGLIK4vc3FdekacgZnl6PDJVAqW_c-yBaIAmQsoVKp7SoW7VeM3gc7VSLx2KgcKBSE$> > > > -- > > > The information in this Internet Email is confidential and may be legally > privileged. It is intended solely for the addressee. Access to this Email > by anyone else is unauthorized. If you are not the intended recipient, any > disclosure, copying, distribution or any action taken or omitted to be > taken in reliance on it, is prohibited and may be unlawful. When addressed > to our clients any opinions or advice contained in this Email are subject > to the terms and conditions expressed in any applicable governing The Home > Depot terms of business or client engagement letter. The Home Depot > disclaims all responsibility and liability for the accuracy and content of > this attachment and for any damages or losses arising from any > inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other > items of a destructive nature, which may be contained in this attachment > and shall not be liable for direct, indirect, consequential or special > damages in connection with this e-mail message or its attachment. > > > -- > > The information in this Internet Email is confidential and may be legally > privileged. It is intended solely for the addressee. Access to this Email > by anyone else is unauthorized. If you are not the intended recipient, any > disclosure, copying, distribution or any action taken or omitted to be > taken in reliance on it, is prohibited and may be unlawful. When addressed > to our clients any opinions or advice contained in this Email are subject > to the terms and conditions expressed in any applicable governing The Home > Depot terms of business or client engagement letter. The Home Depot > disclaims all responsibility and liability for the accuracy and content of > this attachment and for any damages or losses arising from any > inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other > items of a destructive nature, which may be contained in this attachment > and shall not be liable for direct, indirect, consequential or special > damages in connection with this e-mail message or its attachment. >
RE: Cassandra Bootstrap Sequence
As I understand it, Cassandra clusters should be limited to a number of tables in the low hundreds (under 200), at most. What you are seeing is the carving up of memtables for each of those 3,000. I try to limit my clusters to roughly 100 tables. Sean Durity From: Jai Bheemsen Rao Dhanwada Sent: Tuesday, June 2, 2020 10:48 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Cassandra Bootstrap Sequence 3000 tables On Tuesday, June 2, 2020, Durity, Sean R mailto:sean_r_dur...@homedepot.com>> wrote: How many total tables in the cluster? Sean Durity From: Jai Bheemsen Rao Dhanwada mailto:jaibheem...@gmail.com>> Sent: Monday, June 1, 2020 8:36 PM To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> Subject: [EXTERNAL] Re: Cassandra Bootstrap Sequence Thanks Erick, I see below tasks are being run mostly. I didn't quite understand what exactly these scheduled tasks are for? Is there a way to reduce the boot-up time or do I have to live with this delay? $ zgrep "CompactionStrategyManager.java:380 - Recreating compaction strategy" debug.log* | wc -l 3249 $ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" debug.log* | wc -l 6293 $ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc -l 6308 $ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from DiskBoundaries" debug.log* | wc -l 3249 On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez mailto:erick.rami...@datastax.com>> wrote: There's quite a lot of steps that takes place during the startup sequence between these 2 lines: INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop For the most part, it's taken up by CompactionStrategyManager and DiskBoundaryManager. If you check debug.log, you'll see that it's mostly updating disk boundaries. The length of time it takes is proportional to the number of tables in the cluster. Have a look at this section [1] of CassandraDaemon if you're interested in the details of the startup sequence. Cheers! [1] https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435 [github.com]<https://urldefense.com/v3/__https:/github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java*L399-L435__;Iw!!M-nmYVHPHQ!dt_R3xGLIK4vc3FdekacgZnl6PDJVAqW_c-yBaIAmQsoVKp7SoW7VeM3gc7VSLx2KgcKBSE$> The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment. The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
Re: Cassandra Bootstrap Sequence
Would updating disk boundaries be sensitive to disk I/O tuning? I’m remembering Jon Haddad’s talk about typical throughput problems in disk page sizing. From: Jai Bheemsen Rao Dhanwada Reply-To: "user@cassandra.apache.org" Date: Tuesday, June 2, 2020 at 10:48 AM To: "user@cassandra.apache.org" Subject: Re: Cassandra Bootstrap Sequence Message from External Sender 3000 tables On Tuesday, June 2, 2020, Durity, Sean R mailto:sean_r_dur...@homedepot.com>> wrote: How many total tables in the cluster? Sean Durity From: Jai Bheemsen Rao Dhanwada mailto:jaibheem...@gmail.com>> Sent: Monday, June 1, 2020 8:36 PM To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> Subject: [EXTERNAL] Re: Cassandra Bootstrap Sequence Thanks Erick, I see below tasks are being run mostly. I didn't quite understand what exactly these scheduled tasks are for? Is there a way to reduce the boot-up time or do I have to live with this delay? $ zgrep "CompactionStrategyManager.java:380 - Recreating compaction strategy" debug.log* | wc -l 3249 $ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" debug.log* | wc -l 6293 $ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc -l 6308 $ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from DiskBoundaries" debug.log* | wc -l 3249 On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez mailto:erick.rami...@datastax.com>> wrote: There's quite a lot of steps that takes place during the startup sequence between these 2 lines: INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop For the most part, it's taken up by CompactionStrategyManager and DiskBoundaryManager. If you check debug.log, you'll see that it's mostly updating disk boundaries. The length of time it takes is proportional to the number of tables in the cluster. Have a look at this section [1] of CassandraDaemon if you're interested in the details of the startup sequence. Cheers! [1] https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435 [github.com]<https://urldefense.com/v3/__https:/github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java*L399-L435__;Iw!!M-nmYVHPHQ!dt_R3xGLIK4vc3FdekacgZnl6PDJVAqW_c-yBaIAmQsoVKp7SoW7VeM3gc7VSLx2KgcKBSE$> The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
Re: Cassandra Bootstrap Sequence
3000 tables On Tuesday, June 2, 2020, Durity, Sean R wrote: > How many total tables in the cluster? > > > > > > Sean Durity > > > > *From:* Jai Bheemsen Rao Dhanwada > *Sent:* Monday, June 1, 2020 8:36 PM > *To:* user@cassandra.apache.org > *Subject:* [EXTERNAL] Re: Cassandra Bootstrap Sequence > > > > Thanks Erick, > > > > I see below tasks are being run mostly. I didn't quite understand what > exactly these scheduled tasks are for? Is there a way to reduce the boot-up > time or do I have to live with this delay? > > > > $ zgrep "CompactionStrategyManager.java:380 - Recreating compaction > strategy" debug.log* | wc -l > 3249 > $ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" > debug.log* | wc -l > 6293 > $ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc > -l > 6308 > $ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from > DiskBoundaries" debug.log* | wc -l > 3249 > > > > > > > > > > > > On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez > wrote: > > There's quite a lot of steps that takes place during the startup sequence > between these 2 lines: > > > > > *INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip > backlog; proceeding *INFO [main] 2020-05-31 23:54:06,867 > NativeTransportService.java:70 - Netty using native Epoll event loop > > > > For the most part, it's taken up by CompactionStrategyManager and > DiskBoundaryManager. If you check debug.log, you'll see that it's mostly > updating disk boundaries. The length of time it takes is proportional to > the number of tables in the cluster. > > > > Have a look at this section [1] of CassandraDaemon if you're interested > in the details of the startup sequence. Cheers! > > > > [1] https://github.com/apache/cassandra/blob/cassandra-3.11. > 3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435 > [github.com] > <https://urldefense.com/v3/__https:/github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java*L399-L435__;Iw!!M-nmYVHPHQ!dt_R3xGLIK4vc3FdekacgZnl6PDJVAqW_c-yBaIAmQsoVKp7SoW7VeM3gc7VSLx2KgcKBSE$> > > > -- > > The information in this Internet Email is confidential and may be legally > privileged. It is intended solely for the addressee. Access to this Email > by anyone else is unauthorized. If you are not the intended recipient, any > disclosure, copying, distribution or any action taken or omitted to be > taken in reliance on it, is prohibited and may be unlawful. When addressed > to our clients any opinions or advice contained in this Email are subject > to the terms and conditions expressed in any applicable governing The Home > Depot terms of business or client engagement letter. The Home Depot > disclaims all responsibility and liability for the accuracy and content of > this attachment and for any damages or losses arising from any > inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other > items of a destructive nature, which may be contained in this attachment > and shall not be liable for direct, indirect, consequential or special > damages in connection with this e-mail message or its attachment. >
RE: Cassandra Bootstrap Sequence
How many total tables in the cluster? Sean Durity From: Jai Bheemsen Rao Dhanwada Sent: Monday, June 1, 2020 8:36 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Cassandra Bootstrap Sequence Thanks Erick, I see below tasks are being run mostly. I didn't quite understand what exactly these scheduled tasks are for? Is there a way to reduce the boot-up time or do I have to live with this delay? $ zgrep "CompactionStrategyManager.java:380 - Recreating compaction strategy" debug.log* | wc -l 3249 $ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" debug.log* | wc -l 6293 $ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc -l 6308 $ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from DiskBoundaries" debug.log* | wc -l 3249 On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez mailto:erick.rami...@datastax.com>> wrote: There's quite a lot of steps that takes place during the startup sequence between these 2 lines: INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop For the most part, it's taken up by CompactionStrategyManager and DiskBoundaryManager. If you check debug.log, you'll see that it's mostly updating disk boundaries. The length of time it takes is proportional to the number of tables in the cluster. Have a look at this section [1] of CassandraDaemon if you're interested in the details of the startup sequence. Cheers! [1] https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435 [github.com]<https://urldefense.com/v3/__https:/github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java*L399-L435__;Iw!!M-nmYVHPHQ!dt_R3xGLIK4vc3FdekacgZnl6PDJVAqW_c-yBaIAmQsoVKp7SoW7VeM3gc7VSLx2KgcKBSE$> The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
Re: Cassandra Bootstrap Sequence
Thanks Erick, I see below tasks are being run mostly. I didn't quite understand what exactly these scheduled tasks are for? Is there a way to reduce the boot-up time or do I have to live with this delay? $ zgrep "CompactionStrategyManager.java:380 - Recreating compaction > strategy" debug.log* | wc -l > 3249 > $ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" > debug.log* | wc -l > 6293 > $ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc > -l > 6308 > $ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from > DiskBoundaries" debug.log* | wc -l > 3249 On Mon, Jun 1, 2020 at 5:01 PM Erick Ramirez wrote: > There's quite a lot of steps that takes place during the startup sequence > between these 2 lines: > > >>> *INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip >>> backlog; proceeding*INFO [main] 2020-05-31 23:54:06,867 >>> NativeTransportService.java:70 - Netty using native Epoll event loop >>> >> > For the most part, it's taken up by CompactionStrategyManager and > DiskBoundaryManager. If you check debug.log, you'll see that it's mostly > updating disk boundaries. The length of time it takes is proportional to > the number of tables in the cluster. > > Have a look at this section [1] of CassandraDaemon if you're interested > in the details of the startup sequence. Cheers! > > [1] > https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435 >
Re: Cassandra Bootstrap Sequence
There's quite a lot of steps that takes place during the startup sequence between these 2 lines: >> *INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip >> backlog; proceeding*INFO [main] 2020-05-31 23:54:06,867 >> NativeTransportService.java:70 - Netty using native Epoll event loop >> > For the most part, it's taken up by CompactionStrategyManager and DiskBoundaryManager. If you check debug.log, you'll see that it's mostly updating disk boundaries. The length of time it takes is proportional to the number of tables in the cluster. Have a look at this section [1] of CassandraDaemon if you're interested in the details of the startup sequence. Cheers! [1] https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435
Re: Cassandra Bootstrap Sequence
The thing to look for in GC logs would be signs that you’re bouncing against your memory limits and spending a lot of time in full GC collections. I’m not sure at what phase it kicks in but definitely there is the potential for memory issues when you have large column families (large in the number of columns I mean), and you’re mentioning that the situation gets worse in proportion to the number of tables brought GC to mind. Not sure about proportion of nodes, I think there are thread counts that increase with the number of nodes, and increased threads also can add to GC load, particularly in G1GC. I’m speculating a bit on possible causes, but basically the idea was to look for GC load during those 3 minutes, because if you see it then you’re not hunting for a timeout tuning or anything like that, you’re hunting for a resource allocation tuning. From: Jai Bheemsen Rao Dhanwada Reply-To: "user@cassandra.apache.org" Date: Monday, June 1, 2020 at 7:15 PM To: "user@cassandra.apache.org" Subject: Re: Cassandra Bootstrap Sequence Message from External Sender Is there anything specific to for in GC logs? b/w this delay happens always whenever I bootstrap the node or restart a C* process. I don't believe it's a GC issue and correction from initial question, it's not just bootstrap, but every restart of C* process is causing this. On Mon, Jun 1, 2020 at 3:22 PM Reid Pinchback mailto:rpinchb...@tripadvisor.com>> wrote: That gap seems a long time. Have you checked GC logs around the timeframe? From: Jai Bheemsen Rao Dhanwada mailto:jaibheem...@gmail.com>> Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Date: Monday, June 1, 2020 at 3:52 PM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject: Cassandra Bootstrap Sequence Message from External Sender Hello Team, When I am bootstrapping/restarting a Cassandra Node, there is a delay between gossip settle and port opening. Can someone please explain me where this delay is configured and can this be changed? I don't see any information in the logs In my case if you see there is a ~3 minutes delay and this increases if I increase the #of tables and #of nodes and DC. INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to settle... INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for CQL clients on /x.x.x.x:9042 (encrypted)... Also during this 3 minutes delay, I am losing all my metrics from the C* nodes(basically the metrics are not returned within 10s). Can someone please help me understand the delay here? Cassandra Version: 3.11.3 Metrics: Using telegraf to collect metrics.
Re: Cassandra Bootstrap Sequence
Is there anything specific to for in GC logs? b/w this delay happens always whenever I bootstrap the node or restart a C* process. I don't believe it's a GC issue and correction from initial question, it's not just bootstrap, but every restart of C* process is causing this. On Mon, Jun 1, 2020 at 3:22 PM Reid Pinchback wrote: > That gap seems a long time. Have you checked GC logs around the timeframe? > > > > *From: *Jai Bheemsen Rao Dhanwada > *Reply-To: *"user@cassandra.apache.org" > *Date: *Monday, June 1, 2020 at 3:52 PM > *To: *"user@cassandra.apache.org" > *Subject: *Cassandra Bootstrap Sequence > > > > *Message from External Sender* > > Hello Team, > > > > When I am bootstrapping/restarting a Cassandra Node, there is a delay > between gossip settle and port opening. Can someone please explain me where > this delay is configured and can this be changed? I don't see any > information in the logs > > > > In my case if you see there is a ~3 minutes delay and this increases if I > increase the #of tables and #of nodes and DC. > > > > INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for > gossip to settle... > > *INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip > backlog; proceeding *INFO [main] 2020-05-31 23:54:06,867 > NativeTransportService.java:70 - Netty using native Epoll event loop > INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty > Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, > netty-codec=netty-codec-4.0.44.Final.452812a, > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, > netty-codec-http=netty-codec-http-4.0.44.Final.452812a, > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, > netty-common=netty-common-4.0.44.Final.452812a, > netty-handler=netty-handler-4.0.44.Final.452812a, > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, > netty-transport=netty-transport-4.0.44.Final.452812a, > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, > netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] > *INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening > for CQL clients on /x.x.x.x:9042 (encrypted)...* > > > > Also during this 3 minutes delay, I am losing all my metrics from the C* > nodes(basically the metrics are not returned within 10s). > > > > Can someone please help me understand the delay here? > > > > Cassandra Version: 3.11.3 > > Metrics: Using telegraf to collect metrics. >
Re: Cassandra Bootstrap Sequence
That gap seems a long time. Have you checked GC logs around the timeframe? From: Jai Bheemsen Rao Dhanwada Reply-To: "user@cassandra.apache.org" Date: Monday, June 1, 2020 at 3:52 PM To: "user@cassandra.apache.org" Subject: Cassandra Bootstrap Sequence Message from External Sender Hello Team, When I am bootstrapping/restarting a Cassandra Node, there is a delay between gossip settle and port opening. Can someone please explain me where this delay is configured and can this be changed? I don't see any information in the logs In my case if you see there is a ~3 minutes delay and this increases if I increase the #of tables and #of nodes and DC. INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to settle... INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding INFO [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for CQL clients on /x.x.x.x:9042 (encrypted)... Also during this 3 minutes delay, I am losing all my metrics from the C* nodes(basically the metrics are not returned within 10s). Can someone please help me understand the delay here? Cassandra Version: 3.11.3 Metrics: Using telegraf to collect metrics.
Cassandra Bootstrap Sequence
Hello Team, When I am bootstrapping/restarting a Cassandra Node, there is a delay between gossip settle and port opening. Can someone please explain me where this delay is configured and can this be changed? I don't see any information in the logs In my case if you see there is a ~3 minutes delay and this increases if I increase the #of tables and #of nodes and DC. INFO [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for > gossip to settle... > > *INFO [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip > backlog; proceeding*INFO [main] 2020-05-31 23:54:06,867 > NativeTransportService.java:70 - Netty using native Epoll event loop > INFO [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty > Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, > netty-codec=netty-codec-4.0.44.Final.452812a, > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, > netty-codec-http=netty-codec-http-4.0.44.Final.452812a, > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, > netty-common=netty-common-4.0.44.Final.452812a, > netty-handler=netty-handler-4.0.44.Final.452812a, > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, > netty-transport=netty-transport-4.0.44.Final.452812a, > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, > netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a] > *INFO [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening > for CQL clients on /x.x.x.x:9042 (encrypted)...* Also during this 3 minutes delay, I am losing all my metrics from the C* nodes(basically the metrics are not returned within 10s). Can someone please help me understand the delay here? Cassandra Version: 3.11.3 Metrics: Using telegraf to collect metrics.