Re: Cassandra Bootstrap Sequence

2020-06-02 Thread Jai Bheemsen Rao Dhanwada
Just did some more debugging. It looks like "nodetool compactionstats"
hangs or runs slowly during this period, which is what delays the metrics.
I am still puzzled why the nodetool compactionstats command takes longer on
all the nodes at the same time when only one node is being restarted.

$ time nodetool compactionstats
pending tasks: 0

real 1m17.559s
user 0m2.340s
sys 0m0.248s
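
Since the metrics agent mentioned elsewhere in this thread gives up after 10s, it may help to record how long the command takes on each node and flag the runs that exceed that budget. The following is only a sketch: check_latency is a hypothetical helper name (not part of nodetool), and it measures whole seconds with date +%s.

```shell
# Run a command, discard its output, and report whether it finished
# within a time budget given in whole seconds.
# check_latency is a hypothetical helper, not part of nodetool.
check_latency() {
    budget=$1
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    took=$((end - start))
    if [ "$took" -gt "$budget" ]; then
        echo "SLOW ${took}s"
    else
        echo "OK ${took}s"
    fi
}

# Example: flag runs slower than the agent's 10s timeout.
# check_latency 10 nodetool compactionstats
```

Running this periodically on several nodes while one node restarts would show whether they all slow down over the same interval.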



Re: Cassandra Bootstrap Sequence

2020-06-02 Thread Jai Bheemsen Rao Dhanwada
Also, during this time I am losing metrics for all the nodes in the cluster
(the metrics agent times out trying to collect within 10s); they recover once
the node opens the CQL port. Is there any known issue that could cause this?
In my case the delay between gossip settling and the CQL port opening is
3 minutes, and metrics were lost for all the nodes during that 3-minute period.


Re: Cassandra Bootstrap Sequence

2020-06-02 Thread Jai Bheemsen Rao Dhanwada
Thank you,

Does that mean there is no way to improve this delay, and I have to live
with it since I have more tables?


RE: Cassandra Bootstrap Sequence

2020-06-02 Thread Durity, Sean R
As I understand it, Cassandra clusters should be limited to a number of tables 
in the low hundreds (under 200), at most. What you are seeing is the carving up 
of memtables for each of those 3,000. I try to limit my clusters to roughly 100 
tables.
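
To compare a cluster against this rule of thumb, one could count the tables in the schema and classify the result. This is a sketch under assumptions: both function names are hypothetical, the 100/200 thresholds are simply the figures quoted above, and the commented example assumes cqlsh can connect and that its output puts the count alone on a line. Note that system_schema.tables also lists the system keyspaces' own tables.

```shell
# Pull the first purely numeric line out of cqlsh-style output on stdin.
first_number() {
    awk '/^ *[0-9]+ *$/ { print $1; exit }'
}

# Classify a table count against the rough guideline above:
# around 100 tables as a target, 200 as the upper bound.
classify_table_count() {
    if [ "$1" -le 100 ]; then
        echo "ok"
    elif [ "$1" -le 200 ]; then
        echo "high"
    else
        echo "over-limit"
    fi
}

# Example (assumes cqlsh can connect to a node):
# n=$(cqlsh -e "SELECT count(*) FROM system_schema.tables;" | first_number)
# classify_table_count "$n"
```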


Sean Durity


The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.





Re: Cassandra Bootstrap Sequence

2020-06-02 Thread Reid Pinchback
Would updating disk boundaries be sensitive to disk I/O tuning?  I’m 
remembering Jon Haddad’s talk about typical throughput problems in disk page 
sizing.



Re: Cassandra Bootstrap Sequence

2020-06-02 Thread Jai Bheemsen Rao Dhanwada
3000 tables



RE: Cassandra Bootstrap Sequence

2020-06-02 Thread Durity, Sean R
How many total tables in the cluster?


Sean Durity



Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Jai Bheemsen Rao Dhanwada
Thanks Erick,

I see that the tasks below account for most of the activity. I don't quite
understand what exactly these scheduled tasks are for. Is there a way to
reduce the boot-up time, or do I have to live with this delay?

$ zgrep "CompactionStrategyManager.java:380 - Recreating compaction strategy" debug.log* | wc -l
3249
$ zgrep "DiskBoundaryManager.java:53 - Refreshing disk boundary cache for" debug.log* | wc -l
6293
$ zgrep "DiskBoundaryManager.java:92 - Got local ranges" debug.log* | wc -l
6308
$ zgrep "DiskBoundaryManager.java:56 - Updating boundaries from DiskBoundaries" debug.log* | wc -l
3249
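
The four counts above can also be gathered in a single pass over the logs. The sketch below makes a few assumptions: count_startup_messages is a hypothetical name, the patterns drop the "File.java:NNN" prefix so they do not depend on source line numbers, and gzip -cdf is assumed to behave as GNU gzip does (decompress .gz files, pass plain files through unchanged).

```shell
# Count the four startup messages in one pass over plain or gzipped logs.
# Assumes GNU gzip: -cdf decompresses .gz input and copies plain input as-is.
count_startup_messages() {
    gzip -cdf "$@" | awk '
        /Recreating compaction strategy/          { recreate++ }
        /Refreshing disk boundary cache for/      { refresh++ }
        /Got local ranges/                        { ranges++ }
        /Updating boundaries from DiskBoundaries/ { update++ }
        END {
            printf "recreate=%d refresh=%d ranges=%d update=%d\n",
                   recreate, refresh, ranges, update
        }'
}

# Example:
# count_startup_messages debug.log*
```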








Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Erick Ramirez
There are quite a lot of steps that take place during the startup sequence
between these 2 lines:

INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; proceeding
INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty using native Epoll event loop

For the most part, it's taken up by CompactionStrategyManager and
DiskBoundaryManager. If you check debug.log, you'll see that it's mostly
updating disk boundaries. The length of time it takes is proportional to
the number of tables in the cluster.
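
To track how this gap changes as tables are added, the interval between those two log lines can be computed directly from system.log. This is a minimal sketch, assuming the log layout shown above (the fourth whitespace-separated field is HH:MM:SS,mmm) and a gap shorter than one day; startup_gap_seconds is a hypothetical name.

```shell
# Print the seconds between the "No gossip backlog" line and the first
# following "Netty using native Epoll event loop" line.
# Assumes the log layout shown above, where field 4 is HH:MM:SS,mmm.
startup_gap_seconds() {
    awk '
        function secs(t,   p) {
            split(t, p, "[:,]")
            return p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
        }
        /No gossip backlog; proceeding/ { start = secs($4); seen = 1 }
        seen && /Netty using native Epoll event loop/ {
            gap = secs($4) - start
            if (gap < 0) gap += 86400   # the gap crossed midnight
            printf "%.3f\n", gap
            seen = 0
        }' "$@"
}

# Example:
# startup_gap_seconds system.log
```

For the two lines quoted above this prints 171.312, i.e. just under 3 minutes.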

Have a look at this section [1] of CassandraDaemon if you're interested in
the details of the startup sequence. Cheers!

[1]
https://github.com/apache/cassandra/blob/cassandra-3.11.3/src/java/org/apache/cassandra/service/CassandraDaemon.java#L399-L435


Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Reid Pinchback
The thing to look for in GC logs would be signs that you’re bouncing against 
your memory limits and spending a lot of time in full GCs.

I’m not sure at what phase it kicks in, but there is definitely the potential 
for memory issues when you have large column families (large in the number of 
columns, I mean), and your mention that the situation gets worse in 
proportion to the number of tables brought GC to mind.  I’m not sure about the 
proportion to the number of nodes; I think there are thread counts that 
increase with the number of nodes, and more threads can also add to GC load, 
particularly with G1GC.

I’m speculating a bit on possible causes, but basically the idea was to look 
for GC load during those 3 minutes, because if you see it then you’re not 
hunting for a timeout tuning or anything like that; you’re hunting for a 
resource allocation tuning.
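
One rough way to check for GC load in that window is to total the pause times in the GC log. The sketch below assumes a Java 8 JVM started with -XX:+PrintGCApplicationStoppedTime, which writes lines containing "Total time for which application threads were stopped: N seconds"; the function name and the example file name are assumptions, and these pauses cover all safepoints, not only GC.

```shell
# Sum the pause times reported by lines such as
#   ... Total time for which application threads were stopped: 0.0123 seconds, ...
# Assumes Java 8 GC logging with -XX:+PrintGCApplicationStoppedTime.
gc_stopped_total() {
    awk -F'application threads were stopped: ' '
        NF > 1 { split($2, f, " "); total += f[1]; n++ }
        END    { printf "%d pauses, %.3f s stopped\n", n, total }' "$@"
}

# Example (file name is an assumption; pre-filter to the 3-minute window first):
# gc_stopped_total gc.log.0.current
```

If the total is a small fraction of the 3 minutes, GC is unlikely to explain the delay.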

From: Jai Bheemsen Rao Dhanwada 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, June 1, 2020 at 7:15 PM
To: "user@cassandra.apache.org" 
Subject: Re: Cassandra Bootstrap Sequence

Is there anything specific to look for in the GC logs?
BTW, this delay always happens whenever I bootstrap the node or restart a C* 
process.

I don't believe it's a GC issue. Also, a correction to my initial question: 
it's not just bootstrap; every restart of the C* process causes this.

On Mon, Jun 1, 2020 at 3:22 PM Reid Pinchback <rpinchb...@tripadvisor.com> wrote:
That gap seems like a long time.  Have you checked the GC logs around that timeframe?

From: Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, June 1, 2020 at 3:52 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Cassandra Bootstrap Sequence

Hello Team,

When I bootstrap or restart a Cassandra node, there is a delay between gossip
settling and the CQL port opening. Can someone please explain where this delay
is configured and whether it can be changed? I don't see any information in
the logs.

In my case there is a ~3 minute delay, and it increases as I increase the
number of tables, nodes, and DCs.

INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to 
settle...
INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
proceeding
INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
using native Epoll event loop
INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
[netty-buffer=netty-buffer-4.0.44.Final.452812a, 
netty-codec=netty-codec-4.0.44.Final.452812a, 
netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
netty-common=netty-common-4.0.44.Final.452812a, 
netty-handler=netty-handler-4.0.44.Final.452812a, 
netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
netty-transport=netty-transport-4.0.44.Final.452812a, 
netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, 
netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
CQL clients on /x.x.x.x:9042 (encrypted)...

Also, during this 3-minute delay I lose all metrics from the C* nodes
(the metrics are not returned within 10s).

Can someone please help me understand the delay here?

Cassandra Version: 3.11.3
Metrics: Using telegraf to collect metrics.


Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Jai Bheemsen Rao Dhanwada
Is there anything specific to look for in the GC logs?
By the way, this delay always happens whenever I bootstrap the node or
restart a C* process.

I don't believe it's a GC issue. Also, a correction to my initial question:
it's not just bootstrap; every restart of the C* process causes this.

On Mon, Jun 1, 2020 at 3:22 PM Reid Pinchback 
wrote:

> That gap seems a long time.  Have you checked GC logs around the timeframe?
>
>
>
> *From: *Jai Bheemsen Rao Dhanwada 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, June 1, 2020 at 3:52 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Cassandra Bootstrap Sequence
>
>
>
> Hello Team,
>
>
>
> When I bootstrap or restart a Cassandra node, there is a delay between
> gossip settling and the CQL port opening. Can someone please explain where
> this delay is configured and whether it can be changed? I don't see any
> information in the logs.
>
>
>
> In my case there is a ~3 minute delay, and it increases as I increase the
> number of tables, nodes, and DCs.
>
>
>
> INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for
> gossip to settle...
> INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip
> backlog; proceeding
> INFO  [main] 2020-05-31 23:54:06,867
> NativeTransportService.java:70 - Netty using native Epoll event loop
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty
> Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a,
> netty-codec=netty-codec-4.0.44.Final.452812a,
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a,
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a,
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a,
> netty-common=netty-common-4.0.44.Final.452812a,
> netty-handler=netty-handler-4.0.44.Final.452812a,
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb,
> netty-transport=netty-transport-4.0.44.Final.452812a,
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
> netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a,
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a,
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening
> for CQL clients on /x.x.x.x:9042 (encrypted)...
>
>
>
> Also, during this 3-minute delay I lose all metrics from the C* nodes
> (the metrics are not returned within 10s).
>
>
>
> Can someone please help me understand the delay here?
>
>
>
> Cassandra Version: 3.11.3
>
> Metrics: Using telegraf to collect metrics.
>


Re: Cassandra Bootstrap Sequence

2020-06-01 Thread Reid Pinchback
That gap seems a long time.  Have you checked GC logs around the timeframe?

From: Jai Bheemsen Rao Dhanwada 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, June 1, 2020 at 3:52 PM
To: "user@cassandra.apache.org" 
Subject: Cassandra Bootstrap Sequence

Hello Team,

When I bootstrap or restart a Cassandra node, there is a delay between gossip
settling and the CQL port opening. Can someone please explain where this delay
is configured and whether it can be changed? I don't see any information in
the logs.

In my case there is a ~3 minute delay, and it increases as I increase the
number of tables, nodes, and DCs.

INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to 
settle...
INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog; 
proceeding
INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty 
using native Epoll event loop
INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version: 
[netty-buffer=netty-buffer-4.0.44.Final.452812a, 
netty-codec=netty-codec-4.0.44.Final.452812a, 
netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, 
netty-codec-http=netty-codec-http-4.0.44.Final.452812a, 
netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, 
netty-common=netty-common-4.0.44.Final.452812a, 
netty-handler=netty-handler-4.0.44.Final.452812a, 
netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, 
netty-transport=netty-transport-4.0.44.Final.452812a, 
netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, 
netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, 
netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, 
netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for 
CQL clients on /x.x.x.x:9042 (encrypted)...

Also, during this 3-minute delay I lose all metrics from the C* nodes
(the metrics are not returned within 10s).

Can someone please help me understand the delay here?

Cassandra Version: 3.11.3
Metrics: Using telegraf to collect metrics.


Cassandra Bootstrap Sequence

2020-06-01 Thread Jai Bheemsen Rao Dhanwada
Hello Team,

When I bootstrap or restart a Cassandra node, there is a delay between
gossip settling and the CQL port opening. Can someone please explain where
this delay is configured and whether it can be changed? I don't see any
information in the logs.

In my case there is a ~3 minute delay, and it increases as I increase the
number of tables, nodes, and DCs.

INFO  [main] 2020-05-31 23:51:07,554 Gossiper.java:1692 - Waiting for gossip to
settle...
INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - No gossip backlog;
proceeding
INFO  [main] 2020-05-31 23:54:06,867 NativeTransportService.java:70 - Netty
using native Epoll event loop
INFO  [main] 2020-05-31 23:54:06,913 Server.java:155 - Using Netty Version:
[netty-buffer=netty-buffer-4.0.44.Final.452812a,
netty-codec=netty-codec-4.0.44.Final.452812a,
netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a,
netty-codec-http=netty-codec-http-4.0.44.Final.452812a,
netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a,
netty-common=netty-common-4.0.44.Final.452812a,
netty-handler=netty-handler-4.0.44.Final.452812a,
netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb,
netty-transport=netty-transport-4.0.44.Final.452812a,
netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a,
netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a,
netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - Starting listening for
CQL clients on /x.x.x.x:9042 (encrypted)...


Also, during this 3-minute delay I lose all metrics from the C* nodes
(the metrics are not returned within 10s).

Can someone please help me understand the delay here?

Cassandra Version: 3.11.3
Metrics: Using telegraf to collect metrics.
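
To track how this gap changes as tables or nodes are added, it can be measured
straight from the log instead of by eye. The following is a minimal sketch,
assuming each log entry sits on a single line in system.log (the entries above
are wrapped by the mail client). The helper name entry_time is made up for the
example; the two sample lines are the ones from the log excerpt above.

```python
import re
from datetime import datetime

# Timestamp as it appears in the log entries above: "2020-05-31 23:51:15,555"
TS = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def entry_time(lines, marker):
    """Return the timestamp of the first line containing marker, or None."""
    for line in lines:
        if marker in line:
            m = TS.search(line)
            if m:
                return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return None


log = [
    "INFO  [main] 2020-05-31 23:51:15,555 Gossiper.java:1723 - "
    "No gossip backlog; proceeding",
    "INFO  [main] 2020-05-31 23:54:06,913 Server.java:156 - "
    "Starting listening for CQL clients on /x.x.x.x:9042 (encrypted)...",
]

gossip_done = entry_time(log, "No gossip backlog; proceeding")
cql_open = entry_time(log, "Starting listening for CQL clients")
gap = (cql_open - gossip_done).total_seconds()
print(f"{gap:.3f} s")  # prints 171.358 s
```

For the excerpt above this gives a gap of just under 3 minutes between gossip
settling and the CQL port opening.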