Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip

2019-03-15 Thread Laxmikant Upadhyay
Hi Habash,

The reason for the "Cannot replace_address /10.xx.xx.xxx.xx because it doesn't
exist in gossip" error during replacement is that the dead node's gossip
information did not survive the full restart of the rest of the cluster.

I faced this issue before; you can read how I resolved it at the link below:
https://github.com/laxmikant99/cassandra-single-node-disater-recovery-lessons
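
For anyone searching the archives later, the replacement itself is the usual
flag-based flow. A rough sketch (the IP is the dead node's address; the exact
env file depends on your version and packaging):

  # on the fresh replacement node, with empty data/commitlog directories,
  # set the flag before the first start (e.g. in cassandra-env.sh):
  JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.xx.xx.xx"
  # start Cassandra, wait for streaming to finish, then remove the flag
  # so it is not applied on later restarts

The error discussed in this thread is thrown at that first start, because the
address being replaced is no longer known to gossip.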

regards,
Laxmikant



On Fri, Mar 15, 2019 at 5:21 AM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> It is just C* in Docker Compose with static IP addresses as long as all
> containers run. I am just killing the Cassandra process and starting it again
> in each container.
>
> On Fri, 15 Mar 2019 at 10:47, Jeff Jirsa  wrote:
>
>> Are your IPs changing as you restart the cluster? Kubernetes or Mesos or
>> something where your data gets scheduled on different machines? If so, if
>> it gets an IP that was previously in the cluster, it’ll stomp on the old
>> entry in the gossiper maps
>>
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Mar 14, 2019, at 3:42 PM, Fd Habash  wrote:
>>
>> I can conclusively say none of these commands were run. However, I think
>> this is the likely scenario …
>>
>>
>>
>> If you have a cluster of three nodes 1,2,3 …
>>
>>- If 3 shows as DN
>>- Restart C* on 1 & 2
>>- Nodetool status should NOT show node 3 IP at all.
>>
>>
>>
>> Restarting the cluster while a node is down resets gossip state.
>>
>>
>>
>> There is a good chance this is what happened.
>>
>>
>>
>> Plausible?
>>
>>
>>
>> 
>> Thank you
>>
>>
>>
>> *From: *Jeff Jirsa 
>> *Sent: *Thursday, March 14, 2019 11:06 AM
>> *To: *cassandra 
>> *Subject: *Re: Cannot replace_address /10.xx.xx.xx because it doesn't
>> exist in gossip
>>
>>
>>
>> Two things that wouldn't be a bug:
>>
>>
>>
>> You could have run removenode
>>
>> You could have run assassinate
>>
>>
>>
>> Also could be some new bug, but that's much less likely.
>>
>>
>>
>>
>>
>> On Thu, Mar 14, 2019 at 2:50 PM Fd Habash  wrote:
>>
>> I have a node which I know for certain was a cluster member last week. It
>> showed in nodetool status as DN. When I attempted to replace it today, I
>> got this message
>>
>>
>>
>> ERROR [main] 2019-03-14 14:40:49,208 CassandraDaemon.java:654 - Exception
>> encountered during startup
>>
>> java.lang.RuntimeException: Cannot replace_address /10.xx.xx.xxx.xx
>> because it doesn't exist in gossip
>>
>> at
>> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:449)
>> ~[apache-cassandra-2.2.8.jar:2.2.8]
>>
>>
>>
>>
>>
>> DN  10.xx.xx.xx  388.43 KB  256  6.9%
>> bdbd632a-bf5d-44d4-b220-f17f258c4701  1e
>>
>>
>>
>> Under what conditions does this happen?
>>
>>
>>
>>
>>
>> 
>> Thank you
>>
>>
>>
>>
>>
>>
>
> --
>
>
> *Stefan Miklosovic**Senior Software Engineer*
>
>
> M: +61459911436
>
>


-- 

regards,
Laxmikant Upadhyay


cqlsh: COPY FROM: datetimeformat

2019-03-15 Thread Devopam Mittra
Env:

[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]

I am trying to ingest a CSV that has dates in MM/DD/YYYY format (%m/%d/%Y).
While trying to load I am providing WITH datetimeformat = '%m/%d/%Y', but I
still get the error: time data '03/12/2019' does not match format '%Y-%m-%d',
given up without retries.
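
Roughly, the command I am running looks like this (keyspace, table and column
names changed):

  COPY myks.events (id, event_date)
  FROM 'events.csv'
  WITH HEADER = true AND DATETIMEFORMAT = '%m/%d/%Y';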

Surprisingly, when I try to export data from the same table using a custom
datetimeformat, I get the correct output.

Any pointers on what to do, or what I am doing wrong?

regards
Dev


Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip

2019-03-15 Thread Jeff Jirsa
Is this using GPFS (GossipingPropertyFileSnitch)? If so, can you open a JIRA?
It feels like GPFS may not be persisting the rack/DC info into system.peers and
loses the DC on restart. This is somewhat understandable, but definitely
deserves a JIRA.
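
A quick way to check what actually got persisted on the restarted nodes is to
query the system tables directly (a sketch; column names as in recent
versions):

  SELECT peer, data_center, rack FROM system.peers;
  SELECT data_center, rack FROM system.local;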

On Thu, Mar 14, 2019 at 11:44 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi Fd,
>
> I tried this on a 3-node cluster. I killed node2; both node1 and node3
> reported node2 as DN. Then I killed node1 and node3 and restarted them, and
> node2 was reported like this:
>
> [root@spark-master-1 /]# nodetool status
> Datacenter: DC1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address Load   Tokens   Owns (effective)  Host ID
>  Rack
> DN  172.19.0.8  ?  256  64.0%
>  bd75a5e2-2890-44c5-8f7a-fca1b4ce94ab  r1
> Datacenter: dc1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address Load   Tokens   Owns (effective)  Host ID
>  Rack
> UN  172.19.0.5  382.75 KiB  256  64.4%
>  2a062140-2428-4092-b48b-7495d083d7f9  rack1
> UN  172.19.0.9  171.41 KiB  256  71.6%
>  9590b791-ad53-4b5a-b4c7-b00408ed02dd  rack3
>
> Prior to killing node1 and node3, node2 was indeed marked as DN, but it was
> part of the "Datacenter: dc1" output where both node1 and node3 were.
>
> But after killing both node1 and node3 (so the cluster was completely down)
> and restarting them, node2 was reported as shown above.
>
> I do not know what the difference is here. Is gossip data stored somewhere on
> disk? I would say so, otherwise there is no way node1 / node3 could report
> that node2 is down; but at the same time I don't get why it ends up "out of
> the list" where node1 and node3 are.
>
>
> Stefan Miklosovic
>
>


Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip

2019-03-15 Thread Jeff Jirsa
On Thu, Mar 14, 2019 at 3:42 PM Fd Habash  wrote:

> I can conclusively say none of these commands were run. However, I think
> this is the likely scenario …
>
>
>
> If you have a cluster of three nodes 1,2,3 …
>
>- If 3 shows as DN
>- Restart C* on 1 & 2
>- Nodetool status should NOT show node 3 IP at all.
>
>
If you do this, node3 definitely needs to still be present, and it should
still show DN. If it doesn't, ranges move, and consistency will be violated
(aka: really bad).


>
>
> Restarting the cluster while a node is down resets gossip state.
>

It resets some internal states, but not all of them. It may lose hosts that
have left, but it shouldn't lose any that are simply down.
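
If you want to see what a node actually has in gossip for that endpoint before
restarting anything, something like this will show it (a sketch; the exact
fields vary a bit by version):

  nodetool gossipinfo | grep -A 10 '/10.xx.xx.xx'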




Re: Cannot replace_address /10.xx.xx.xx because it doesn't exist in gossip

2019-03-15 Thread Sam Tunnicliffe
Do you have a cassandra-topology.properties file in place? If so, GPFS will
instantiate a PropertyFileSnitch with it for compatibility mode. Then, when
gossip state doesn't contain any endpoint info about the down node (because you
bounced the whole cluster), instead of reading the rack & DC from system.peers,
it will fall back to the PFS. DC1:r1 is the default in the
cassandra-topology.properties shipped in the distro.
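
For reference, the relevant part of the stock file is just the default entry,
along these lines:

  # default for unknown nodes
  default=DC1:r1

so any endpoint the snitch cannot resolve from gossip or system.peers ends up
reported as DC1/r1, which matches the output above.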

> On 15 Mar 2019, at 12:04, Jeff Jirsa  wrote:
> 
> Is this using GPFS?  If so, can you open a JIRA? It feels like potentially 
> GPFS is not persisting the rack/DC info into system.peers and loses the DC on 
> restart. This is somewhat understandable, but definitely deserves a JIRA. 


TWCS and tombstone purging

2019-03-15 Thread Nick Hatfield
Hey guys,

Can someone give me some ideas or point me to some good material for
determining a good / aggressive tombstone purging strategy? I want to make sure
my tombstones are getting purged as soon as possible to reclaim disk space.
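
For context, the knobs that seem relevant here are gc_grace_seconds and the
TWCS compaction subproperties, roughly like this (table name made up, values
only illustrative, and I understand lowering gc_grace_seconds too far risks
resurrecting deleted data if repairs and hints cannot keep up):

  ALTER TABLE myks.sensor_data
  WITH gc_grace_seconds = 10800
  AND compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': 1,
      'unchecked_tombstone_compaction': 'true',
      'tombstone_threshold': '0.2'
  };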

Thanks


read request is slow

2019-03-15 Thread Sundaramoorthy, Natarajan
3 pods deployed in OpenShift. A read request timed out due to garbage
collection. Can you please look at the parameters and values below to see if
anything is out of place? Thanks


cat cassandra.yaml

num_tokens: 256



hinted_handoff_enabled: true

hinted_handoff_throttle_in_kb: 1024

max_hints_delivery_threads: 2

hints_directory: /cassandra_data/hints

hints_flush_period_in_ms: 1

max_hints_file_size_in_mb: 128


batchlog_replay_throttle_in_kb: 1024

authenticator: PasswordAuthenticator

authorizer: AllowAllAuthorizer

role_manager: CassandraRoleManager

roles_validity_in_ms: 2000


permissions_validity_in_ms: 2000




partitioner: org.apache.cassandra.dht.Murmur3Partitioner

data_file_directories:
- /cassandra_data/data

commitlog_directory: /cassandra_data/commitlog

disk_failure_policy: stop

commit_failure_policy: stop

key_cache_size_in_mb:

key_cache_save_period: 14400



row_cache_size_in_mb: 0

row_cache_save_period: 0


counter_cache_size_in_mb:

counter_cache_save_period: 7200


saved_caches_directory: /cassandra_data/saved_caches

commitlog_sync: periodic
commitlog_sync_period_in_ms: 1

commitlog_segment_size_in_mb: 32


seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  - seeds: 
"cassandra-0.cassandra.ihr-ei.svc.cluster.local,cassandra-1.cassandra.ihr-ei.svc.cluster.local"

concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

concurrent_materialized_view_writes: 32




disk_optimization_strategy: ssd



memtable_allocation_type: heap_buffers

commitlog_total_space_in_mb: 2048


index_summary_capacity_in_mb:

index_summary_resize_interval_in_minutes: 60

trickle_fsync: false
trickle_fsync_interval_in_kb: 10240

storage_port: 7000

ssl_storage_port: 7001

listen_address: 10.130.7.245

broadcast_address: 10.130.7.245



start_native_transport: true
native_transport_port: 9042



start_rpc: true

rpc_address: 0.0.0.0

rpc_port: 9160

broadcast_rpc_address: 10.130.7.245

rpc_keepalive: true

rpc_server_type: sync




thrift_framed_transport_size_in_mb: 15

incremental_backups: false

snapshot_before_compaction: false

auto_snapshot: true

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 10

column_index_size_in_kb: 64


batch_size_warn_threshold_in_kb: 5

batch_size_fail_threshold_in_kb: 50


compaction_throughput_mb_per_sec: 16

compaction_large_partition_warning_threshold_mb: 100

sstable_preemptive_open_interval_in_mb: 50



read_request_timeout_in_ms: 5
range_request_timeout_in_ms: 10
write_request_timeout_in_ms: 2
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 6
request_timeout_in_ms: 10

cross_node_timeout: false


phi_convict_threshold: 12

endpoint_snitch: GossipingPropertyFileSnitch

dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1

request_scheduler: org.apache.cassandra.scheduler.NoScheduler



server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
truststore: conf/.truststore

client_encryption_options:
enabled: false
optional: false
keystore: conf/.keystore

internode_compression: all

inter_dc_tcp_nodelay: false

tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800

gc_warn_threshold_in_ms: 1000

enable_user_defined_functions: false

enable_scripted_user_defined_functions: false

windows_timer_interval: 1


auto_bootstrap: false



Re: read request is slow

2019-03-15 Thread Jon Haddad
1. What was the read request?  Are you fetching a single row, a million,
something else?
2. What are your GC settings?
3. What's the hardware in use?  What resources have been allocated to each
instance?
4. Did you see this issue after a single request or is the cluster under
heavy load?
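
For #2, if you're not sure where those live, something like this prints the
flags the running JVM was started with (a sketch; adjust for however the
container launches Cassandra):

  ps -ef | grep [c]assandra | tr ' ' '\n' | grep -E '^-X'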

If you're going to share a config it's much easier to read as an actual
text file rather than a double spaced paste into the ML.  In the future if
you could share a link to the yaml you might get more eyes on it.

Jon

On Sat, Mar 16, 2019 at 3:57 PM Sundaramoorthy, Natarajan <
natarajan_sundaramoor...@optum.com> wrote:

> 3 pod deployed in openshift. Read request timed out due to GC collection.
> Can you please look at below parameters and value to see if anything is out
> of place? Thanks
>

Re: read request is slow

2019-03-15 Thread Dieudonné Madishon NGAYA
Hi,
I want to add a few more questions:
5. Do you know which table these read timeouts are hitting?
6. If yes, can you check whether that table shows excessive tombstone activity
(see the sketch below)?
7. How often do you run repair?
8. Can you send the system.log and also the output of nodetool tpstats?
9. Is swap enabled or not?
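
For #6 and #8, the commands I have in mind are roughly these (the
keyspace/table name is a placeholder; on older versions tablestats is called
cfstats):

  nodetool tpstats
  nodetool tablestats myks.mytable | grep -i tombstone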

Best regards


*Dieudonne Madishon NGAYA*
Datastax, Cassandra Architect
*P: *7048580065
*w: *www.dmnbigdata.com
*E: *dmng...@dmnbigdata.com
*Private E: *dmng...@gmail.com
*A: *Charlotte,NC,28273, USA


On Fri, Mar 15, 2019 at 11:32 PM Jon Haddad  wrote:

>
> 1. What was the read request?  Are you fetching a single row, a million,
> something else?
> 2. What are your GC settings?
> 3. What's the hardware in use?  What resources have been allocated to each
> instance?
> 4. Did you see this issue after a single request or is the cluster under
> heavy load?
>
> If you're going to share a config it's much easier to read as an actual
> text file rather than a double spaced paste into the ML.  In the future if
> you could share a link to the yaml you might get more eyes on it.
>
> Jon
>
> On Sat, Mar 16, 2019 at 3:57 PM Sundaramoorthy, Natarajan <
> natarajan_sundaramoor...@optum.com> wrote:
>
>> 3 pod deployed in openshift. Read request timed out due to GC collection.
>> Can you please look at below parameters and value to see if anything is out
>> of place? Thanks