Re: [EXTERNAL] Apache Cassandra upgrade path

2019-07-27 Thread Romain Hardouin
 Hi,
Here are some upgrade options:
 - Standard rolling upgrade: node by node.
 - Fast rolling upgrade: rack by rack. If clients use CL=LOCAL_ONE then it's OK
as long as one rack is UP. For higher CL it's possible assuming you have no
more than one replica per rack, e.g. CL=LOCAL_QUORUM with RF=3 and 2 racks is a
*BAD* setup, but RF=3 with 3 racks is OK.
 - Double writes to another cluster: easy for short-TTL data (e.g. a TTL of a
few days). When possible, this option is not only the safest but also allows
major changes (e.g. the partitioner for legacy clusters). And of course it's a
good opportunity to use new cloud instance types, change the number of vnodes, etc.
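The rack-by-rack safety rule above can be sketched as a small check. This is illustrative arithmetic only (the class and method names are mine, not any Cassandra or driver API): with NetworkTopologyStrategy, a single rack holds at most ceil(RF/racks) replicas, so taking a whole rack down still satisfies LOCAL_QUORUM only if the remaining replicas reach quorum.

```java
// Illustrative check for the rack-by-rack upgrade rule above.
// Assumes NetworkTopologyStrategy spreads RF replicas as evenly as
// possible across racks, so one rack holds at most ceil(RF / racks).
public class RackUpgradeCheck {

    static int maxReplicasInOneRack(int rf, int racks) {
        return (rf + racks - 1) / racks; // ceil(rf / racks)
    }

    // Can LOCAL_QUORUM still be served while one whole rack is down?
    static boolean safeForLocalQuorum(int rf, int racks) {
        int quorum = rf / 2 + 1;
        int survivors = rf - maxReplicasInOneRack(rf, racks);
        return survivors >= quorum;
    }

    public static void main(String[] args) {
        System.out.println(safeForLocalQuorum(3, 2)); // false: the *BAD* setup above
        System.out.println(safeForLocalQuorum(3, 3)); // true: OK
    }
}
```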
As Sean said, it's not possible for C* servers to stream data to nodes running
other versions when the streaming versions are different. There is no
workaround. You can check that here:
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/streaming/messages/StreamMessage.java#L35
The community plans to work on this limitation to make streaming possible
between different major versions starting from C* 4.x.
Last but not least, don't forget to take snapshots (+ backup) and to prepare a
rollback script. The system keyspace will be automatically snapshotted by
Cassandra when the new version starts: the rollback script should be based on
that snapshot for the system part. New data (both commitlog and SSTables
flushed in the 3.11 format) will be lost even with such a script, but it's
useful to test it and to have it ready for D-day. (See also the
snapshot_before_compaction setting, but it might be useless depending on your
procedure.)
Romain


On Friday, July 26, 2019 at 23:51:52 UTC+2, Jai Bheemsen Rao Dhanwada
 wrote:
 
 yes correct, it doesn't work for the servers. Trying to see if anyone has a
workaround for this issue? (Maybe changing the protocol version during the
upgrade window?)

On Fri, Jul 26, 2019 at 1:11 PM Durity, Sean R  
wrote:


This would handle client protocol, but not streaming protocol between nodes.

 

 

Sean Durity – Staff Systems Engineer, Cassandra

 

From: Alok Dwivedi  
Sent: Friday, July 26, 2019 3:21 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Apache Cassandra upgrade path

 

Hi Sean

The recommended practice for an upgrade is to explicitly control the protocol
version in your application during the upgrade process. The protocol version is
negotiated on the first connection, and by chance the driver may talk to an
already-upgraded node first, which means it will negotiate a higher version
that is not compatible with the nodes still on the lower Cassandra version.
So initially you set it to a lower version (the lowest common denominator for a
mixed-version cluster), and then remove the explicit setting once the upgrade
has completed.

 

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withProtocolVersion(ProtocolVersion.V2)
    .build();

 

Refer here for more information if using Java driver

https://docs.datastax.com/en/developer/java-driver/3.7/manual/native_protocol/#protocol-version-with-mixed-clusters

 

Same thing applies to drivers in other languages. 

 

Thanks

Alok Dwivedi

Senior Consultant 

https://www.instaclustr.com/

 

 

On Fri, 26 Jul 2019 at 20:03, Jai Bheemsen Rao Dhanwada  
wrote:


Thanks Sean,

 

In my use case all my clusters are multi-DC, and I am trying my best to upgrade
ASAP; however, there is always a chance of a node dying since all machines are
VMs. Also my keyspaces are not uniform across DCs: some are replicated to all
DCs and some to just one DC, so I am worried there.

 

Is there a way to override the protocol version until the upgrade is done and 
then change it back once the upgrade is completed?

 

On Fri, Jul 26, 2019 at 11:42 AM Durity, Sean R  
wrote:


What you have seen is totally expected. You can’t stream between different 
major versions of Cassandra. Get the upgrade done, then worry about any down 
hardware. If you are using DCs, upgrade one DC at a time, so that there is an 
available environment in case of any disasters.

 

My advice, though, is to get through the rolling upgrade process as quickly as 
possible. Don’t stay in a mixed state very long. The cluster will function fine 
in a mixed state – except for those streaming operations. No repairs, no 
bootstraps. 

 

 

Sean Durity – Staff Systems Engineer, Cassandra

 

From: Jai Bheemsen Rao Dhanwada 
Sent: Friday, July 26, 2019 2:24 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Apache Cassandra upgrade path

 

Hello,

 

I am trying to upgrade Apache Cassandra from 2.1.16 to 3.11.3, the regular 
rolling upgrade process works fine without any issues.

 

However, I am running into an issue where, if a node with the older version
dies (hardware failure) and a new node comes up and tries to bootstrap, the
bootstrap fails.

 

I tried two combinations:

 

1. Joining replacement node with 2.1.16 version of cassandra 

In this case nodes

Re: Cluster configuration issue

2018-11-09 Thread Romain Hardouin
 128GB RAM -> that's good news, you have plenty of room to increase the
Cassandra heap size. You can start with, let's say, 12GB in jvm.options, or
24GB if you use G1 GC. Let us know if the node starts and if DEBUG/TRACE is
useful.
You can also try the "strace -f -p ..." command to see if the process is doing
something when it's stuck, but Cassandra has a lot of threads...
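For reference, the heap settings mentioned above go in conf/jvm.options in C* 3.x. A sketch with the sizes suggested above (these are the suggestion for this machine, not universal defaults; tune to your workload):

```
# conf/jvm.options - fixed heap size (setting Xms = Xmx avoids resize pauses)
-Xms12G
-Xmx12G

# With G1 GC, a larger heap such as 24G can make sense instead:
#-Xms24G
#-Xmx24G
```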
On Friday, November 9, 2018 at 19:13:51 UTC+1, Francesco Messere
 wrote:
 
  
Hi Romain,

yes, I modified the .yaml after the issue.

The problem is this: if I restart a node in DC-FIRENZE then it does not start
up. I tried first one node and then the second one, with the same results.
 

 
 
these are the server resources
 
memory 128Gb
 

 free
   total    used    free  shared  buff/cache   available
 Mem:  131741388    13858952    72649704  124584    45232732   116825040
 Swap:  16777212   0    16777212
 

 
 cpu 
 Architecture:  x86_64
 CPU op-mode(s):    32-bit, 64-bit
 Byte Order:    Little Endian
 CPU(s):    24
 On-line CPU(s) list:   0-23
 Thread(s) per core:    1
 Core(s) per socket:    12
 Socket(s): 2
 NUMA node(s):  2
 Vendor ID: GenuineIntel
 CPU family:    6
 Model: 79
 Model name:    Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
 Stepping:  1
 CPU MHz:   1213.523
 CPU max MHz:   2900.
 CPU min MHz:   1200.
 BogoMIPS:  4399.97
 Virtualization:    VT-x
 L1d cache: 32K
 L1i cache: 32K
 L2 cache:  256K
 L3 cache:  30720K
 NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
 NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23 

 
 
There is nothing in server logs. 
 
 
On Monday I will activate debug and try again to start up the Cassandra node.
 
 
Thanks
 
Francesco Messere
 

 
 

 
 
 On 09/11/2018 18:51, Romain Hardouin wrote:
  
 
  Ok so all nodes in Firenze are down. I thought only one was KO.  
  After a first look at cassandra.yaml the only issue I saw is seeds: the line 
you commented out was correct (one seed per DC). But I guess you modified it 
after the issue.  
  You should fix the swap issue.  
  Also can you add more heap to Cassandra? By the way, what are the specs of 
servers (RAM, CPU, etc)?  
  Did you check Linux system log? And Cassandra's debug.log? You can even 
enable TRACE logs in logback.xml ( 
https://github.com/apache/cassandra/blob/cassandra-3.11.3/conf/logback.xml#L100 
) then try to restart a node in Firenze to see where it blocks, but if it's due
to low resources, a hardware issue or swap it won't be useful. Let's give it a
try anyway.
  
  
On Friday, November 9, 2018 at 18:20:57 UTC+1, Francesco Messere
 wrote:
  
 
Hi Romain,
 
you are right, it's not possible to work in these towns; fortunately I live in
Pisa :-).
 
I saw the errors and corrected them, except the swap one.
 
The process gets stuck; I let it run for 1 day without results.
 
 This is the output of nodetool status from the nodes that are up and running 
(DC-MILANO)
 
 /conf/CASSANDRA_SHARE_PROD_conf/bin/cassandra-3.11.3/bin/nodetool -h 
192.168.71.210 -p 17052 status
 Datacenter: DC-FIRENZE
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens   Owns (effective)  Host ID 
  Rack
 DN  192.168.204.175  ?  256  100.0%    
a3c8626e-afab-413e-a153-cccfd0b26d06  RACK1
 DN  192.168.204.176  ?  256  100.0%    
67738ca8-f1f5-46a9-9d23-490bbebcffaa  RACK1
 Datacenter: DC-MILANO
 =
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens   Owns (effective)  Host ID 
  Rack
 UN  192.168.71.210   5.95 GiB   256  100.0%    
210f0cdd-abee-4fc0-abd3-ecdab618606e  RACK1
 UN  192.168.71.211   5.83 GiB   256  100.0%    
96c30edd-4e6c-4952-82d4-dfdf67f6a06f  RACK1
 
 and this is describecluster command output
 
/conf/CASSANDRA_SHARE_PROD_conf/bin/cassandra-3.11.3/bin/nodetool -h 
192.168.71.210 -p 17052 describecluster
 Cluster Information:
     Name: CASSANDRA_3
     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
     DynamicEndPointSnitch: enabled
     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
     Schema versions:
     6bdd4617-658e-375e-8503-7158df833495: [192.168.71.210, 
192.168.71.211]
 
     UNREACHABLE: [192.168.204.175, 192.168.204.176]
 
 In attach the cassandra.yaml file 
 
 Regards
 Francesco Messere
 
 
 
  On 09/11/2018 17:48, Romain Hardouin wrote:
  
 
Hi Francesco, it can't work! Milano and Firenze, oh boy, Calcio vs 
Calcio Storico X-D 
  Ok more seriously, "Updating t

Re: Cluster configuration issue

2018-11-09 Thread Romain Hardouin
 Ok so all nodes in Firenze are down. I thought only one was KO. 
After a first look at cassandra.yaml the only issue I saw is seeds: the line 
you commented out was correct (one seed per DC). But I guess you modified it 
after the issue. 
You should fix the swap issue. 
Also can you add more heap to Cassandra? By the way, what are the specs of 
servers (RAM, CPU, etc)? 
Did you check the Linux system log? And Cassandra's debug.log? You can even
enable TRACE logs in logback.xml (
https://github.com/apache/cassandra/blob/cassandra-3.11.3/conf/logback.xml#L100
) then try to restart a node in Firenze to see where it blocks, but if it's due
to low resources, a hardware issue or swap it won't be useful. Let's give it a
try anyway.


On Friday, November 9, 2018 at 18:20:57 UTC+1, Francesco Messere
 wrote:
 
  
Hi Romain,
 
you are right, it's not possible to work in these towns; fortunately I live in
Pisa :-).
 
I saw the errors and corrected them, except the swap one.
 
The process gets stuck; I let it run for 1 day without results.
 
 This is the output of nodetool status from the nodes that are up and running 
(DC-MILANO)
 
 /conf/CASSANDRA_SHARE_PROD_conf/bin/cassandra-3.11.3/bin/nodetool -h 
192.168.71.210 -p 17052 status
 Datacenter: DC-FIRENZE
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens   Owns (effective)  Host ID 
  Rack
 DN  192.168.204.175  ?  256  100.0%    
a3c8626e-afab-413e-a153-cccfd0b26d06  RACK1
 DN  192.168.204.176  ?  256  100.0%    
67738ca8-f1f5-46a9-9d23-490bbebcffaa  RACK1
 Datacenter: DC-MILANO
 =
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens   Owns (effective)  Host ID 
  Rack
 UN  192.168.71.210   5.95 GiB   256  100.0%    
210f0cdd-abee-4fc0-abd3-ecdab618606e  RACK1
 UN  192.168.71.211   5.83 GiB   256  100.0%    
96c30edd-4e6c-4952-82d4-dfdf67f6a06f  RACK1
 
 and this is describecluster command output
 
 /conf/CASSANDRA_SHARE_PROD_conf/bin/cassandra-3.11.3/bin/nodetool -h 
192.168.71.210 -p 17052 describecluster
 Cluster Information:
     Name: CASSANDRA_3
     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
     DynamicEndPointSnitch: enabled
     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
     Schema versions:
     6bdd4617-658e-375e-8503-7158df833495: [192.168.71.210, 
192.168.71.211]
 
     UNREACHABLE: [192.168.204.175, 192.168.204.176]
 
 In attach the cassandra.yaml file 
 
 Regards
 Francesco Messere
 
 
 
 On 09/11/2018 17:48, Romain Hardouin wrote:
  
 
  Hi Francesco, it can't work! Milano and Firenze, oh boy, Calcio vs Calcio 
Storico X-D 
  Ok more seriously, "Updating topology ..." is not a problem. But you have low
resources and system misconfiguration:
    - Small heap size: 3.867GiB. From the logs: "Unable to lock JVM memory
(ENOMEM). This can result in part of the JVM being swapped out, especially with
mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root."
   - System settings: swap should be disabled, bad system limits, etc. From the
logs: "Cassandra server running in degraded mode. Is swap disabled? : false,
Address space adequate? : true,  nofile limit adequate? : true, nproc limit
adequate? : false". For system tuning see
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
  
  You said "Cassandra node did not startup". What is the problem exactly? Is
the process stuck or does it die? What do you see with "nodetool status" on
nodes that are up and running?
  Btw cassandra-topology.properties is not required with
GossipingPropertyFileSnitch (unless you are migrating from PropertyFileSnitch).
  
  Best, 
  Romain 
  
On Friday, November 9, 2018 at 11:34:16 UTC+1, Francesco Messere
 wrote:
  
 
Hi to all, 
 
 I have a problem with a distributed cluster configuration.
 This is a test environment.
 Cassandra version is 3.11.3
 2 sites, Milan and Florence
 2 servers on each site
 
 1 common "cluster-name" and 2 DCs
 
 The first installation and startup went OK, all the nodes are present in the
cluster.
 
 The issue started after a server reboot in the FLORENCE DC.
 
 The Cassandra node did not start up, and the last line written in system.log is
 
 INFO  [ScheduledTasks:1] 2018-11-09 10:36:54,306 TokenMetadata.java:498 - 
Updating topology for all endpoints that have changed
 
 
 
 The only way I found to correct this is to clean up the node, remove it from
the cluster and re-join it.
  
How can I solve it?
 

 
 
here are configuration files 
 
 
less cassandra-topology.properties 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the Licens

Re: Cluster configuration issue

2018-11-09 Thread Romain Hardouin
 Hi Francesco, it can't work! Milano and Firenze, oh boy, Calcio vs Calcio 
Storico X-D
Ok more seriously, "Updating topology ..." is not a problem. But you have low
resources and system misconfiguration:
  - Small heap size: 3.867GiB. From the logs: "Unable to lock JVM memory
(ENOMEM). This can result in part of the JVM being swapped out, especially with
mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root."
 - System settings: swap should be disabled, bad system limits, etc. From the
logs: "Cassandra server running in degraded mode. Is swap disabled? : false,
Address space adequate? : true,  nofile limit adequate? : true, nproc limit
adequate? : false"
For system tuning see
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
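As a sketch, the swap and limits fixes usually look like the following (values follow common Cassandra tuning guidance; the file path and numbers are assumptions to adapt to your distribution):

```
# Disable swap immediately, then remove/comment the swap entry in /etc/fstab
swapoff -a

# /etc/security/limits.d/cassandra.conf - limits for the user running Cassandra
cassandra - memlock unlimited
cassandra - nofile  100000
cassandra - nproc   32768
cassandra - as      unlimited
```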

You said "Cassandra node did not startup". What is the problem exactly? Is the
process stuck or does it die? What do you see with "nodetool status" on nodes
that are up and running?
Btw cassandra-topology.properties is not required with
GossipingPropertyFileSnitch (unless you are migrating from PropertyFileSnitch).

Best,
Romain

On Friday, November 9, 2018 at 11:34:16 UTC+1, Francesco Messere
 wrote:
 
   
Hi to all, 
 
 I have a problem with a distributed cluster configuration.
 This is a test environment.
 Cassandra version is 3.11.3
 2 sites, Milan and Florence
 2 servers on each site
 
 1 common "cluster-name" and 2 DCs
 
 The first installation and startup went OK, all the nodes are present in the
cluster.
 
 The issue started after a server reboot in the FLORENCE DC.
 
 The Cassandra node did not start up, and the last line written in system.log is
 
 INFO  [ScheduledTasks:1] 2018-11-09 10:36:54,306 TokenMetadata.java:498 - 
Updating topology for all endpoints that have changed
 
 
 
 The only way I found to correct this is to clean up the node, remove it from
the cluster and re-join it.
  
How can I solve it?
 

 
 
here are configuration files 
 
 
less cassandra-topology.properties 
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
 # Cassandra Node IP=Data Center:Rack
 192.168.204.175=DC-FIRENZE:RACK1
 192.168.204.176=DC-FIRENZE:RACK1
 192.168.71.210=DC-MILANO:RACK1
 192.168.71.211=DC-MILANO:RACK1
 
 # default for unknown nodes
 default=DC-FIRENZE:r1
 
 # Native IPv6 is supported, however you must escape the colon in the IPv6 
Address
 # Also be sure to comment out JVM_OPTS="$JVM_OPTS 
-Djava.net.preferIPv4Stack=true"
 # in cassandra-env.sh
 #fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3
 

 
 
cassandra-rackdc.properties
 
 
# Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
 # These properties are used with GossipingPropertyFileSnitch and will
 # indicate the rack and dc for this node
 dc=DC-FIRENZE
 rack=RACK1
 
 # Add a suffix to a datacenter name. Used by the Ec2Snitch and 
Ec2MultiRegionSnitch
 # to append a string to the EC2 region name.
 #dc_suffix=
 
 # Uncomment the following line to make this snitch prefer the internal ip when 
possible, as the Ec2MultiRegionSnitch does.
 # prefer_local=true
 
 

 
 
In attach the system.log file 
 
 
Regards 
 
 
Francesco Messere
 

 
 
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org  

Re: Connections info

2018-10-05 Thread Romain Hardouin
 Note that one "user"/application can open multiple connections. You also have
the number of Thrift connections available in JMX if you run a legacy
application.
Max is right. Regarding where they come from, you can use lsof. For instance on
AWS - but you can adapt it for your needs:

IP=...
REGION=...
ssh $IP "sudo lsof -i -n | grep 9042 | grep -Po '(?<=->)[^:]+' | sort -u" | \
  xargs -P 20 -I '{}' aws --output json --region $REGION ec2 describe-instances \
  --filter Name=private-ip-address,Values={} \
  --query 'Reservations[].Instances[*].Tags[*]' | \
  jq '.[0][0] | map(select(.Key == "Name")) | .[0].Value' | sort | uniq -c

You'll get the number of instances grouped by AWS name:
      3 "name_ABC"
     15 "name_example"
     37 "name_test"
Best,
Romain
On Friday, October 5, 2018 at 06:28:51 UTC+2, Max C.
 wrote:
 
 Looks like the number of connections is available in JMX as:
org.apache.cassandra.metrics:type=Client,name=connectedNativeClients
http://cassandra.apache.org/doc/4.0/operating/metrics.html
"Number of clients connected to this nodes native protocol server”
As for where they’re coming from — I’m not sure how to get that from JMX.  
Maybe you’ll have to use “lsof” or something to get that. 
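A minimal Java sketch of reading that MBean over JMX might look like this (the host and port are placeholders; 7199 is Cassandra's default JMX port, and remote JMX may require authentication/SSL options not shown here):

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class NativeClientCount {
    // Metric ObjectName quoted from the Cassandra docs above.
    static final String METRIC =
        "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients";

    // Reads the gauge from a node's JMX endpoint (default port 7199).
    static int connectedNativeClients(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            return (Integer) mbsc.getAttribute(new ObjectName(METRIC), "Value");
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // e.g. java NativeClientCount 10.0.0.1 7199
            System.out.println(connectedNativeClients(args[0], Integer.parseInt(args[1])));
        } else {
            System.out.println("metric: " + METRIC);
        }
    }
}
```

Run it once per node to get a per-node connection count; source IPs still require lsof/ss on the node itself.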
- Max


On Oct 4, 2018, at 8:57 pm, Abdul Patel  wrote:
Hi All,
Can we get the number of users connected to each node in Cassandra? Also, can
we tell which app node they are connecting from?

  

Re: Simple upgrade for outdated cluster

2018-08-03 Thread Romain Hardouin
 Also, you didn't mention which C* 2.0 version you're using, but prior to
upgrading to 2.1.20, make sure to use the latest 2.0 - or at least >= 2.0.7.
On Friday, August 3, 2018 at 13:03:39 UTC+2, Romain Hardouin
 wrote:
 
  Hi Joel,
No, it's not supported. C* 2.0 can't stream data to C* 3.11.
Make the upgrade 2.0 -> 2.1.20, then you'll be able to upgrade to 3.11.3, i.e.
2.1.20 -> 3.11.3. You can upgrade to 3.0.17 as an intermediate step (I would),
but don't upgrade to 2.2. Also make sure to read
https://github.com/apache/cassandra/blob/cassandra-3.11/NEWS.txt carefully.
It's a long read but it's important. There are lots of changes between all
these versions.
Best,
Romain
On Friday, August 3, 2018 at 11:40:26 UTC+2, Joel Samuelsson
 wrote:
 
 Hi,
We have a pretty outdated Cassandra cluster running version 2.0.x. Instead of 
doing step by step upgrades (2.0 -> 2.1, 2.1 -> 2.2, 2.2 -> 3.0, 3.0 -> 
3.11.x), would it be possible to add new nodes with a recent version (say 
3.11.x) and start decommissioning the old ones until we have a cluster with 
only 3.11.x?
Best regards,
Joel

Re: Simple upgrade for outdated cluster

2018-08-03 Thread Romain Hardouin
 Hi Joel,
No, it's not supported. C* 2.0 can't stream data to C* 3.11.
Make the upgrade 2.0 -> 2.1.20, then you'll be able to upgrade to 3.11.3, i.e.
2.1.20 -> 3.11.3. You can upgrade to 3.0.17 as an intermediate step (I would),
but don't upgrade to 2.2. Also make sure to read
https://github.com/apache/cassandra/blob/cassandra-3.11/NEWS.txt carefully.
It's a long read but it's important. There are lots of changes between all
these versions.
Best,
Romain
On Friday, August 3, 2018 at 11:40:26 UTC+2, Joel Samuelsson
 wrote:
 
 Hi,
We have a pretty outdated Cassandra cluster running version 2.0.x. Instead of 
doing step by step upgrades (2.0 -> 2.1, 2.1 -> 2.2, 2.2 -> 3.0, 3.0 -> 
3.11.x), would it be possible to add new nodes with a recent version (say 
3.11.x) and start decommissioning the old ones until we have a cluster with 
only 3.11.x?
Best regards,
Joel

Re: Rocksandra blog post

2018-03-06 Thread Romain Hardouin
 Rocksandra is very interesting for key/value data models. Let's hope it will
land in C* upstream in the near future thanks to pluggable storage.
Thanks Dikang!


On Tuesday, March 6, 2018 at 10:06:16 UTC+1, Kyrylo Lebediev
 wrote:
 
Thanks for sharing, Dikang! 


Impressive results. 





As you plugged in a different storage engine, it's interesting how you're
dealing with compactions in Rocksandra.

Is there still the concept of immutable SSTables + compaction strategies, or
was it changed somehow?




Best, 


Kyrill


From: Dikang Gu 
Sent: Monday, March 5, 2018 8:26 PM
To: d...@cassandra.apache.org; cassandra
Subject: Rocksandra blog post

As some of you already know, the Instagram Cassandra
team is working on the project to use RocksDB as Cassandra's storage engine. 
Today, we just published a blog post about the work we have done, and more 
excitingly, we published the benchmark metrics in AWS environment.
Check it out here: 
https://engineering.instagram.com/open-sourcing-a-10x-reduction-in-apache-cassandra-tail-latency-d64f86b43589

Thanks,
Dikang

  

Re: What kind of Automation you have for Cassandra related operations on AWS ?

2018-02-08 Thread Romain Hardouin
 
At Teads we use Terraform, Chef, Packer and Rundeck for our AWS infrastructure.
I'll publish a blog post on Medium which talks about that, it's in the
pipeline. Terraform is awesome.
Best,
Romain
On Friday, February 9, 2018 at 00:57:01 UTC+1, Ben Wood
 wrote:
 
 Shameless plug of our (DC/OS) Apache Cassandra service:
https://docs.mesosphere.com/services/cassandra/2.0.3-3.0.14.
You must run DC/OS, but it will handle:
- Restarts
- Replacement of nodes
- Modification of configuration
- Backups and restores (to S3)
On Thu, Feb 8, 2018 at 3:46 PM, Krish Donald  wrote:

Hi All,
What kind of automation do you have for Cassandra-related operations on AWS,
like restacking, restarting the cluster, changing cassandra.yaml parameters, etc.?

Thanks




-- 
Ben Wood
Software Engineer - Data Agility
Mesosphere

Re: Heavy one-off writes best practices

2018-02-06 Thread Romain Hardouin
 We use Spark2Cassandra (this fork works with C* 3.0:
https://github.com/leoromanovsky/Spark2Cassandra ).
SSTables are streamed to Cassandra by Spark2Cassandra (so you need to open port
7000 accordingly). During the benchmark we used 25 EMR nodes, but in production
we use fewer nodes to be more gentle with Cassandra.
Best,
Romain

On Tuesday, February 6, 2018 at 16:05:16 UTC+1, Julien Moumne
 wrote:
 
 This does look like a very viable solution. Thanks.
Could you give us some pointers/documentation on:
 - how can we build such SSTables using Spark jobs, maybe
https://github.com/Netflix/sstable-adaptor ?
 - how do we send these tables to Cassandra? Does a simple SCP work?
 - what is the recommended size for SSTables when the data does not fit a
single executor?
On 5 February 2018 at 18:40, Romain Hardouin  
wrote:

  Hi Julien,
We have such a use case on some clusters. If you want to insert big batches at
a fast pace, the only viable solution is to generate SSTables on the Spark side
and stream them to C*. Last time we benchmarked such a job we achieved 1.3
million partitions inserted per second on a 3-node C* test cluster - which is
impossible with regular inserts.
Best,
Romain
On Monday, February 5, 2018 at 03:54:09 UTC+1, kurt greaves
 wrote:
 
 
Would you know if there is evidence that inserting skinny rows in sorted order 
(no batching) helps C*?
This won't have any effect as each insert will be handled separately by the 
coordinator (or a different coordinator, even). Sorting is also very unlikely 
to help even if you did batch.

 Also, in the case of wide rows, is there evidence that sorting clustering keys 
within partition batches helps ease C*'s job?
No evidence, seems very unlikely.



-- 
Julien MOUMNÉ
Software Engineering - Data Science
Mail: jmoumne@deezer.com
12 rue d'Athènes 75009 Paris - France

Re: Heavy one-off writes best practices

2018-02-05 Thread Romain Hardouin
  Hi Julien,
We have such a use case on some clusters. If you want to insert big batches at
a fast pace, the only viable solution is to generate SSTables on the Spark side
and stream them to C*. Last time we benchmarked such a job we achieved 1.3
million partitions inserted per second on a 3-node C* test cluster - which is
impossible with regular inserts.
Best,
Romain
On Monday, February 5, 2018 at 03:54:09 UTC+1, kurt greaves
 wrote:
 
 
Would you know if there is evidence that inserting skinny rows in sorted order 
(no batching) helps C*?
This won't have any effect as each insert will be handled separately by the 
coordinator (or a different coordinator, even). Sorting is also very unlikely 
to help even if you did batch.

 Also, in the case of wide rows, is there evidence that sorting clustering keys 
within partition batches helps ease C*'s job?
No evidence, seems very unlikely.

Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-05 Thread Romain Hardouin
  Hi,
We also noticed a CPU increase - both system and user - on our c3.4xlarge
fleet. So far it's really visible with max(%user) and especially max(%system),
which has doubled! I graphed a "writes/s / %system" ratio; it's interesting to
see how the value dropped yesterday, you can see it here: https://ibb.co/dnVcHG
For reference: 
https://aws.amazon.com/fr/security/security-bulletins/AWS-2018-013/
Best,
Romain

On Friday, January 5, 2018 at 13:09:35 UTC+1, Steinmaurer, Thomas
 wrote:
 
  
Hello,
 
  
 
has anybody already some experience/results if a patched Linux kernel regarding 
Meltdown/Spectre is affecting performance of Cassandra negatively?
 
  
 
In production, with all nodes running in AWS on m4.xlarge, we have seen up to a
50% relative CPU increase (e.g. AVG CPU from 40% => 60%) since Jan 4, 2018,
most likely correlating with Amazon finishing patching the underlying
hypervisor infrastructure…
 
  
 
Anybody else seeing a similar CPU increase?
 
  
 
Thanks,
 
Thomas
 
  
 The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313  

Re: unrecognized column family in logs

2017-11-09 Thread Romain Hardouin
 Does "nodetool describecluster" show an actual schema disagreement? You can
try "nodetool resetlocalschema" to fix the issue on the node experiencing the
disagreement.
Romain
On Thursday, November 9, 2017 at 02:55:22 UTC+1, Erick Ramirez
 wrote:
 
 It looks like you have a schema disagreement in your cluster which you need to 
look into.
And you're right since that column family ID is equivalent to Friday, June 24, 
2016 10:14:49 AM PDT.
Have a look at the table IDs in system.schema_columnfamilies for clues. Cheers!
On Thu, Nov 9, 2017 at 4:50 AM, Anubhav Kale 
 wrote:


Hello,

 

We run Cassandra 2.1.13 and since last few days we’re seeing below in logs 
occasionally. The node then becomes unresponsive to cqlsh.

 

ERROR [SharedPool-Worker-2] 2017-11-08 17:02:32,362 CommitLogSegment.java:441 -
Attempted to write commit log entry for unrecognized column family:
2854d160-3a2f-11e6-925c-b143135bdc80

 

https://github.com/mariusae/cassandra/blob/master/src/java/org/apache/cassandra/db/commitlog/CommitLogSegment.java#L95

 

The column family has heavy writes, but it hasn't changed schema- or load-wise
recently. How can this be troubleshot / fixed?

 

Thanks !

 

 

 


  

Re: How to do cassandra routine maintenance

2017-09-08 Thread Romain Hardouin
 Hi,
You should read about repair maintenance:
http://cassandra.apache.org/doc/latest/operating/repair.html
Consider installing and running Cassandra Reaper to do so: http://cassandra-reaper.io/
STCS doesn't work well with TTLs. I saw you have done some tuning; it's hard to
say if it's OK without knowing the workload. LCS is better for TTLs (but
requires fast disks), and if you're working with time series consider TWCS. If
the CPUs are not overloaded you can also consider Snappy compression (btw,
check the compression ratio). Again, depending on your data model and your
queries, chunk_length_in_kb might be increased for more effective compression
(generally speaking we tend to lower it to improve read latency).
Best,
Romain
On Saturday, September 2, 2017 at 04:17:22 UTC+2, qf zhou
 wrote:
 
 I am using a cluster with 3 Cassandra nodes, the cluster version is 3.0.9.
Each day about 200~300 million records are inserted into the cluster.
As time goes by, more and more data occupies more and more disk space.
Currently, the data distribution on each node is as follows:

UN  172.20.5.4  2.5 TiB    256          66.3%            
c5271e74-19a1-4cee-98d7-dc169cf87e95  rack1
UN  172.20.5.2  1.73 TiB  256          67.0%            
c623bbc0-9839-4d2d-8ff3-db7115719d59  rack1
UN  172.20.5.3  1.86 TiB  256          66.7%            
c555e44c-9590-4f45-aea4-f5eca68180b2  rack1 

There is only one datacenter.  

The compaciton strategy is here:
    compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '12', 'tombstone_threshold': '0.1', 
'unchecked_tombstone_compaction': 'true'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 864
    AND gc_grace_seconds = 432000

I really want to know how to do Cassandra routine maintenance.

I found the data seems to grow faster and the disk is under heavy load.
Sometimes I see data inconsistency: two different results appear for the same
query.

So what should I do to keep the cluster healthy, and how should I maintain the
cluster?

I would very much appreciate some help! Thanks a lot!




Re: Cassandra 3.11 is compacting forever

2017-09-08 Thread Romain Hardouin
Hi,
It might be useful to enable compaction logging with log_all subproperties.
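For reference, log_all is a compaction subproperty, so enabling it means re-specifying the compaction map. A sketch with a hypothetical table (an ALTER replaces the whole map, so include your existing class and options):

```sql
-- Hypothetical table; keep your current compaction class and options in the map.
ALTER TABLE my_ks.my_table
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'log_all': 'true'};
```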
Best,
Romain 
Le vendredi 8 septembre 2017 à 00:15:19 UTC+2, kurt greaves 
 a écrit :  
 
 Might be worth turning on debug logging for that node, and when the compaction 
kicks off and CPU skyrockets, send through the logs.

Re: Splitting Cassandra Cluster between AWS availability zones

2017-03-07 Thread Romain Hardouin
Hi,
Before: 1 cluster with 2 DCs, 3 nodes in each DC
Now: 1 cluster with 1 DC, 6 nodes in this DC
Is that right?
If yes, depending on the RF - and assuming NetworkTopologyStrategy - I would do:
 - RF = 2 => 2 C* racks, one rack in each AZ
 - RF = 3 => 3 C* racks, one rack in each AZ
In other words, I would align C* racks with AZs. Note that AWS charges for 
inter-AZ traffic, a.k.a. Regional Data Transfer.
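As a sketch (keyspace name, region and RF below are hypothetical): with Ec2Snitch each AZ is reported as a rack automatically, so aligning racks with AZs mostly comes down to the snitch choice plus the keyspace definition:

```sql
-- With Ec2Snitch, the DC is the region (e.g. 'us-east') and the rack is the AZ.
CREATE KEYSPACE my_ks
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'us-east': 3};
```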
Best,
Romain 

Le Mardi 7 mars 2017 18h36, tommaso barbugli  a écrit :
 

 Hi Richard,
It depends on the snitch and the replication strategy in use.
Here's a link to a blogpost about how we deployed C* on 3AZ
http://highscalability.com/blog/2016/8/1/how-to-setup-a-highly-available-multi-az-cassandra-cluster-o.html

Best,
Tommaso 


On Mar 7, 2017 18:05, "Ney, Richard"  wrote:

We’ve collapsed our 2 DC, 3-node Cassandra clusters into a single 6-node 
Cassandra cluster split between two AWS availability zones. Are there any 
behaviors we need to take into account to ensure the Cassandra cluster 
stability with this configuration?

RICHARD NEY
TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
UNITED STATES
richard@aspect.com | aspect.com

This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.



   

Re: Attached profiled data but need help understanding it

2017-03-06 Thread Romain Hardouin
Hi Kant,
You'll find more information about ixgbevf here: 
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sriov-networking.html
I repeat myself, but don't underestimate VM placement: same AZ? same placement 
group? etc. Note that LWT are not discouraged, but as the doc says: "[...] 
reserve lightweight transactions for those situations where they are absolutely 
necessary". I hope you'll be able to achieve what you want with more powerful 
VMs. Let us know!
Best,
Romain
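A quick way to check whether an instance actually uses the ixgbevf driver (the interface name and instance id below are placeholders):

```shell
# Which kernel driver backs the NIC? Expect "driver: ixgbevf" with SR-IOV enabled.
ethtool -i eth0 | grep '^driver'
# Is enhanced networking flagged on the instance itself?
aws ec2 describe-instance-attribute --instance-id i-0123456789abcdef0 \
    --attribute sriovNetSupport
```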
 

Le Lundi 6 mars 2017 10h49, Kant Kodali  a écrit :
 

 Hi Romain,
We may be able to achieve what we need without LWT, but that would require a 
bunch of changes on the application side, and possibly introducing caching 
layers and designing the solution around that. But for now, we are constrained 
to use LWTs for another month or so. All said, I still would like to see the 
discouraged features such as LWTs, secondary indexes, and triggers get better 
over time so they really benefit users.
Agreed, high park/unpark is a sign of excessive context switching, but any ideas 
why this is happening? Yes, today we will be experimenting with c3.2xlarge, see 
what the numbers look like, and slowly scale up from there.
How do I make sure I install the ixgbevf driver? Do m4.xlarge or c3.2xlarge 
instances not already have it? When I googled "ixgbevf driver" it tells me it is 
an Ethernet driver... I thought all instances on AWS run on Ethernet by default. 
Can you please give more context on this?
Thanks,
kant
On Fri, Mar 3, 2017 at 4:42 AM, Romain Hardouin  wrote:

Also, I should have mentioned that it would be a good idea to spawn your three 
benchmark instances in the same AZ, then try with one instance on each AZ to 
see how network latency affects your LWT rate. The lower latency is achievable 
with three instances on the same placement group of course but it's kinda 
dangerous for production. 





   

Re: question of keyspace that just disappeared

2017-03-03 Thread Romain Hardouin
I suspect a lack of 3.x reliability. Cassandra could have given up with dropped 
messages, but not with a "drop keyspace". I mean, I have already seen some Spark 
jobs with too many executors produce a high load average on a DC. I have seen a 
C* node with a 1-min. load avg of 140 that could still serve a P99 read latency 
of 40ms. But I never saw a disappearing keyspace. There are old tickets 
regarding C* 1.x but, as far as I remember, they were due to a 
create/drop/create keyspace.

Le Vendredi 3 mars 2017 13h44, George Webster  a écrit 
:
 

 Thank you for your reply and good to know about the debug statement. I haven't 
 
We never dropped or re-created the keyspace before. We haven't even performed 
writes to that keyspace in months. I also checked the permissions of Apache, 
that user had read only access. 
Unfortunately, I reverted from a backup recently. I cannot say for sure 
anymore if I saw something in system before the revert. 
Anyway, hopefully it was just a fluke. We have some crazy ML libraries running 
on it; maybe Cassandra just gave up? Oh well, Cassandra is a champ and we 
haven't really had issues with it before. 
On Thu, Mar 2, 2017 at 6:51 PM, Romain Hardouin  wrote:

Did you inspect system tables to see if there are some traces of your keyspace? 
Did you ever drop and re-create this keyspace before that?
Lines in debug appear because the fd interval is > 2 seconds (logs are in 
nanoseconds). You can override the intervals via the 
-Dcassandra.fd_initial_value_ms and -Dcassandra.fd_max_interval_ms properties. 
Are you sure you didn't have these lines in debug logs before? I used to see 
them a lot prior to increasing the intervals to 4 seconds. 
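The overrides mentioned above go into the JVM options, e.g. in cassandra-env.sh (the 4-second values are shown only as an example):

```shell
# cassandra-env.sh -- raise failure detector intervals to 4s (values in ms)
JVM_OPTS="$JVM_OPTS -Dcassandra.fd_initial_value_ms=4000"
JVM_OPTS="$JVM_OPTS -Dcassandra.fd_max_interval_ms=4000"
```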
Best,
Romain

Le Mardi 28 février 2017 18h25, George Webster  a 
écrit :
 

 Hey Cassandra Users,
We recently encountered an issue with a keyspace just disappeared. I was 
curious if anyone has had this occur before and can provide some insight. 
We are using Cassandra 3.10, 2 DCs, 3 nodes each. The data is still located in 
the storage folder but is no longer visible inside Cassandra.
I searched the logs for any hints of error or commands being executed that 
could have caused a loss of a keyspace. Unfortunately I found nothing. In the 
logs the only unusual issue i saw was a series of read timeouts that occurred 
right around when the keyspace went away. Since then I see numerous entries in 
debug log as the following:
DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 - Ignoring interval time of 2155674599 for /x.x.x..12
DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 - Ignoring interval time of 2945213745 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 - Ignoring interval time of 2006530862 for /x.x.x..69
DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 - Ignoring interval time of 3441841231 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 - Ignoring interval time of 2153964846 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 - Ignoring interval time of 2588593281 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 - Ignoring interval time of 2005305693 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 - Ignoring interval time of 2009244850 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 - Ignoring interval time of 2149192677 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:45,605 FailureDetector.java:457 - Ignoring interval time of 2021180918 for /x.x.x.85
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436026101 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436187894 for /x.x.x.82
During the time of the disappearing keyspace we had two concurrent activities:
1) Running a Spark job (via HDP 2.5.3 in Yarn) that was performing a 
countByKey. It was using the keyspace that disappeared. The operation crashed.
2) We created a new keyspace to test out a schema. The only "fancy" thing in 
that keyspace is a few materialized view tables. Data was being loaded into that 
keyspace during the crash. The load process was extracting information and then 
just writing to Cassandra. 
Any ideas? Anyone seen this before?
Thanks,
George

   



   

Re: Attached profiled data but need help understanding it

2017-03-03 Thread Romain Hardouin
Also, I should have mentioned that it would be a good idea to spawn your three 
benchmark instances in the same AZ, then try with one instance on each AZ to 
see how network latency affects your LWT rate. The lower latency is achievable 
with three instances on the same placement group of course but it's kinda 
dangerous for production. 



Re: Attached profiled data but need help understanding it

2017-03-02 Thread Romain Hardouin
Hi Kant,
> By backporting you mean I should cherry pick CASSANDRA-11966 commit and 
> compile from source?
Yes
Regarding the network utilization: you checked throughput but latency is more 
important for LWT. That's why you should make sure your m4 instances (both C* 
and client) are using ixgbevf driver.
I agree 1500 writes/s is not impressive, but 4 vCPUs is low. It depends on the 
workload, but my experience is that an AWS instance starts to be powerful at 
16 vCPUs (e.g. c3.4xlarge). And beware of EBS (again, that's my experience, 
YMMV).
High park/unpark is a sign of excessive context switching. If I were you I 
would run a LWT benchmark with 3 x c3.4xlarge or c3.8xlarge (32 vCPUs, SSD 
instance store). Spawn spot instances to save money and be sure to tune 
cassandra.yaml accordingly, e.g. concurrent_writes.
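As an illustration only, the cassandra.yaml knobs in question on a 32-vCPU instance-store box might start out along these lines (the numbers are assumptions to benchmark, not recommendations):

```yaml
# cassandra.yaml excerpt -- illustrative starting points for a 32-vCPU bench node
concurrent_writes: 128            # write-heavy LWT benchmark; size against vCPUs
concurrent_reads: 64
concurrent_compactors: 4
compaction_throughput_mb_per_sec: 64
```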
Finally, a naive question but I must ask you... are you really sure you need 
LWT? Can't you achieve your goal without it?

 Best,
Romain

Le Jeudi 2 mars 2017 10h31, Kant Kodali  a écrit :
 

 Hi Romain,
Any ideas on this? I am not sure why there is so much time being spent in Park 
and Unpark methods as shown by the thread dump. Also, could you please look into 
my responses from the other email? It would greatly help.
Thanks,
kant
On Tue, Feb 28, 2017 at 10:20 PM, Kant Kodali  wrote:

Hi Romain,
I am using Cassandra version 3.0.9 and here is the generated report (graphical 
view) of my thread dump as well. Just sending this over in case it helps.
Thanks,
kant
On Tue, Feb 28, 2017 at 7:51 PM, Kant Kodali  wrote:

Hi Romain,
Thanks again. My response are inline.
kant

On Tue, Feb 28, 2017 at 10:04 AM, Romain Hardouin  wrote:

> we are currently using 3.0.9.  should we use 3.8 or 3.10
No, don't use 3.X in production unless you really need a major feature.I would 
advise to stick to 3.0.X (i.e. 3.0.11 now).You can backport CASSANDRA-11966 
easily but of course you have to deploy from source as a prerequisite.

    By backporting, you mean I should cherry-pick the CASSANDRA-11966 commit and 
compile from source?

> I haven't done any tuning yet.
So that's good news, because maybe there is room for improvement.
> Can I change this on a running instance? If so, how? or does it require 
> downtime?
You can throttle compaction at runtime with "nodetool setcompactionthroughput". 
Be sure to read all nodetool commands, some of them are really useful for 
day-to-day tuning/management. 
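For example (the values and the keyspace/table names below are placeholders):

```shell
# Throttle compaction at runtime to 16 MB/s (0 would disable throttling)
nodetool setcompactionthroughput 16
nodetool getcompactionthroughput
# Watch the effect: pending compactions, SSTable counts, per-read stats
nodetool compactionstats
nodetool tablestats my_ks.my_table
```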
If GC is fine, then check other things -> "[...] different pool sizes for NTR, 
concurrent reads and writes, compaction executors, etc. Also check if you can 
improve network latency (e.g. VF or ENA on AWS)."
Regarding thread pools, some of them can be resized at runtime via JMX.
> 5000 is the target.
Right now you reached 1500. Is it per node or for the cluster? We don't know 
your setup so it's hard to say whether it's doable. Can you provide more 
details? VMs, physical nodes, #nodes, etc. Generally speaking LWT should be 
seldom used. AFAIK you won't achieve 10,000 writes/s per node.
Maybe someone on the list has already done some tuning for a heavy LWT workload?

    1500 total for the cluster.  
    I have an 8-node Cassandra cluster. Each node is an AWS m4.xlarge instance 
(so 4 vCPUs, 16GB, 1 Gbit network = 125MB/s).
    I have 1 node (m4.xlarge) for my application, which just inserts a bunch of 
data, and each insert is an LWT.
 
    I tested the network throughput of the node. I can get up to 98 MB/s.
    Now, when I start my application, I see that the Cassandra nodes' receive 
rate/throughput is about 4MB/s (yes, it is in megabytes; I checked this by 
running sudo iftop -B). The disk I/O is about the same, and the Cassandra 
process CPU usage is about 360% (the max is 400% since it is a 4-core machine). 
The application node's transmission throughput is about 6MB/s. So even with a 
4MB/s receive throughput at the Cassandra node, the CPU is almost maxed out. I 
am not sure what this says about Cassandra, but what I can tell is that the 
network is way underutilized and that 8 nodes are unnecessary, so we plan to 
bring it down to 4 nodes, except each node this time will have 8 cores. All 
said, I am still not sure how to scale up from 1500 writes/sec.       

Best,
Romain








   

Re: question of keyspace that just disappeared

2017-03-02 Thread Romain Hardouin
Did you inspect system tables to see if there are some traces of your keyspace? 
Did you ever drop and re-create this keyspace before that?
Lines in debug appear because the fd interval is > 2 seconds (logs are in 
nanoseconds). You can override the intervals via the 
-Dcassandra.fd_initial_value_ms and -Dcassandra.fd_max_interval_ms properties. 
Are you sure you didn't have these lines in debug logs before? I used to see 
them a lot prior to increasing the intervals to 4 seconds. 
Best,
Romain

Le Mardi 28 février 2017 18h25, George Webster  a 
écrit :
 

 Hey Cassandra Users,
We recently encountered an issue with a keyspace just disappeared. I was 
curious if anyone has had this occur before and can provide some insight. 
We are using Cassandra 3.10, 2 DCs, 3 nodes each. The data is still located in 
the storage folder but is no longer visible inside Cassandra.
I searched the logs for any hints of error or commands being executed that 
could have caused a loss of a keyspace. Unfortunately I found nothing. In the 
logs the only unusual issue i saw was a series of read timeouts that occurred 
right around when the keyspace went away. Since then I see numerous entries in 
debug log as the following:
DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 - Ignoring interval time of 2155674599 for /x.x.x..12
DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 - Ignoring interval time of 2945213745 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 - Ignoring interval time of 2006530862 for /x.x.x..69
DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 - Ignoring interval time of 3441841231 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 - Ignoring interval time of 2153964846 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 - Ignoring interval time of 2588593281 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 - Ignoring interval time of 2005305693 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 - Ignoring interval time of 2009244850 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 - Ignoring interval time of 2149192677 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:45,605 FailureDetector.java:457 - Ignoring interval time of 2021180918 for /x.x.x.85
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436026101 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436187894 for /x.x.x.82
During the time of the disappearing keyspace we had two concurrent activities:
1) Running a Spark job (via HDP 2.5.3 in Yarn) that was performing a 
countByKey. It was using the keyspace that disappeared. The operation crashed.
2) We created a new keyspace to test out a schema. The only "fancy" thing in 
that keyspace is a few materialized view tables. Data was being loaded into that 
keyspace during the crash. The load process was extracting information and then 
just writing to Cassandra. 
Any ideas? Anyone seen this before?
Thanks,
George

   

Re: AWS NVMe i3 instances performances

2017-03-01 Thread Romain Hardouin
Thanks for your feedback Daemeon! I'm a bit disappointed, and I hope that some 
system settings will allow us to leverage NVMe :-/
Which i3 instances did you benchmark? Did you have "preview access" to i3? Or 
was it available in a specific region before the announcement?
Best,
Romain 

Le Mercredi 1 mars 2017 17h44, daemeon reiydelle  a 
écrit :
 

 We did. We found (with both CentOS and Ubuntu, for application compatibility 
reasons) that there is somewhat less IO but better CPU throughput at the price 
point. At the time my optimization work for that client ended, Amazon was 
looking at the IO issue, as perhaps the frame configurations needed further 
optimization. This was 2 months ago. A very superficial test (no kernel tuning) 
done last month seems to indicate the same tradeoffs. Testing was performed in 
both cases with the C* stress tool and with CI test suites. Does this help?


...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Mar 1, 2017 at 3:30 AM, Romain Hardouin  wrote:

Hi all,
AWS launched i3 instances a few days ago*. NVMe SSDs seem very promising!
Did someone already benchmark an i3 with Cassandra? e.g. i2 vs i3. If yes, with 
which OS and kernel version? Did you make any system tuning for NVMe? e.g. PCIe 
IRQ? etc.
We plan to make some benchmarks but Debian is not listed as a supported OS so 
we have to upgrade our kernel and see if it works :P
Here is what we have in mind for the time being:
* OS: Debian
* Kernel: v4.9
* IRQ: try several configurations
Also I would like to compare performances between our Debian AMI and a standard 
AWS Linux AMI.
Thanks!
[*] https://aws.amazon.com/fr/blogs/aws/now-available-i3-instances-for-demanding-io-intensive-applications/





   

AWS NVMe i3 instances performances

2017-03-01 Thread Romain Hardouin
Hi all,
AWS launched i3 instances a few days ago*. NVMe SSDs seem very promising!
Did someone already benchmark an i3 with Cassandra? e.g. i2 vs i3. If yes, with 
which OS and kernel version? Did you make any system tuning for NVMe? e.g. PCIe 
IRQ? etc.
We plan to make some benchmarks but Debian is not listed as a supported OS so 
we have to upgrade our kernel and see if it works :P
Here is what we have in mind for the time being:
* OS: Debian
* Kernel: v4.9
* IRQ: try several configurations
Also I would like to compare performances between our Debian AMI and a standard 
AWS Linux AMI.
Thanks!
[*] 
https://aws.amazon.com/fr/blogs/aws/now-available-i3-instances-for-demanding-io-intensive-applications/



Re: Attached profiled data but need help understanding it

2017-02-28 Thread Romain Hardouin
> we are currently using 3.0.9.  should we use 3.8 or 3.10
No, don't use 3.X in production unless you really need a major feature. I would 
advise to stick to 3.0.X (i.e. 3.0.11 now). You can backport CASSANDRA-11966 
easily, but of course you have to deploy from source as a prerequisite.
> I haven't done any tuning yet.
So that's good news, because maybe there is room for improvement.
> Can I change this on a running instance? If so, how? or does it require a 
> downtime?
You can throttle compaction at runtime with "nodetool setcompactionthroughput". 
Be sure to read all nodetool commands, some of them are really useful for 
day-to-day tuning/management. 
If GC is fine, then check other things -> "[...] different pool sizes for NTR, 
concurrent reads and writes, compaction executors, etc. Also check if you can 
improve network latency (e.g. VF or ENA on AWS)."
Regarding thread pools, some of them can be resized at runtime via JMX.
> 5000 is the target.
Right now you reached 1500. Is it per node or for the cluster? We don't know 
your setup so it's hard to say whether it's doable. Can you provide more 
details? VMs, physical nodes, #nodes, etc. Generally speaking LWT should be 
seldom used. AFAIK you won't achieve 10,000 writes/s per node.
Maybe someone on the list has already done some tuning for a heavy LWT workload?
Best,
Romain


Re: Attached profiled data but need help understanding it

2017-02-27 Thread Romain Hardouin
Hi,
Regarding shared pool workers, see CASSANDRA-11966. You may have to backport it 
depending on your Cassandra version. 
Did you try to lower the compaction throughput to see if it helps? Be sure to 
keep an eye on pending compactions, SSTable counts and SSTables per read, of 
course.
"alloc" is the memory allocation rate. You can see that compactions are GC 
intensive.
You won't be able to achieve impressive writes/s with LWT. But maybe there is 
room for improvement. Try GC tuning, different pool sizes for NTR, concurrent 
reads and writes, compaction executors, etc. Also check if you can improve 
network latency (e.g. VF or ENA on AWS).
What LWT rate would you want to achieve?
Best,
Romain
 

Le Lundi 27 février 2017 12h48, Kant Kodali  a écrit :
 

 Also attached is a flame graph generated from a thread dump.
On Mon, Feb 27, 2017 at 2:32 AM, Kant Kodali  wrote:

Hi,
Attached are the stats of my Cassandra node running on a 4-core CPU. I am using 
the sjk-plus tool for the first time, so what are the things I should watch out 
for in my attached screenshot? I can see the CPU is almost maxed out, but should 
I say that is because of compaction or the shared-worker-pool threads (which, 
btw, I don't know what they are doing; perhaps I need to take a thread dump)? 
Also, what is "alloc" for each thread? 
I have an insert-heavy workload (almost like an ingest running against the 
Cassandra cluster) and in my case all writes are LWT.
The current throughput is 1500 writes/sec where each write is about 1KB. How 
can I tune something for higher throughput? Any pointers or suggestions would 
help.

Thanks much,
kant



   

Re: secondary index on static column

2017-02-27 Thread Romain Hardouin
Hi,
Sorry for the delay, I created a ticket with steps to reproduce the issue: 
https://issues.apache.org/jira/browse/CASSANDRA-13277
Best,
Romain
 

Le Jeudi 2 février 2017 16h53, Micha  a écrit :
 

 Hi,

it's a 3.9, installed on a jessie system.

For me it's like this:
I have a three node cluster.
When creating the keyspace with replication factor 3 it works.
When creating the keyspace with replication factor 2 it doesn't work and
shows the weird behavior.

This is a fresh install, I also have tried it multiple times and the
result is the same. As SASI indices work, I use those.
But I would like to solve this.

Cheers,
 Michael





On 02.02.2017 15:06, Romain Hardouin wrote:
> Hi,
> 
> What's your C* 3.X version?
> I've just tested it on 3.9 and it works:
> 
> cqlsh> SELECT * FROM test.idx_static where id2=22;
> 
>  id  | added                          | id2 | source | dest
> -----+---------------------------------+-----+--------+------
>  id1 | 2017-01-27 23:00:00.00+ |  22 |  src1 | dst1
> 
> (1 rows)
> 
> Maybe your dataset is incorrect (try on a new table) or you hit a bug.
> 
> Best,
> 
> Romain
> 
> 
> 
> Le Vendredi 27 janvier 2017 9h44, Micha  a écrit :
> 
> 
> Hi,
> 
> I'm quite new to Cassandra and although there is much info on the net,
> sometimes I cannot find the solution to a problem.
> 
> In this case, I have a second index on a static column and I don't
> understand the answer I get from my select.
> 
> A cut down version of the table is:
> 
> create table demo (id text, id2 bigint static, added timestamp, source
> text static, dest text, primary key (id, added));
> 
> create index on demo (id2);
> 
> id and id2 match one to one.
> 
> I make one insert:
> insert into demo (id, id2, added, source, dest) values ('id1', 22,
> '2017-01-28', 'src1', 'dst1');
> 
> 
> The "select from demo;" gives the expected answer of the one inserted row.
> 
> But "select from demo where id2=22" gives 70 rows as result (all the same).
> 
> Why? I have read
> https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive
> 
> but I don't get it...
> 
> thanks for answering,
> Michael
> 
> 
> 


   

Re: secondary index on static column

2017-02-02 Thread Romain Hardouin
Hi,
What's your C* 3.X version? I've just tested it on 3.9 and it works:
cqlsh> SELECT * FROM test.idx_static where id2=22;
 id  | added                           | id2 | source | dest
-----+---------------------------------+-----+--------+------
 id1 | 2017-01-27 23:00:00.00+ |  22 |   src1 | dst1

(1 rows)
Maybe your dataset is incorrect (try on a new table) or you hit a bug.
Best,
Romain
 

Le Vendredi 27 janvier 2017 9h44, Micha  a écrit :
 

 Hi,

I'm quite new to Cassandra and although there is much info on the net,
sometimes I cannot find the solution to a problem.

In this case, I have a second index on a static column and I don't
understand the answer I get from my select.

A cut down version of the table is:

create table demo (id text, id2 bigint static, added timestamp, source
text static, dest text, primary key (id, added));

create index on demo (id2);

id and id2 match one to one.

I make one insert:
insert into demo (id, id2, added, source, dest) values ('id1', 22,
'2017-01-28', 'src1', 'dst1');


The "select from demo;" gives the expected answer of the one inserted row.

But "select from demo where id2=22" gives 70 rows as result (all the same).

Why? I have read
https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive

but I don't get it...

thanks for answering,
 Michael



   

Re: Global TTL vs Insert TTL

2017-02-02 Thread Romain Hardouin
Default TTL is nice for providing information on tables to ops folks: we can 
tell at a glance that data in such tables is ephemeral. 

Le Mercredi 1 février 2017 21h47, Carlos Rolo  a écrit :
 

 Awesome to know this!

Thanks Jon and DuyHai!

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
Pythian - Love your data
rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com
On Wed, Feb 1, 2017 at 6:57 PM, Jonathan Haddad  wrote:

The optimization is there.  The entire sstable can be dropped but it's not 
because of the default TTL.  The default TTL only applies if a TTL isn't 
specified explicitly.  The default TTL can't be used to drop a table 
automatically since it can be overridden at insert time.  Check out this 
example.  The first insert uses the default TTL.  The second insert overrides 
the default.  Using the default TTL to drop the sstable would be pretty 
terrible in this case:
CREATE TABLE test.b (
    k int PRIMARY KEY,
    v int
) WITH default_time_to_live = 1;

insert into b (k, v) values (1, 1);
cqlsh:test> select k, v, TTL(v) from b where k = 1;

 k | v | ttl(v)
---+---+--------
 1 | 1 |   9943

(1 rows)

cqlsh:test> insert into b (k, v) values (2, 1) USING TTL ;
cqlsh:test> select k, v, TTL(v) from b where k = 2;

 k | v | ttl(v)
---+---+--------
 2 | 1 |   9995

(1 rows)
TL;DR: The default TTL is there as a convenience so you don't have to keep the 
TTL in your code.  From a performance perspective it does not matter.
Jon

On Wed, Feb 1, 2017 at 10:39 AM DuyHai Doan  wrote:

I was referring to this JIRA 
https://issues.apache.org/jira/browse/CASSANDRA-3974 when talking about 
dropping an entire SSTable at compaction time
But the JIRA is pretty old and it is very possible that the optimization is no 
longer there



On Wed, Feb 1, 2017 at 6:53 PM, Jonathan Haddad  wrote:

This is incorrect, there's no optimization used that references the table-level 
TTL setting. The max local deletion time is stored in table metadata. See 
org.apache.cassandra.io.sstable.metadata.StatsMetadata#maxLocalDeletionTime 
in the Cassandra 3.0 branch. The default ttl is stored here: 
org.apache.cassandra.schema.TableParams#defaultTimeToLive and is never 
referenced during compaction.
Here's an example from a table I created without a default TTL, you can use the 
sstablemetadata tool to see:
jhaddad@rustyrazorblade ~/dev/cassandra/data/data/test$ ../../../tools/bin/sstablemetadata a-7bca6b50e8a511e6869a5596edf4dd35/mc-1-big-Data.db
SSTable max local deletion time: 1485980862
On Wed, Feb 1, 2017 at 6:59 AM DuyHai Doan  wrote:

Global TTL is better than dynamic runtime TTL.
Why?
Because Global TTL is a table property and Cassandra can perform optimizations 
when compacting.
For example, if it can see that the maxTimestamp of an SSTable is older than the 
table's Global TTL, the SSTable can be entirely dropped during compaction.
Using dynamic TTL at runtime, since Cassandra doesn't know and cannot track each 
individual TTL value, the previous optimization is not possible (even if you 
always use the SAME TTL for every query, Cassandra is not supposed to know that).
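The compaction-time check being described can be sketched as follows. This is an illustrative model, not Cassandra's actual code; the function name and parameters are made up:

```python
import time

def sstable_fully_expired(max_timestamp_s, default_ttl_s, gc_grace_s, now_s=None):
    """Illustrative check (not Cassandra's real implementation): with a
    table-level default TTL, every cell in an SSTable is expired and past
    gc_grace once max_timestamp + TTL + gc_grace is behind the clock, so
    compaction could drop the whole file without reading it."""
    if now_s is None:
        now_s = time.time()
    return max_timestamp_s + default_ttl_s + gc_grace_s < now_s

# An SSTable last written at t=0, TTL 7 days, gc_grace 1 day, now t=10 days:
DAY = 86400
print(sstable_fully_expired(max_timestamp_s=0, default_ttl_s=7 * DAY,
                            gc_grace_s=1 * DAY, now_s=10 * DAY))  # True
```

With per-write TTLs the max expiration time of a file is not knowable from the table schema, which is exactly why the shortcut above only works with a table-level TTL.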


On Wed, Feb 1, 2017 at 3:01 PM, Cogumelos Maravilha 
 wrote:

  Thank you all, for your answers.
  
 On 02/01/2017 01:06 PM, Carlos Rolo wrote:
  
 To reinforce Alain statement:
 
 "I would say that the unsafe part is more about using C* 3.9" this is key. You 
would be better on 3.0.x unless you need features on the 3.x series.
 
Regards,
  
  Carlos Juzarte Rolo
  Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
  Pythian - Love your data
  rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin: linkedin.com/in/carlosjuzarterolo
  Mobile: +351 918 918 100 
  www.pythian.com
 On Wed, Feb 1, 2017 at 8:32 AM, Alain RODRIGUEZ  wrote:
 
  
Is it safe to use TWCS in C* 3.9?
 
  I would say that the unsafe part is more about using C* 3.9 than using TWCS 
in C* 3.9 :-). I see no reason to say TWCS would be specifically unsafe in 
C* 3.9, but I might be missing something. 
  Going from STCS to TWCS is often smooth, from LCS you might expect an extra 
load compacting a lot (all?) of the SSTable from what we saw from the field. In 
this case, be sure that your compaction options are safe enough to handle this. 
  TWCS is even easier to use on C* 3.0.8+ and C* 3.8+ as it became the new 
default replacing DTCS, so no extra jar is needed; you can enable TWCS as any 
other default compaction strategy.  
  C*heers,
  ---
  Alain Rodriguez - @arodream - al...@thelastpickle.com
  France
  The Last Pickle - Apache Cassandra Consulting
  http://www.thelastpickle.com
 
 2017-01-31 23:29 GMT+01:00 Cogumelos Maravilha :
 
  Hi Alain, Thanks for your response and the links. I've also checked "Time 
series data model and tombstones".  Is it safe to use TWCS in 

Re: Is this normal!?

2017-01-12 Thread Romain Hardouin
Just a side note: increase the system_auth keyspace replication factor if you're 
using authentication. 
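Concretely, that could look like the following (the DC name and RF are placeholders; run a repair of system_auth afterwards so the new replicas actually get the data):

```sql
-- Illustrative: replicate auth data to 3 nodes in the DC named 'dc1'
ALTER KEYSPACE system_auth
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```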

Le Jeudi 12 janvier 2017 14h52, Alain RODRIGUEZ  a 
écrit :
 

  Hi,

Nodetool repair always lists lots of data and never stays repaired, I think.


This might be the reason:


"incremental: true"

Incremental repairs are the default in your version. They mark data as repaired 
in order to repair each piece of data only once. It is a clever feature, but it 
comes with some caveats. I would read about it, as it is not trivial to 
understand the impacts, and in some cases it can create issues and not be such 
a good idea to use incremental repairs. Make sure to run a full repair instead 
when a node goes down, for example.
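For example, a full repair of one keyspace on the node that was down (the keyspace name is a placeholder):

```shell
# -full forces a non-incremental repair in 3.x
nodetool repair -full my_keyspace
```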
C*heers,
---
Alain Rodriguez - @arodream - alain@thelastpickle.com
France
The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
 
2017-01-11 15:21 GMT+01:00 Cogumelos Maravilha :

Nodetool repair always lists lots of data and never stays repaired, I think.

Cheers


On 01/11/2017 02:15 PM, Hannu Kröger wrote:
> Just to understand:
>
> What exactly is the problem?
>
> Cheers,
> Hannu
>
>> On 11 Jan 2017, at 16.07, Cogumelos Maravilha  
>> wrote:
>>
>> Cassandra 3.9.
>>
>> nodetool status
>> Datacenter: dc1
>> ===============
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address       Load         Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.0.120.145  1.21 MiB     256     49.5%             da6683cd-c3cf-4c14-b3cc-e7af4080c24f  rack1
>> UN  10.0.120.179  1020.51 KiB  256     48.1%             fb695bea-d5e8-4bde-99db-9f756456a035  rack1
>> UN  10.0.120.55   1.02 MiB     256     53.3%             eb911989-3555-4aef-b11c-4a684a89a8c4  rack1
>> UN  10.0.120.46   1.01 MiB     256     49.1%             8034c30a-c1bc-44d4-bf84-36742e0ec21c  rack1
>>
>> nodetool repair
>> [2017-01-11 13:58:27,274] Replication factor is 1. No repair is needed
>> for keyspace 'system_auth'
>> [2017-01-11 13:58:27,284] Starting repair command #4, repairing keyspace
>> system_traces with repair options (parallelism: parallel, primary range:
>> false, incremental: true, job threads: 1, ColumnFamilies: [],
>> dataCenters: [], hosts: [], # of ranges: 515)
>> [2017-01-11 14:01:55,628] Repair session
>> 82a25960-d806-11e6-8ac4-73b93fe4986d for range
>> [(-1278992819359672027,-1209509957304098060],
>> (-2593749995021251600,-2592266543457887959],
>> (-6451044457481580778,-6438233936014720969],
>> (-1917989291840804877,-1912580903456869648],
>> (-3693090304802198257,-3681923561719364766],
>> (-380426998894740867,-350094836653869552],
>> (1890591246410309420,1899294587910578387],
>> (6561031217224224632,6580230317350171440],
>> ... 4 pages of data
>> , (6033828815719998292,6079920177089043443]] finished (progress: 1%)
>> [2017-01-11 13:58:27,986] Repair completed successfully
>> [2017-01-11 13:58:27,988] Repair command #4 finished in 0 seconds
>>
>> nodetool gcstats
>> Interval (ms) Max GC Elapsed (ms)Total GC Elapsed (ms)Stdev GC Elapsed
>> (ms)   GC Reclaimed (MB)         Collections      Direct Memory Bytes
>>            360134                  23
>> 23                   0           333975216
>> 1                       -1
>>
>> (wait)
>> nodetool gcstats
>> Interval (ms) Max GC Elapsed (ms)Total GC Elapsed (ms)Stdev GC Elapsed
>> (ms)   GC Reclaimed (MB)         Collections      Direct Memory Bytes
>>           60016                   0                   0
>> NaN                   0                   0                       -1
>>
>> nodetool repair
>> [2017-01-11 14:00:45,888] Replication factor is 1. No repair is needed
>> for keyspace 'system_auth'
>> [2017-01-11 14:00:45,896] Starting repair command #5, repairing keyspace
>> system_traces with repair options (parallelism: parallel, primary range:
>> false, incremental: true, job threads: 1, ColumnFamilies: [],
>> dataCenters: [], hosts: [], # of ranges: 515)
>> ... 4 pages of data
>> , (94613607632078948, 219237792837906432],
>> (6033828815719998292, 6079920177089043443]] finished (progress: 1%)
>> [2017-01-11 14:00:46,567] Repair completed successfully
>> [2017-01-11 14:00:46,576] Repair command #5 finished in 0 seconds
>>
>> nodetool gcstats
>> Interval (ms) Max GC Elapsed (ms)Total GC Elapsed (ms)Stdev GC Elapsed
>> (ms)   GC Reclaimed (MB)         Collections      Direct Memory Bytes
>>       9169                  25                  25
>> 0           330518688                   1                       -1
>>
>>
>> Always in loop, I think!
>>
>> Thanks in advance.
>>





   

Re: Openstack and Cassandra

2016-12-28 Thread Romain Hardouin
Kilo is a bit old, but the good news is that CPU pinning is available, which 
IMHO is a must to run C* in production. Of course your bottleneck will be 
shared HDDs.
Best,
Romain 

On Tuesday, December 27, 2016 at 10:21 AM, Shalom Sagges wrote:
 

 Hi Romain, 
Thanks for the input!
We currently use the Kilo release of OpenStack. Are you aware of any known 
bugs/issues with this release? We definitely defined anti-affinity rules 
regarding spreading C* on different hosts. (I surely don't want to be woken up 
at night due to a failed host ;-) )
Regarding Trove, I doubt we'll use it in Production any time soon.
Thanks again!



 
Shalom Sagges | DBA
 
On Mon, Dec 26, 2016 at 7:37 PM, Romain Hardouin  wrote:

Hi Shalom,
I assume you'll use KVM virtualization, so pay attention to your stack at every 
level:
- Nova: e.g. CPU pinning, NUMA awareness if relevant, etc. Have a look at extra specs.
- libvirt
- KVM
- QEMU

You may also be interested in resource quotas on other OpenStack VMs that will 
be colocated with C* VMs. Don't forget to define anti-affinity rules in order 
to spread out your C* VMs on different hosts. Finally, watch out for the 
versions of libvirt/KVM/QEMU; some optimizations/bugs are good to know about.

Out of curiosity, which OpenStack release are you using? You may be interested 
in Trove, but C* support is for testing only.
Best,
Romain



   



   

Re: Openstack and Cassandra

2016-12-26 Thread Romain Hardouin
Hi Shalom,
I assume you'll use KVM virtualization, so pay attention to your stack at every 
level:
- Nova: e.g. CPU pinning, NUMA awareness if relevant, etc. Have a look at extra specs.
- libvirt
- KVM
- QEMU

You may also be interested in resource quotas on other OpenStack VMs that will 
be colocated with C* VMs. Don't forget to define anti-affinity rules in order 
to spread out your C* VMs on different hosts. Finally, watch out for the 
versions of libvirt/KVM/QEMU; some optimizations/bugs are good to know about.

Out of curiosity, which OpenStack release are you using? You may be interested 
in Trove, but C* support is for testing only.
Best,
Romain



   

Repair: huge boost on C* 2.1 with CASSANDRA-12580

2016-10-14 Thread Romain Hardouin
Hi all,

Many people here have trouble with repair, so I would like to share my 
experience regarding the backport of CASSANDRA-12580 "Fix merkle tree size 
calculation" (thanks Paulo!) in our C* 2.1.16. I was expecting some minor 
improvements but the results are impressive on some tables.

Because of a slow VPN between our EU and US AWS DCs, the massive drop of 
overstreaming is a big win for us. On top of that, before the backport I used 
to see many RepairException that increased during each repair. With this fix 
the graph shows only one exception on one node, so we can say it's negligible. 
Such exceptions are not critical because Cassandra-reaper makes a retry but 
it's a waste of time.


I run a repair on tables set by set (some sets of tables being more critical, 
etc.).
The most impressive result so far for a set is:
* Before: 23 days (days, not hours)
* With CASSANDRA-12580: 16 hours (yes, hours!)

The improvement is not always dramatic (e.g. 8 hours instead of 39 hours on 
another set) but still significant and valuable.
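A rough intuition for why the fix drops overstreaming so much (an illustrative model of my own; the constants are not the exact ones from the CASSANDRA-12580 patch): when a Merkle tree leaf covers many partitions, a single mismatched leaf streams all of them, so sizing the tree depth from the estimated partition count shrinks the granularity of what gets re-streamed.

```python
import math

def merkle_depth(partition_estimate, max_depth=20):
    """Sketch of the idea behind CASSANDRA-12580: size the Merkle tree so
    each leaf covers about one partition, capped at max_depth (values here
    are illustrative, not the exact patch constants)."""
    return min(max_depth, max(1, math.ceil(math.log2(max(1, partition_estimate)))))

def partitions_per_leaf(partition_estimate, depth):
    """One mismatched leaf streams every partition it covers, so fewer
    partitions per leaf means less overstreaming."""
    return partition_estimate / (2 ** depth)

est = 1_000_000
d = merkle_depth(est)
print(d, partitions_per_leaf(est, d))   # deep tree: ~1 partition per leaf
print(partitions_per_leaf(est, 10))     # fixed shallow tree: ~976 per leaf
```

With a fixed shallow tree, every tiny inconsistency re-streams hundreds of partitions over the slow VPN, which matches the 23-days-to-16-hours improvement reported above.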

Moreover, considering that:
* repair is a mandatory operation in many use cases
* Paulo already made the patch for 2.1
* C* 2.1 is widely used (the most used?)
I think this bugfix is critical - from an Ops point of view - and should land 
in 2.1.17 to be available to people that don't deploy from sources.

Best,

Romain


Re: cassandra dump file path

2016-10-14 Thread Romain Hardouin
Hi Jean,
I had the same problem. I removed the lines in the /etc/init.d/cassandra 
template (we use Chef to deploy) and now the HeapDumpPath is not overridden 
anymore. The same goes for -XX:ErrorFile. 
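The underlying behavior is that the JVM processes repeated flags left to right and the last occurrence wins, which is why the init-script value overrides cassandra-env.sh. A small sketch of that rule (paths taken from the log excerpt below):

```python
def effective_heap_dump_path(jvm_args):
    """The JVM applies repeated -XX flags left to right, so when
    -XX:HeapDumpPath appears twice (cassandra-env.sh first, then the init
    script), the last occurrence wins."""
    path = None
    for arg in jvm_args:
        if arg.startswith("-XX:HeapDumpPath="):
            path = arg.split("=", 1)[1]
    return path

args = ["-Xms6G",
        "-XX:HeapDumpPath=/cassandra/dumps/cassandra-1475461287-pid34435.hprof",
        "-XX:HeapDumpPath=/var/lib/cassandra/java_1475461286.hprof"]
print(effective_heap_dump_path(args))
# -> /var/lib/cassandra/java_1475461286.hprof
```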
Best,
Romain

On Tuesday, October 4, 2016 at 9:25 AM, Jean Carlo wrote:
 

 Yes, we did it. 

So if the parameter in cassandra-env.sh is used only if we have an OOM, what is 
the definition of -XX:HeapDumpPath=/var/lib/cassandra/java_1475461286.hprof in 
/etc/init.d/cassandra for?


Saludos

Jean Carlo
"The best way to predict the future is to invent it" Alan Kay

On Tue, Oct 4, 2016 at 2:58 AM, Yabin Meng  wrote:

Have you restarted Cassandra after making changes in cassandra-env.sh?
Yabin
On Mon, Oct 3, 2016 at 7:44 AM, Jean Carlo  wrote:

OK, I got the response to one of my questions. In the script 
/etc/init.d/cassandra we set the path for the heap dump, by default in the 
cassandra_home.

Now the thing I don't understand is: why are the dumps located at the path set 
by /etc/init.d/cassandra and not by the conf file cassandra-env.sh?

Anyone any idea?


Saludos

Jean Carlo
"The best way to predict the future is to invent it" Alan Kay

On Mon, Oct 3, 2016 at 12:00 PM, Jean Carlo  wrote:


Hi

I see in the log of my node cassandra that the parameter -XX:HeapDumpPath is 
charged two times.

INFO  [main] 2016-10-03 04:21:29,941 CassandraDaemon.java:205 - JVM Arguments:
[-ea, -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar,
-XX:+CMSClassUnloadingEnabled, -XX:+UseThreadPriorities,
-XX:ThreadPriorityPolicy=42, -Xms6G, -Xmx6G, -Xmn600M,
-XX:+HeapDumpOnOutOfMemoryError,
-XX:HeapDumpPath=/cassandra/dumps/cassandra-1475461287-pid34435.hprof,
-Xss256k, -XX:StringTableSize=103, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC,
-XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1,
-XX:CMSInitiatingOccupancyFraction=30, -XX:+UseCMSInitiatingOccupancyOnly,
-XX:+UseTLAB, -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler,
-XX:CMSWaitDuration=1, -XX:+CMSParallelInitialMarkEnabled,
-XX:+CMSEdenChunksRecordAlways, -XX:CMSWaitDuration=1,
-XX:+UseCondCardMark, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps,
-XX:+PrintGCApplicationStoppedTime, -Xloggc:/var/opt/hosting/log/cassandra/gc.log,
-XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=20, -XX:GCLogFileSize=20M,
-Djava.net.preferIPv4Stack=true, -Dcom.sun.management.jmxremote.port=7199,
-Dcom.sun.management.jmxremote.rmi.port=7199,
-Dcom.sun.management.jmxremote.ssl=false,
-Dcom.sun.management.jmxremote.authenticate=false,
-Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password,
-Djava.io.tmpdir=/var/opt/hosting/db/cassandra/tmp,
-javaagent:/usr/share/cassandra/lib/jolokia-jvm-1.0.6-agent.jar=port=8778,host=0.0.0.0,
-Dcassandra.auth_bcrypt_gensalt_log2_rounds=4,
-Dlogback.configurationFile=logback.xml, -Dcassandra.logdir=/var/log/cassandra,
-Dcassandra.storagedir=, -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid,
-XX:HeapDumpPath=/var/lib/cassandra/java_1475461286.hprof,
-XX:ErrorFile=/var/lib/cassandra/hs_err_1475461286.log]

This option is defined in cassandra-env.sh:

if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
    JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
fi

and we set the value of CASSANDRA_HEAPDUMP_DIR to /cassandra/dumps/ beforehand.

It seems that Cassandra does not care about the configuration in 
cassandra-env.sh and only takes into account the last value set for 
HeapDumpPath: /var/lib/cassandra/java_1475461286.hprof

This causes problems when we have to dump the heap, because Cassandra uses a 
disk that is not suitable for it.

Is -XX:HeapDumpPath set in another place/file that I don't know about?

Thxs

Jean Carlo
"The best way to predict the future is to invent it" Alan Kay








   

Re: TRUNCATE throws OperationTimedOut randomly

2016-09-29 Thread Romain Hardouin
Hi,
@Edward:
> In older versions you can not control when this call will timeout
truncate_request_timeout_in_ms has been available for many years, starting from 
1.2. Maybe you have another setting parameter in mind?
@George: Try setting Cassandra logs to debug level.
Best,
Romain
 

On Wednesday, September 28, 2016 at 8:31 PM, George Sigletos wrote:
 

 Even when I set a lower request-timeout in order to trigger a timeout, still 
no WARN or ERROR in the logs

On Wed, Sep 28, 2016 at 8:22 PM, George Sigletos  wrote:

Hi Joaquin,

Unfortunately neither WARN nor ERROR found in the system logs across the 
cluster when executing truncate. Sometimes it executes immediately, other times 
it takes 25 seconds, given that I have connected with --request-timeout=30 
seconds. 

The nodes are a bit busy compacting. On a freshly restarted cluster, truncate 
seems to work without problems.

Some warnings that I see around that time but not exactly when executing 
truncate are:
WARN  [CompactionExecutor:2] 2016-09-28 20:03:29,646 SSTableWriter.java:241 - 
Compacting large partition system/hints:6f2c3b31-4975-470b-8f91-e706be89a83a 
(133819308 bytes)

Kind regards,
George

On Wed, Sep 28, 2016 at 7:54 PM, Joaquin Casares  
wrote:

Hi George,
Try grepping for WARN and ERROR on the system.logs across all nodes when you 
run the command. Could you post any of the recent stacktraces that you see?
Cheers,
Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com
On Wed, Sep 28, 2016 at 12:43 PM, George Sigletos  
wrote:

Thanks a lot for your reply.

I understand that truncate is an expensive operation. But throwing a timeout 
while truncating a table that is already empty?

A workaround is to set a high --request-timeout when connecting. Even 20 
seconds is not always enough

Kind regards,
George


On Wed, Sep 28, 2016 at 6:59 PM, Edward Capriolo  wrote:

Truncate does a few things (based on version):
- truncate takes snapshots
- truncate causes a flush
- in very old versions, truncate causes a schema migration.

In newer versions like Cassandra 3.4 you have this knob:

# How long the coordinator should wait for truncates to complete
# (This can be much longer, because unless auto_snapshot is disabled
# we need to flush first so we can snapshot before removing the data.)
truncate_request_timeout_in_ms: 60000

In older versions you cannot control when this call will time out; it is fairly 
normal that it does!

On Wed, Sep 28, 2016 at 12:50 PM, George Sigletos  
wrote:

Hello,

I keep executing a TRUNCATE command on an empty table and it throws 
OperationTimedOut randomly:

cassandra@cqlsh> truncate test.mytable;
OperationTimedOut: errors={}, last_host=cassiebeta-01
cassandra@cqlsh> truncate test.mytable;
OperationTimedOut: errors={}, last_host=cassiebeta-01

Having a 3 node cluster running 2.1.14. No connectivity problems. Has anybody 
come across the same error?

Thanks,
George













   

Re: Optimising the data model for reads

2016-09-29 Thread Romain Hardouin
Hi Julian,
The problem with any deletes here is that you can *read* potentially many 
tombstones. I mean you have two concerns:
1. Avoiding reading tombstones during a query
2. Evicting tombstones as quickly as possible to reclaim disk space

The first point is a data model consideration. Generally speaking, to avoid 
reading tombstones we have to think about order. Let's take an example not 
related to your data model: say you have an "updated_at" column, and maybe you 
always want to read the newest data (e.g. < 7 days) while the oldest is TTL'ed 
(tombstones). If you order your data by "updated_at DESC" (and TTL > 7 days and 
there are no manual deletes) you won't read tombstones.

The second point depends on many factors: gc_grace, compaction strategy, 
compaction throughput, number of compactors, IO performance, #CPUs, ...
Also, with such a data model, you will have unbalanced data distribution. What 
if a user has 1,000,000 files or more? You can use a composite partition key to 
avoid that: PRIMARY KEY ((userid, fileid), ...). The data distribution will be 
much better and on top of that you won't read tombstones when a file is deleted 
(because you won't query that partition key at all). *However, if you always 
read many files per user, each query will hit many nodes.* You have to decide 
depending on the query pattern, the average/max number of files per user, the 
average/max file size, etc.
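The distribution effect of the composite key can be demonstrated with a toy partitioner (md5 stands in for Cassandra's Murmur3 here, purely for illustration): keying on userid alone puts all of a user's files in one partition, while keying on (userid, fileid) spreads them across as many partitions as files.

```python
import hashlib

def token(parts):
    """Stand-in for Cassandra's partitioner: hash the partition key
    components (md5 instead of Murmur3, for illustration only)."""
    return hashlib.md5(":".join(parts).encode()).hexdigest()

files = [("user1", f"file{i}") for i in range(1000)]

# PRIMARY KEY (userid, ...): every file of user1 lands in ONE partition
single = {token([u]) for u, f in files}
# PRIMARY KEY ((userid, fileid), ...): one partition per file
composite = {token([u, f]) for u, f in files}
print(len(single), len(composite))  # 1 1000
```

This is exactly the trade-off named above: perfect distribution, at the price of multi-node fan-out when reading many files of one user.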
Regarding the compaction strategy, LCS is good for read-heavy workloads but you 
need good disk IO and enough CPUs/vCPUs (watch out if your write workload is 
quite heavy). LCS will compact frequently so, *if tombstones are evictable*, 
they will be evicted faster than with STCS. As you mentioned, you have 10 days 
of gc_grace, so you might consider lowering this value if maintenance repairs 
run in a few hours/days.
LCS does a good job with updates, and that gives me an idea: what about soft 
deletes? A clustering column "status int" could do the trick. Let's say 
1 => "live file", 2 => "to delete". When a user deletes a file, you set the 
"status" to 2 and write the userid and fileid to a table "files_to_delete" (the 
partition key can be the date of the day if there are not millions of deletions 
per day). Then a batch job can run during off-peak hours to delete, i.e. add a 
tombstone on, the files to delete. In read queries you would have to add "WHERE 
status = 1 AND ...". Again, it's just an idea that crossed my mind, I never 
tested this model, but maybe you can think about it. The bonus is that you can 
"undelete" a file as long as the batch job has not been triggered.
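The soft-delete read path sketched above can be modeled in a few lines (my own illustration of the idea, not tested in production any more than the original suggestion): the application filters on status instead of issuing CQL DELETEs immediately, so no tombstones are read, and a batch job later deletes the status=2 rows.

```python
LIVE, TO_DELETE = 1, 2

def visible_files(rows):
    """Soft-delete read path: filter on status = 1 (live) instead of
    relying on tombstones; rows with status = 2 await the batch job."""
    return [r for r in rows if r["status"] == LIVE]

rows = [{"fileid": "a", "status": LIVE},
        {"fileid": "b", "status": TO_DELETE},  # soft-deleted, undeletable
        {"fileid": "c", "status": LIVE}]
print([r["fileid"] for r in visible_files(rows)])  # ['a', 'c']
```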
Best,
Romain 

On Thursday, September 29, 2016 at 11:31 AM, Thomas Julian wrote:
 

 Hello,

I have created a column family for User File Management.
CREATE TABLE "UserFile" (
  "USERID" bigint,
  "FILEID" text,
  "FILETYPE" int,
  "FOLDER_UID" text,
  "FILEPATHINFO" text,
  "JSONCOLUMN" text,
  PRIMARY KEY ("USERID","FILEID")
);

Sample Entry

(4*003, 3f9**6a1, null, 2 , 
[{"FOLDER_TYPE":"-1","UID":"1","FOLDER":"\"HOME\""}] 
,{"filename":"untitled","size":1,"kind":-1,"where":""})


Queries :

Select "USERID","FILEID","FILETYPE","FOLDER_UID","JSONCOLUMN" from "UserFile" 
where "USERID"= and "FILEID" in (,,...)

Select "USERID","FILEID","FILEPATHINFO" from "UserFile" where "USERID"= 
and "FILEID" in (,,...) 

This column family was working perfectly in our lab. I was able to fetch the 
results for the queries stated above in less than 10 ms. I deployed this in 
production (Cassandra 2.1.13) and it was working perfectly for a month or two. 
But now, at times, the queries are taking 5s to 10s. On analysing further, I 
found that a few users are deleting files too frequently. This generates too 
many tombstones. I have set gc_grace_seconds to the default 10 days and I have 
chosen SizeTieredCompactionStrategy. I want to optimise this data model for 
read efficiency. 

Any help is much appreciated.

Best Regards,
Julian.




   

Re: Nodetool repair

2016-09-23 Thread Romain Hardouin
OK. If you still have issues after setting streaming_socket_timeout_in_ms != 0, 
consider increasing request_timeout_in_ms to a high value, say 1 or 2 minutes. 
See the comments in https://issues.apache.org/jira/browse/CASSANDRA-7904
Regarding 2.1, be sure to test incremental repair on your data before running 
it in production ;-)
Romain

Re: Nodetool repair

2016-09-22 Thread Romain Hardouin
Alain, you replied faster, I didn't see your answer :-D

Re: Nodetool repair

2016-09-22 Thread Romain Hardouin
Hi,
@Matija: George wrote that he uses C* 2.0.9, so the Spotify master is OK for 
him :-) But you're right about C* >= 2.1; we also use a fork to run it against 
our 2.1 clusters.
@George: your repair might be slow and not necessarily stuck. As Alain said, 
check the progression of nodetool netstats. Did you set 
streaming_socket_timeout_in_ms to a value different than 0? What is the value 
of request_timeout_in_ms? Also, I suggest you upgrade to the last 2.0.x (i.e. 
2.0.17). No need to upgrade SSTables, but be sure to read 
https://github.com/apache/cassandra/blob/cassandra-2.0/NEWS.txt
Again, you should have a look at cassandra-reaper and the GUI; you will have a 
progress bar to follow the repair.

Finally, if you want to kill a repair you can invoke 
forceTerminateAllRepairSessions with jmxterm on each node:
1. nodetool stop VALIDATION
2. echo run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions | java -jar /tmp/jmxterm/jmxterm-1.0-alpha-4-uber.jar -l 127.0.0.1:7199

jmxterm download: 
http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
Best,
Romain

On Thursday, September 22, 2016 at 4:45 PM, "Li, Guangxing" wrote:



Romain,

I had another repair that seems to just hang last night. When I did 'nodetool 
tpstats' on nodes, I see the following in the node where I initiated the repair:
AntiEntropySessions               1         1
On all other nodes, I see:
AntiEntropySessions               0         0

When I check the log for the pattern "session completed successfully" in 
system.log, I see the last finished range occurred 14 hours ago. So I think it 
is safe to say that the repair has hung somehow. In order to start another 
repair, do we need to 'kill' this repair? If so, how do we do that?

Thanks.

George.


On Thu, Sep 22, 2016 at 6:23 AM, Romain Hardouin  wrote:

I meant that pending (and active) AntiEntropySessions are a simple way to check 
if a repair is still running on a cluster. Also have a look at Cassandra reaper:
>- https://github.com/spotify/cassandra-reaper
>
>- https://github.com/spodkowinski/cassandra-reaper-ui
>
>Best,
>Romain
>
>
>
>
>On Wednesday, September 21, 2016 at 10:32 PM, "Li, Guangxing" wrote:
>
>Romain,
>
>I started running a new repair. If I see such behavior again, I will try what 
>you mentioned.
>
>Thanks.
>

Re: Nodetool repair

2016-09-22 Thread Romain Hardouin
I meant that pending (and active) AntiEntropySessions are a simple way to check 
if a repair is still running on a cluster. Also have a look at Cassandra reaper:
- https://github.com/spotify/cassandra-reaper

- https://github.com/spodkowinski/cassandra-reaper-ui

Best,
Romain



On Wednesday, September 21, 2016 at 10:32 PM, "Li, Guangxing" wrote:

Romain,

I started running a new repair. If I see such behavior again, I will try what 
you mentioned.

Thanks.


Re: Nodetool repair

2016-09-21 Thread Romain Hardouin
Do you see any pending AntiEntropySessions (not AntiEntropyStage) with nodetool 
tpstats on nodes?
Romain
 

On Wednesday, September 21, 2016 at 4:45 PM, "Li, Guangxing" wrote:
 

 Alain,
My script actually greps through all the log files, including system.log.*. So 
it was probably due to a failed session. My script now assumes the repair has 
finished (possibly due to failure) if it does not see any more repair-related 
logs after 2 hours.
Thanks.
George.
On Wed, Sep 21, 2016 at 3:03 AM, Alain RODRIGUEZ  wrote:

Hi George,
That's the best way to monitor repairs "out of the box" I could think of. When 
you're not seeing 2048 (in your case), it might be due to log rotation or to a 
session failure. Have you had a look at repair failures?

I am wondering why the implementor did not put something in the log (e.g. ... 
Repair command #41 has ended...) to clearly state that the repair has completed.

+1, and some informations about ranges successfully repaired and the ranges 
that failed could be a very good thing as well. It would be easy to then read 
the repair result and to know what to do next (re-run repair on some ranges, 
move to the next node, etc).

2016-09-20 17:00 GMT+02:00 Li, Guangxing :

Hi,
I am using version 2.0.9. I have been looking into the logs to see if a repair 
is finished. Each time a repair is started on a node, I am seeing log line like 
"INFO [Thread-112920] 2016-09-16 19:00:43,805 StorageService.java (line 2646) 
Starting repair command #41, repairing 2048 ranges for keyspace groupmanager" 
in system.log. So I know that I am expecting to see 2048 log lines like "INFO 
[AntiEntropySessions:109] 2016-09-16 19:27:20,662 RepairSession.java (line 282) 
[repair #8b910950-7c43-11e6-88f3-f147ea74230b] session completed 
successfully". Once I see 2048 such log lines, I know this repair has 
completed. But this is not dependable since sometimes I am seeing less than 
2048 but I know there is no repair going on since I do not see any trace of 
repair in system.log for a long time. So it seems to me that there is a clear 
way to tell that a repair has started but there is no clear way to tell a 
repair has ended. The only thing you can do is to watch the log and if you do 
not see repair activity for a long time, the repair is done somehow. I am 
wondering why the implementor did not put something in the log (e.g. ... Repair 
command #41 has ended...) to clearly state that the repair has completed.
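The counting heuristic described above can be written down directly (a sketch of the idea, not George's actual script): count "session completed successfully" lines and compare against the announced range count, with the caveat he raises that a count below the target may mean either "still running" or "failed silently".

```python
def repair_progress(log_lines, expected_ranges):
    """George's heuristic as code: count 'session completed successfully'
    lines for a repair command and compare against the announced range
    count (2048 in his logs). Returns (completed, done?)."""
    done = sum(1 for l in log_lines if "session completed successfully" in l)
    return done, done >= expected_ranges

log = ["INFO ... Starting repair command #41, repairing 2048 ranges ...",
       "INFO ... [repair #8b91...] session completed successfully",
       "INFO ... [repair #8b92...] session completed successfully"]
print(repair_progress(log, 2048))  # (2, False)
```

As noted in the thread, log rotation breaks this approach unless rotated files are scanned too, which is why an explicit "Repair command #41 has ended" log line would be so valuable.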
Thanks.
George.
On Tue, Sep 20, 2016 at 2:54 AM, Jens Rantil  wrote:

On Mon, Sep 19, 2016 at 3:07 PM Alain RODRIGUEZ  wrote:

...
- The size of your data
- The number of vnodes
- The compaction throughput
- The streaming throughput
- The hardware available
- The load of the cluster
- ...

I've also heard that the number of clustering keys per partition key could have 
an impact. Might be worth investigating.
Cheers,
Jens
--
Jens Rantil
Backend Developer @ Tink
Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.







   

Re: High load on few nodes in a DC.

2016-09-21 Thread Romain Hardouin
Hi,
Do you shuffle the replicas with TokenAwarePolicy?

TokenAwarePolicy(LoadBalancingPolicy childPolicy, boolean shuffleReplicas)

Best,
Romain

On Tuesday, September 20, 2016 at 3:47 PM, Pranay akula wrote:
 

I was able to find the hotspots causing the load, but the size of these 
partitions is in KB, there are no tombstones, and the number of SSTables is 
only 2. What else do I need to debug to find the reason for the high load on 
some nodes? We are also using unlogged batches; can that be the reason? How do 
I find which node is serving as the coordinator for unlogged batches? We are 
using the token-aware policy.
thanks


On Mon, Sep 19, 2016 at 12:29 PM, Pranay akula  
wrote:

I was able to see the most used partitions, but the nodes with less load are 
serving more read and write requests for those particular partitions when 
compared to the nodes with high load. How can I find out whether these nodes 
are serving as coordinators for those read and write requests? How can I find 
the token range for these particular partitions and which node is the primary 
for them?

Thanks
On Mon, Sep 19, 2016 at 11:04 AM, Pranay akula  
wrote:

Hi Jeff,

Thanks, we are using RF 3 and Cassandra version 2.1.8.

Thanks,
Pranay.
On Mon, Sep 19, 2016 at 10:55 AM, Jeff Jirsa  wrote:

Is your replication_factor 2? Or is it 3? What version are you using? The most 
likely answer is some individual partition that's either being written/read 
more than others, or is somehow impacting the cluster (wide rows are a natural 
candidate). You don't mention your version, but most modern versions of 
Cassandra ship with 'nodetool toppartitions', which will help you identify 
frequently written/read partitions – perhaps you can use that to identify a 
hotspot due to some external behavior (some partition being read thousands of 
times, over and over, could certainly drive up load).
- Jeff

From: Pranay akula
Subject: High load on few nodes in a DC.

When our cluster was under load I am seeing 1 or 2 nodes under more load 
consistently when compared to others in the DC. I am not seeing any GC pauses 
or wide partitions. Could it be that those nodes are continuously serving as 
coordinators? How can I find the reason for the high load on those two nodes? 
We are using vnodes.

Thanks,
Pranay.







   

Re: Export/Importing keyspace from a different sized cluster

2016-09-20 Thread Romain Hardouin
Also for testing purposes, you can send only one replica set to the Test DC. 
For instance with a RF=3 and 3 C* racks, you can just rsync/sstableload one 
rack. It will be faster and OK for tests.
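The "one rack is one full replica" selection can be sketched as follows (my own illustration; it assumes exactly one replica per rack, i.e. RF equals the number of racks with NetworkTopologyStrategy, as in the example above):

```python
def one_replica_set(nodes):
    """Pick every node of a single rack. With RF = number of racks and one
    replica per rack, that rack holds one full copy of the data, so
    rsync'ing / sstableloading just those nodes is enough for a test DC."""
    racks = {}
    for n in nodes:
        racks.setdefault(n["rack"], []).append(n["host"])
    first = sorted(racks)[0]
    return racks[first]

nodes = [{"host": "10.0.0.1", "rack": "rack1"},
         {"host": "10.0.0.2", "rack": "rack2"},
         {"host": "10.0.0.3", "rack": "rack3"},
         {"host": "10.0.0.4", "rack": "rack1"}]
print(one_replica_set(nodes))  # ['10.0.0.1', '10.0.0.4']
```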
Best,
Romain 

On Tuesday, September 20, 2016 at 3:28 AM, Michael Laws wrote:
 

I put together a shell wrapper around nodetool/sstableloader that I've been 
running for the past few years – 
https://github.com/AppliedInfrastructure/cassandra-snapshot-tools
Always seemed to work well for these kinds of scenarios… Never really had to 
think about where SSTables were on the filesystem, etc.

Mike

From: Justin Sanciangco 
[mailto:jsancian...@blizzard.com] 
Sent: Monday, September 19, 2016 6:20 PM
To: user@cassandra.apache.org
Subject: RE: Export/Importing keyspace from a different sized cluster

I am running cqlsh 5.0.1 | Cassandra 2.1.11.969 | DSE 4.8.3 | CQL spec 3.2.1

Doing the below command seemed to work: sstableloader -d 

Thanks for the help!

From: Jeff Jirsa 
[mailto:jeff.ji...@crowdstrike.com] 
Sent: Monday, September 19, 2016 5:49 PM
To: user@cassandra.apache.org
Subject: Re: Export/Importing keyspace from a different sized cluster

Something like that, depending on your version (which you didn't specify). 
Note, though, that sstableloader is notoriously picky about the path to 
sstables. In particular, it really really really wants a directory structure 
that matches the directory structure on disk, and wants you to be at the 
equivalent of the parent/data_files_directory (so if you dump your sstables at 
/path/to/data/keyspace/table/, you'd want to run sstableloader from 
/path/to/data/ and provide keyspace/table/ as the location).

From: Justin Sanciangco 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, September 19, 2016 at 5:44 PM
To: "user@cassandra.apache.org" 
Subject: RE: Export/Importing keyspace from a different sized cluster

So if I rsync the sstables, say from source node 1 and source node 2, to target 
node 1, would I just run the command like this from the target host? 
sstableloader -d 

From: Jeff Jirsa 
[mailto:jeff.ji...@crowdstrike.com] 
Sent: Monday, September 19, 2016 4:45 PM
To: user@cassandra.apache.org
Subject: Re: Export/Importing keyspace from a different sized cluster

You can ship the sstables to the destination (or any other server with 
Cassandra binary tools installed) via ssh/rsync and run sstableloader on the 
destination cluster as well.

From: Justin Sanciangco
Subject: Export/Importing keyspace from a different sized cluster

Hello,

Assuming I can't get ports opened from source to target cluster to run 
sstableloader, what methods can I use to load a single keyspace from one 
cluster to another cluster of a different size? Appreciate the help…

Thanks,
Justin
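Jeff's warning about sstableloader's path expectations can be captured as a small pre-flight check (my own sketch, not part of any Cassandra tool): the directory you point sstableloader at must have `<keyspace>/<table>` as its last two components.

```python
from pathlib import PurePosixPath

def sstableloader_target(path):
    """Pre-flight check for sstableloader's picky path layout: the last
    two path components must be <keyspace>/<table>; return them or raise."""
    p = PurePosixPath(path)
    if len(p.parts) < 3:  # root + at least keyspace/table
        raise ValueError("expected a path ending in /<keyspace>/<table>")
    return p.parts[-2], p.parts[-1]

print(sstableloader_target("/path/to/data/mykeyspace/mytable"))
# -> ('mykeyspace', 'mytable')
```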

   

Re: Optimal value for concurrent_reads for a single NVMe Disk

2016-09-20 Thread Romain Hardouin
Hi,
You should make a benchmark with cassandra-stress to find the sweet spot. With 
NVMe I guess you can start with a high value, 128?
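For reference, the heuristic quoted later in this thread from the docs is 16 × number_of_drives, with 32 as the floor; a single NVMe drive therefore gets no benefit from the formula alone, which is why benchmarking higher values with cassandra-stress is the only reliable approach. A tiny sketch of that arithmetic:

```python
def concurrent_reads(number_of_drives, per_drive=16, floor=32):
    """Documented rule of thumb: 16 x number_of_drives, never below the
    default of 32. For one NVMe drive the formula just returns the default,
    so candidate values like 64 or 128 must come from benchmarking."""
    return max(floor, per_drive * number_of_drives)

print(concurrent_reads(1))  # 32 -- formula doesn't exploit NVMe parallelism
print(concurrent_reads(8))  # 128
```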
Please let us know the results of your findings, it's interesting to know if we 
can go crazy with such pieces of hardware :-)
Best,
Romain 

On Tuesday, September 20, 2016 at 12:11 PM, Thomas Julian wrote:
 

 Hello,

We are using Cassandra 2.1.13 with each node having an NVMe disk with a total 
capacity of 1.2 TB and an allotted capacity of 880 GB. We would like to 
increase the default value of 32 for the concurrent_reads parameter. But the 
documentation says:

"(Default: 32) Note: For workloads with more data than can fit in memory, the 
bottleneck is reads fetching data from disk. Setting to (16 × number_of_drives) 
allows operations to queue low enough in the stack so that the OS and drives 
can reorder them. The default setting applies to both logical volume managed 
(LVM) and RAID drives."

https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__concurrent_reads

According to this hardware specification, what could be the optimal value that 
can be set for concurrent_reads?

Best Regards,
Julian.







   

Re: How to alter the default value for concurrent_compactors

2016-09-20 Thread Romain Hardouin
Hi,
You can read and write the value of the following MBean via JMX: 
org.apache.cassandra.db:type=CompactionManager
- CoreCompactorThreads
- MaximumCompactorThreads

If you modify CoreCompactorThreads it will be effective immediately: assuming 
you have some pending compactions, you will see N lines in nodetool 
compactionstats, where N = CoreCompactorThreads.
Best,
Romain
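A minimal JMX client sketch of the route described above. Only the MBean name and the two attribute names come from the message; the class name, method names, host, and the default JMX port 7199 are assumptions:

```java
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CompactorThreadsTuner {
    // The CompactionManager MBean coordinates from the message above.
    static ObjectName compactionManager() {
        try {
            return new ObjectName("org.apache.cassandra.db:type=CompactionManager");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // Run against a live node only, e.g. setCoreCompactorThreads("127.0.0.1", 7199, 4)
    static void setCoreCompactorThreads(String host, int port, int threads) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Takes effect immediately: pending compactions fan out over the new thread count.
            mbs.setAttribute(compactionManager(), new Attribute("CoreCompactorThreads", threads));
            System.out.println("CoreCompactorThreads is now "
                    + mbs.getAttribute(compactionManager(), "CoreCompactorThreads"));
        } finally {
            connector.close();
        }
    }

    public static void main(String[] args) {
        // Without a cluster at hand, at least print the MBean coordinates.
        System.out.println(compactionManager().getCanonicalName());
    }
}
```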
 

Le Mardi 20 septembre 2016 13h50, Thomas Julian  a 
écrit :
 

 Hello,

We have commented out "concurrent_compactors" in our Cassandra 2.1.13 
installation. 
We would like to review this setting, as some issues indicate that the default 
configuration may affect read/write performance. 

https://issues.apache.org/jira/browse/CASSANDRA-8787
https://issues.apache.org/jira/browse/CASSANDRA-7139

Where can we see the value set for concurrent_compactors in our setup? Is it 
possible to update this configuration without a restart?

Best Regards,
Julian.




   

Re: large system hint partition

2016-09-20 Thread Romain Hardouin
Hi,
> More recent (I think 2.2) don't have this problem since they write hints to 
>the file system as per the commit log
Flat-file hints were implemented starting from 3.0:
https://issues.apache.org/jira/browse/CASSANDRA-6230
Best,
Romain

Re: Is a blob storage cost of cassandra is the same as bigint storage cost for long variables?

2016-09-09 Thread Romain Hardouin
Note that LZ4 compression is used by default. If you want to disable
compression you can do this:

    CREATE/ALTER TABLE ... WITH compression = { 'sstable_compression' : '' };
Best,
Romain
 

Le Vendredi 9 septembre 2016 8h12, Alexandr Porunov 
 a écrit :
 

 Hello Romain,
Thank you very much for the explanation!
I have just run a simple test to compare both situations. I ran two equivalent
VMs.

Machine 1:
    CREATE KEYSPACE "test" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
    CREATE TABLE test.simple ( id bigint PRIMARY KEY );

Machine 2:
    CREATE KEYSPACE "test" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
    CREATE TABLE test.simple ( id blob PRIMARY KEY );

Then I put 13421772 primary keys, from 1 to 13421772, into both machines.

Results:
Machine 1: size of the data folder: 495864 bytes
Machine 2: size of the data folder: 495004 bytes

So there is almost no difference between them (the blob storage even came out
about 1 MB smaller).
I am happy about it because I need to store specially encoded primary keys of
80 bits each, so I can use a blob as a primary key without hesitation.
Best regards,
Alexandr
On Fri, Sep 9, 2016 at 1:20 AM, Romain Hardouin  wrote:

Hi,
Disk-wise it's the same, because a bigint is serialized as an 8-byte ByteBuffer,
and if you store a Long as bytes in a blob type it will take 8 bytes too. The
difference is the validation: the blob ByteBuffer will be stored as is, whereas
the bigint will be validated. So technically the Long is slower, but I guess
that's not noticeable.
Yes, you can use a blob as a partition key. I would use the bigint, both for
validation and clarity.
Best,
Romain 
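A small sketch of the serialization point above, assuming the usual big-endian 8-byte encoding of a bigint (the class and method names here are illustrative):

```java
import java.nio.ByteBuffer;

// A Long serialized the way a bigint is (an 8-byte ByteBuffer) occupies
// exactly as many bytes as the same value handed over as a blob payload.
public class LongVsBlobSize {
    static ByteBuffer bigintSerialized(long value) {
        return (ByteBuffer) ByteBuffer.allocate(8).putLong(value).flip();
    }

    public static void main(String[] args) {
        ByteBuffer asBigint = bigintSerialized(123456789L);
        byte[] asBlob = asBigint.duplicate().array(); // same 8 bytes, typed as a blob payload
        System.out.println(asBigint.remaining() + " bytes vs " + asBlob.length + " bytes");
    }
}
```

Either way the on-disk cost is 8 bytes per value; only the server-side validation differs.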

Le Mercredi 7 septembre 2016 22h54, Alexandr Porunov 
 a écrit :
 

 Hello,

I need to store a "Long" Java variable. The question is: is the storage cost
the same for storing the hex representation of a "Long" variable in a blob as
for storing the "Long" variable in a bigint? Are there any performance pros or
cons? Is it OK to use a blob as a primary key?
Sincerely,
Alexandr

   



   

Re: Is a blob storage cost of cassandra is the same as bigint storage cost for long variables?

2016-09-08 Thread Romain Hardouin
Hi,
Disk-wise it's the same, because a bigint is serialized as an 8-byte ByteBuffer,
and if you store a Long as bytes in a blob type it will take 8 bytes too. The
difference is the validation: the blob ByteBuffer will be stored as is, whereas
the bigint will be validated. So technically the Long is slower, but I guess
that's not noticeable.
Yes, you can use a blob as a partition key. I would use the bigint, both for
validation and clarity.
Best,
Romain 

Le Mercredi 7 septembre 2016 22h54, Alexandr Porunov 
 a écrit :
 

 Hello,

I need to store a "Long" Java variable. The question is: is the storage cost
the same for storing the hex representation of a "Long" variable in a blob as
for storing the "Long" variable in a bigint? Are there any performance pros or
cons? Is it OK to use a blob as a primary key?
Sincerely,
Alexandr

   

Re: Read timeouts on primary key queries

2016-09-07 Thread Romain Hardouin
Is it still fast if you specify CONSISTENCY LOCAL_QUORUM in cqlsh?
Romain

Le Mercredi 7 septembre 2016 13h56, Joseph Tech  a écrit :
 

Thanks, Romain, for the detailed explanation. We use Log4j 2 and I have added
the driver logging for slow/error queries; we'll see if it helps to reveal any
pattern once in prod.
I tried getendpoints and getsstables for some of the timed-out keys and most of
them listed only 1 SSTable; there were a few which showed 2 SSTables. There is
no specific trend in the keys, it's completely based on user access, and the
same keys return results instantly from cqlsh.


On Tue, Sep 6, 2016 at 1:57 PM, Romain Hardouin  wrote:

There is nothing special in the two sstablemetadata outputs, but if the
timeouts are due to a network split or an overwhelmed node or something like
that, you won't see anything here. That said, if you have the keys which
produced the timeouts then, yes, you can look for a regular pattern (i.e.
always the same keys?).

You can find the sstables for a given key with nodetool:
    nodetool getendpoints <keyspace> <table> <key>
Then you can run the following command on one/each node of the endpoints:
    nodetool getsstables <keyspace> <table> <key>
If many sstables are shown by the previous command it means that your data is
fragmented, but thanks to LCS this number should be low.
I think the most useful actions now would be:

1) Enable DEBUG for o.a.c.db.ConsistencyLevel. It won't spam your log file;
you will see the following when errors occur:
    - Local replicas [<endpoints>, ...] are insufficient to satisfy
LOCAL_QUORUM requirement of X live nodes in '<DC>'
   You are using C* 2.1, but you can have a look at the C* 2.2 logback.xml:
https://github.com/apache/cassandra/blob/cassandra-2.2/conf/logback.xml
   I'm using it in production; it's better because it creates a separate
debug.log file with an asynchronous appender.
   Watch out when enabling it, because the default logback configuration sets
all of o.a.c in DEBUG:
       <logger name="org.apache.cassandra" level="DEBUG"/>
   Instead you can set only:
       <logger name="org.apache.cassandra.db.ConsistencyLevel" level="DEBUG"/>
   Also, if you want to restrict debug.log to the DEBUG level only (instead of
DEBUG+INFO+...) you can add a LevelFilter to ASYNCDEBUGLOG in logback.xml:
       <filter class="ch.qos.logback.classic.filter.LevelFilter">
         <level>DEBUG</level>
         <onMatch>ACCEPT</onMatch>
         <onMismatch>DENY</onMismatch>
       </filter>
   Thus, the debug.log file will be empty unless some consistency issues happen.

2) Enable slow queries logging at the driver level with a QueryLogger:
    Cluster cluster = ...
    // log queries longer than 1 second, see also withDynamicThreshold
    QueryLogger queryLogger = QueryLogger.builder(cluster).withConstantThreshold(1000).build();
    cluster.register(queryLogger);
   Then enable the corresponding logger (e.g. com.datastax.driver.core.QueryLogger.SLOW
at DEBUG) in your driver's logback file.

3) And/or: you mentioned that you use DSE, so you can enable slow queries
logging in dse.yaml (cql_slow_log_options).
Best,
Romain

 Le Lundi 5 septembre 2016 20h05, Joseph Tech  a écrit :
 

Attached are the sstablemetadata outputs from 2 SSTables of size 28 MB and 52 MB
(out2). The records are inserted with different TTLs based on their nature ; 
test records with 1 day, typeA records with 6 months, typeB records with 1 year 
etc. There are also explicit DELETEs from this table, though it's much lower 
than the rate of inserts.
I am not sure how to interpret this output, or if it's the right SSTables that 
were picked. Please advise. Is there a way to get the sstables corresponding to 
the keys that timed out, though they are accessible later.
On Mon, Sep 5, 2016 at 10:58 PM, Anshu Vajpayee  
wrote:

We have seen read time out issue in cassandra due to high droppable tombstone 
ratio for repository. 
Please check for high droppable tombstone ratio for your repo. 
On Mon, Sep 5, 2016 at 8:11 PM, Romain Hardouin  wrote:

Yes, dclocal_read_repair_chance will reduce the cross-DC traffic and latency, so
you can swap the values (https://issues.apache.org/jira/browse/CASSANDRA-7320).
I guess the sstable_size_in_mb was set to 50 because back in the day (C*
1.0) the default size was way too small: 5 MB. So maybe someone in your company 
tried "10 * the default" i.e. 50 MB. Now the default is 160 MB. I don't say to 
change the value but just keep in mind that you're using a small value here, it 
could help you someday.
Regarding the cells, the histograms shows an *estimation* of the min, p50, ..., 
p99, max of cells based on SSTables metadata. On your screenshot, the Max is 
4768. So you have a partition key with ~ 4768 cells. The p99 is 1109, so 99% of 
your partition keys have less than (or equal to) 1109 cells. You can see these 
data of a given sstable with the tool sstablemetadata.
Best,
Romain
 

Le Lundi 5 septembre 2016 15h17, Joseph Tech  a 
écrit :
 

 Thanks, Romain . We will try to enable the DEBUG logging (assuming it won't 
clog the logs much) . Regarding the table configs, read_repair_chance must be 
carried over from older versions - mostly defaults. I think sstable_size_in_mb 
was set to limit the max SSTab

Re: WriteTimeoutException with LOCAL_QUORUM

2016-09-06 Thread Romain Hardouin
1) Is it a typo or did you really make a giant leap from C* 1.x to 3.4 with all
the C* 2.0 and C* 2.1 upgrades? (Btw if I were you, I would use the latest 3.0.x.)

2) Regarding NTR "All Time Blocked" (e.g. 26070160 from the logs), have a look
at the patch "max_queued_ntr_property.txt":
https://issues.apache.org/jira/browse/CASSANDRA-11363
Then set -Dcassandra.max_queued_native_transport_requests=XXX to a value that
works for you.

3) Regarding write timeouts:
   - Are your writes idempotent? You can retry when a WriteTimeoutException is
caught, see IdempotenceAwareRetryPolicy.
   - We can see hints in the logs => Do you monitor the frequency/number of
hints? Do you see some UnavailableException at the driver level? It means that
some nodes are unreachable, and even if it should trigger an
UnavailableException, it may also raise a WriteTimeoutException if the
coordinator of a request doesn't know yet that the node is unreachable (see
failure detector).
   - 4 GB of heap is very small and you have 19 tables. Add 40 system tables
to this and you have 59 tables that share 4 GB.
   - You are using batches for one/some table(s), right? Is it really required?
Is it the most used table?
   - What are the values of:
        * memtable_cleanup_threshold
        * batch_size_warn_threshold_in_kb
   - What is the IO wait status on the nodes? Try to correlate timeout
exceptions with IO wait load.
   - Are commitlog and data on separate devices?
   - What are the values of the following MBean attributes on each node?
        * org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnCommit
            - Count
        * org.apache.cassandra.metrics:type=CommitLog,name=WaitingOnSegmentAllocation
            - Mean
            - 99thPercentile
            - Max
   - Do you see MemtableFlushWriter blocked tasks on nodes? I see 0 in the
logs but the node may have been restarted (e.g. 18 hours of uptime in the
nodetool info).

4) Did you notice that you have tombstone warnings? e.g.:
    WARN  [SharedPool-Worker-48] 2016-09-01 06:53:19,453 ReadCommand.java:481 -
Read 5000 live rows and 1 tombstone cells for query SELECT * FROM
pc_object_data_beta.vsc_data WHERE rundate, vscid = 1472653906000, 111034565
LIMIT 5000 (see tombstone_warn_threshold)
The chances are high that your data model is not optimal. You should *really*
fix this.
Best,
Romain
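The retry-on-idempotent-writes idea can be sketched driver-free as follows. The helper and its names are hypothetical; with the DataStax driver you would configure an IdempotenceAwareRetryPolicy rather than hand-roll a loop:

```java
import java.util.concurrent.Callable;

// Retrying is only safe here because the write is idempotent: replaying the
// same mutation after a timeout cannot corrupt data.
public class IdempotentRetry {
    static <T> T retry(Callable<T> idempotentWrite, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                return idempotentWrite.call();
            } catch (Exception e) {
                // e.g. a coordinator write timeout; give up after maxAttempts
                if (attempt >= maxAttempts) {
                    throw new RuntimeException("gave up after " + maxAttempts + " attempts", e);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated write that times out twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) throw new java.util.concurrent.TimeoutException("write timeout");
            return "applied";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```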

Le Mardi 6 septembre 2016 6h47, "adeline@thomsonreuters.com" 
 a écrit :
 

        From: Pan, Adeline (TR Technology & Ops)
Sent: Tuesday, September 06, 2016 12:34 PM
To: 'user@cassandra.apache.org'
Cc: Yang, Ling (TR Technology & Ops)
Subject: FW: WriteTimeoutException with LOCAL_QUORUM

Hi All, I hope you are doing well today, and I need your help.

We were using Cassandra 1 before; now we are upgrading to Cassandra 3.4. During
the integration test, we encountered “WriteTimeoutException” very frequently
(about every other minute). The exception message is below; the exception trace
is in the attached file.

| Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException:
Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica
were required but only 1 acknowledged the write) |

There is some information:
1. It is a six-node cluster, two data centers, and three nodes in each
datacenter. The consistency level we are using is LOCAL_QUORUM.
2. The node info:
    [BETA:@:/local/java/cassandra3/current]$ bin/nodetool -h localhost info
    ID                      : ad077318-6531-498e-bf5a-14ac339d1a45
    Gossip active           : true
    Thrift active           : false
    Native Transport active : true
    Load                    : 23.47 GB
    Generation No           : 1473065408
    Uptime (seconds)        : 67180
    Heap Memory (MB)        : 1679.57 / 4016.00
    Off Heap Memory (MB)    : 10.34
    Data Center             : dc1
    Rack                    : rack1
    Exceptions              : 0
    Key Cache               : entries 32940, size 3.8 MB, capacity 100 MB, 2124114 hits, 2252348 requests, 0.943 recent hit rate, 14400 save period in seconds
    Row Cache               : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
    Counter Cache           : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
    Token                   : (invoke with -T/--tokens to see all 256 tokens)
3. We have increased write_request_timeout_in_ms to 4, which didn’t work.
4. The memtable size is 4 GB.
5. memtable_allocation_type: heap_buffers
6. In the Cassandra server log, we found there are Native-Transport-Requests
pending from time to time. (The server log piece is in the attached file.)
    INFO  [ScheduledTasks:1] 2016-09-01 10:08:47,036 StatusLogger.java:52 - Pool Name      Active   Pending  Completed   Blocked  All Time Blocked
    INFO  [ScheduledTasks:1] 2016-09-01 10:08:47,043 St

Re: Read timeouts on primary key queries

2016-09-06 Thread Romain Hardouin
There is nothing special in the two sstablemetadata outputs, but if the
timeouts are due to a network split or an overwhelmed node or something like
that, you won't see anything here. That said, if you have the keys which
produced the timeouts then, yes, you can look for a regular pattern (i.e.
always the same keys?).

You can find the sstables for a given key with nodetool:
    nodetool getendpoints <keyspace> <table> <key>
Then you can run the following command on one/each node of the endpoints:
    nodetool getsstables <keyspace> <table> <key>
If many sstables are shown by the previous command it means that your data is
fragmented, but thanks to LCS this number should be low.
I think the most useful actions now would be:

1) Enable DEBUG for o.a.c.db.ConsistencyLevel. It won't spam your log file;
you will see the following when errors occur:
    - Local replicas [<endpoints>, ...] are insufficient to satisfy
LOCAL_QUORUM requirement of X live nodes in '<DC>'
   You are using C* 2.1, but you can have a look at the C* 2.2 logback.xml:
https://github.com/apache/cassandra/blob/cassandra-2.2/conf/logback.xml
   I'm using it in production; it's better because it creates a separate
debug.log file with an asynchronous appender.
   Watch out when enabling it, because the default logback configuration sets
all of o.a.c in DEBUG:
       <logger name="org.apache.cassandra" level="DEBUG"/>
   Instead you can set only:
       <logger name="org.apache.cassandra.db.ConsistencyLevel" level="DEBUG"/>
   Also, if you want to restrict debug.log to the DEBUG level only (instead of
DEBUG+INFO+...) you can add a LevelFilter to ASYNCDEBUGLOG in logback.xml:
       <filter class="ch.qos.logback.classic.filter.LevelFilter">
         <level>DEBUG</level>
         <onMatch>ACCEPT</onMatch>
         <onMismatch>DENY</onMismatch>
       </filter>
   Thus, the debug.log file will be empty unless some consistency issues happen.

2) Enable slow queries logging at the driver level with a QueryLogger:
    Cluster cluster = ...
    // log queries longer than 1 second, see also withDynamicThreshold
    QueryLogger queryLogger = QueryLogger.builder(cluster).withConstantThreshold(1000).build();
    cluster.register(queryLogger);
   Then enable the corresponding logger (e.g. com.datastax.driver.core.QueryLogger.SLOW
at DEBUG) in your driver's logback file.

3) And/or: you mentioned that you use DSE, so you can enable slow queries
logging in dse.yaml (cql_slow_log_options).
Best,
Romain
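The LOCAL_QUORUM arithmetic behind that "insufficient to satisfy LOCAL_QUORUM" message can be sketched as follows (the helper name is mine; the rule itself is the standard quorum = floor(RF/2) + 1):

```java
// How many live local replicas LOCAL_QUORUM needs for a given per-DC
// replication factor.
public class LocalQuorum {
    static int required(int localReplicationFactor) {
        return localReplicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        // RF=3 per DC: LOCAL_QUORUM needs 2 replicas, so one node can be down.
        System.out.println(required(3));
        // RF=2 per DC: LOCAL_QUORUM needs both replicas -- no tolerance at all.
        System.out.println(required(2));
    }
}
```

This is why an RF=2 datacenter sees quorum errors as soon as a single replica is slow or down.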

 Le Lundi 5 septembre 2016 20h05, Joseph Tech  a écrit :
 

Attached are the sstablemetadata outputs from 2 SSTables of size 28 MB and 52 MB
(out2). The records are inserted with different TTLs based on their nature ; 
test records with 1 day, typeA records with 6 months, typeB records with 1 year 
etc. There are also explicit DELETEs from this table, though it's much lower 
than the rate of inserts.
I am not sure how to interpret this output, or if it's the right SSTables that 
were picked. Please advise. Is there a way to get the sstables corresponding to 
the keys that timed out, though they are accessible later.
On Mon, Sep 5, 2016 at 10:58 PM, Anshu Vajpayee  
wrote:

We have seen read time out issue in cassandra due to high droppable tombstone 
ratio for repository. 
Please check for high droppable tombstone ratio for your repo. 
On Mon, Sep 5, 2016 at 8:11 PM, Romain Hardouin  wrote:

Yes, dclocal_read_repair_chance will reduce the cross-DC traffic and latency, so
you can swap the values (https://issues.apache.org/jira/browse/CASSANDRA-7320).
I guess the sstable_size_in_mb was set to 50 because back in the day (C*
1.0) the default size was way too small: 5 MB. So maybe someone in your company 
tried "10 * the default" i.e. 50 MB. Now the default is 160 MB. I don't say to 
change the value but just keep in mind that you're using a small value here, it 
could help you someday.
Regarding the cells, the histograms shows an *estimation* of the min, p50, ..., 
p99, max of cells based on SSTables metadata. On your screenshot, the Max is 
4768. So you have a partition key with ~ 4768 cells. The p99 is 1109, so 99% of 
your partition keys have less than (or equal to) 1109 cells. You can see these 
data of a given sstable with the tool sstablemetadata.
Best,
Romain
 

Le Lundi 5 septembre 2016 15h17, Joseph Tech  a 
écrit :
 

 Thanks, Romain . We will try to enable the DEBUG logging (assuming it won't 
clog the logs much) . Regarding the table configs, read_repair_chance must be 
carried over from older versions - mostly defaults. I think sstable_size_in_mb 
was set to limit the max SSTable size, though i am not sure on the reason for 
the 50 MB value.
Does setting dclocal_read_repair_chance help in reducing cross-DC traffic 
(haven't looked into this parameter, just going by the name).

By the cell count definition : is it incremented based on the number of writes 
for a given name(key?) and value. This table is heavy on reads and writes. If 
so, the value should be much higher?
On Mon, Sep 5, 2016 at 7:35 AM, Romain Hardouin  wrote:

Hi,
Try to put org.apache.cassandra.db.ConsistencyLevel at DEBUG level, it could
help to find a regular pattern. By the way, I see that you have set a global 
read repair chance:    read_repair_chance = 0.1And not th

Re: Is it possible to replay hints after running nodetool drain?

2016-09-05 Thread Romain Hardouin
Hi,
You don't have to worry about that unless you write with CL = ANY. The sole
method to force hint delivery that I know of is to invoke scheduleHintDelivery
on "org.apache.cassandra.db:type=HintedHandoffManager" via JMX, but it takes an
endpoint as an argument. If you have lots of nodes and several DCs, make sure
to properly set hinted_handoff_throttle_in_kb and max_hints_delivery_threads.
Best,
Romain
 

Le Samedi 3 septembre 2016 2h59, jerome  a 
écrit :
 

Hi Matija,
Thanks for your help! The downtime is minimal, usually less than five minutes. 
Since it is so short we're not so concerned about the node that's down missing 
data, we just want to make sure that before it goes down it replays all the 
hints it has so that there won't be any gaps in data on any other nodes for the 
hints it has while it's down.
Thanks,
Jerome

From: Matija Gobec
Sent: Friday, September 2, 2016 6:05:01 PM
To: user@cassandra.apache.org
Subject: Re: Is it possible to replay hints after running nodetool drain?

Hi Jerome,
The node being drained stops listening to requests but the other nodes being 
coordinators for given requests will store hints for that downed node for a 
configured period of time (max_hint_window_in_ms is 3 hours by default). If the 
downed node is back online in this time window it will receive hints from other 
nodes in the cluster and eventually catch up.
What is your typical maintenance downtime?
Regards,
Matija
On Fri, Sep 2, 2016 at 10:53 PM, jerome  wrote:

Hello,
As part of routine maintenance for our cluster, my colleagues and I will run a 
nodetool drain before stopping a Cassandra node, performing maintenance, and 
bringing it back up. We run maintenance as a cron-job with a lock stored in a 
different cluster to ensure only one node is ever down at a time. We would like to
make sure the node has replayed all its hints before bringing it down to 
minimize the potential window in which users might read out-of-date data (we 
read at a consistency level of ONE). Is it possible to replay hints after 
performing a nodetool drain? The documentation leads me to believe it's not,
since Cassandra will stop listening for connections from other nodes, but I was
unable to find anything definitive either way. If a node won't replay hints
after a nodetool drain, is there perhaps another way to tell Cassandra to stop
listening for client connections but continue to replay hints to other nodes?
Thanks,
Jerome



   

Re: Read timeouts on primary key queries

2016-09-05 Thread Romain Hardouin
Yes dclocal_read_repair_chance will reduce the cross-DC traffic and latency, so 
you can swap the values ( https://issues.apache.org/jira/browse/CASSANDRA-7320 
). I guess the sstable_size_in_mb was set to 50 because back in the day (C* 
1.0) the default size was way too small: 5 MB. So maybe someone in your company 
tried "10 * the default" i.e. 50 MB. Now the default is 160 MB. I don't say to 
change the value but just keep in mind that you're using a small value here, it 
could help you someday.
Regarding the cells, the histograms show an *estimation* of the min, p50, ...,
p99, max of cells based on SSTables metadata. On your screenshot, the Max is 
4768. So you have a partition key with ~ 4768 cells. The p99 is 1109, so 99% of 
your partition keys have less than (or equal to) 1109 cells. You can see these 
data of a given sstable with the tool sstablemetadata.
Best,
Romain
 

Le Lundi 5 septembre 2016 15h17, Joseph Tech  a 
écrit :
 

 Thanks, Romain . We will try to enable the DEBUG logging (assuming it won't 
clog the logs much) . Regarding the table configs, read_repair_chance must be 
carried over from older versions - mostly defaults. I think sstable_size_in_mb 
was set to limit the max SSTable size, though i am not sure on the reason for 
the 50 MB value.
Does setting dclocal_read_repair_chance help in reducing cross-DC traffic 
(haven't looked into this parameter, just going by the name).

By the cell count definition : is it incremented based on the number of writes 
for a given name(key?) and value. This table is heavy on reads and writes. If 
so, the value should be much higher?
On Mon, Sep 5, 2016 at 7:35 AM, Romain Hardouin  wrote:

Hi,
Try to put org.apache.cassandra.db.ConsistencyLevel at DEBUG level, it could
help to find a regular pattern. By the way, I see that you have set a global
read repair chance:
    read_repair_chance = 0.1
and not the local read repair:
    dclocal_read_repair_chance = 0.0
Is there any reason to do that, or is it just the old (pre 2.0.9) default
configuration?
The cell count is the number of triplets: (name, value, timestamp)
Also, I see that you have set sstable_size_in_mb to 50 MB. What is the
rationale behind this? (Yes, I'm curious :-) ). Anyway your "SSTables per read"
are good.
Best,
Romain
Le Lundi 5 septembre 2016 13h32, Joseph Tech  a 
écrit :
 

 Hi Ryan,
Attached are the cfhistograms run within few mins of each other. On the 
surface, don't see anything which indicates too much skewing (assuming skewing 
==keys spread across many SSTables) . Please confirm. Related to this, what 
does the "cell count" metric indicate ; didn't find a clear explanation in the 
documents.
Thanks,Joseph

On Thu, Sep 1, 2016 at 6:30 PM, Ryan Svihla  wrote:

 Have you looked at cfhistograms/tablehistograms your data maybe just skewed 
(most likely explanation is probably the correct one here)

Regard,
Ryan Svihla
 _
From: Joseph Tech 
Sent: Wednesday, August 31, 2016 11:16 PM
Subject: Re: Read timeouts on primary key queries
To: 


Patrick,
The desc table is below (only col names changed) : 
CREATE TABLE db.tbl (
    id1 text,
    id2 text,
    id3 text,
    id4 text,
    f1 text,
    f2 map,
    f3 map,
    created timestamp,
    updated timestamp,
    PRIMARY KEY (id1, id2, id3, id4)
) WITH CLUSTERING ORDER BY (id2 ASC, id3 ASC, id4 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'sstable_size_in_mb': '50', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.1
    AND speculative_retry = '99.0PERCENTILE';
and the query is select * from tbl where id1=? and id2=? and id3=? and id4=?
The timeouts happen within ~2s to ~5s, while the successful calls have avg of 
8ms and p99 of 15s. These times are seen from app side, the actual query times 
would be slightly lower. 
Is there a way to capture traces only when queries take longer than a specified
duration? We can't enable tracing in production given the volume of traffic.
We see that the same query which timed out works fine later, so not sure if the 
trace of a successful run would help.
Thanks,Joseph

On Wed, Aug 31, 2016 at 8:05 PM, Patrick McFadin  wrote:

If you are getting a timeout on one table, then a mismatch of RF and node count 
doesn't seem as likely. 
Time to look at your query. You said it was a 'select * from ta

Re: Read timeouts on primary key queries

2016-09-05 Thread Romain Hardouin
Hi,
Try to put org.apache.cassandra.db.ConsistencyLevel at DEBUG level, it could
help to find a regular pattern. By the way, I see that you have set a global
read repair chance:
    read_repair_chance = 0.1
and not the local read repair:
    dclocal_read_repair_chance = 0.0
Is there any reason to do that, or is it just the old (pre 2.0.9) default
configuration?
The cell count is the number of triplets: (name, value, timestamp)
Also, I see that you have set sstable_size_in_mb to 50 MB. What is the
rationale behind this? (Yes, I'm curious :-) ). Anyway your "SSTables per read"
are good.
Best,
Romain
Le Lundi 5 septembre 2016 13h32, Joseph Tech  a 
écrit :
 

 Hi Ryan,
Attached are the cfhistograms run within few mins of each other. On the 
surface, don't see anything which indicates too much skewing (assuming skewing 
==keys spread across many SSTables) . Please confirm. Related to this, what 
does the "cell count" metric indicate ; didn't find a clear explanation in the 
documents.
Thanks,Joseph

On Thu, Sep 1, 2016 at 6:30 PM, Ryan Svihla  wrote:

 Have you looked at cfhistograms/tablehistograms your data maybe just skewed 
(most likely explanation is probably the correct one here)

Regard,
Ryan Svihla
 _
From: Joseph Tech 
Sent: Wednesday, August 31, 2016 11:16 PM
Subject: Re: Read timeouts on primary key queries
To: 


Patrick,
The desc table is below (only col names changed) : 
CREATE TABLE db.tbl (
    id1 text,
    id2 text,
    id3 text,
    id4 text,
    f1 text,
    f2 map,
    f3 map,
    created timestamp,
    updated timestamp,
    PRIMARY KEY (id1, id2, id3, id4)
) WITH CLUSTERING ORDER BY (id2 ASC, id3 ASC, id4 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'sstable_size_in_mb': '50', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.1
    AND speculative_retry = '99.0PERCENTILE';
and the query is select * from tbl where id1=? and id2=? and id3=? and id4=?
The timeouts happen within ~2s to ~5s, while the successful calls have avg of 
8ms and p99 of 15s. These times are seen from app side, the actual query times 
would be slightly lower. 
Is there a way to capture traces only when queries take longer than a specified
duration? We can't enable tracing in production given the volume of traffic.
We see that the same query which timed out works fine later, so not sure if the 
trace of a successful run would help.
Thanks,Joseph

On Wed, Aug 31, 2016 at 8:05 PM, Patrick McFadin  wrote:

If you are getting a timeout on one table, then a mismatch of RF and node count 
doesn't seem as likely. 
Time to look at your query. You said it was a 'select * from table where key=?' 
type query. I would next use the trace facility in cqlsh to investigate 
further. That's a good way to find hard-to-find issues. You should be looking
for a clear ledge where you go from single-digit ms to 4- or 5-digit ms times.
The other place to look is your data model for that table if you want to post 
the output from a desc table.
Patrick


On Tue, Aug 30, 2016 at 11:07 AM, Joseph Tech  wrote:

On further analysis, this issue happens only on 1 table in the KS which has the 
max reads. 
@Atul, I will look at system health, but didnt see anything standing out from 
GC logs. (using JDK 1.8_92 with G1GC). 
@Patrick, could you please elaborate on the "mismatch on node count + RF" part?
On Tue, Aug 30, 2016 at 5:35 PM, Atul Saroha  wrote:

There could be many reasons for this if it is intermittent: CPU usage + I/O
wait status. As reads are I/O intensive, your IOPS requirement should be met
under that load. It could be a heap issue if the CPU is busy with GC only.
Network health could also be the reason. So it's better to look at system
health during the time when it happens.

--
Atul Saroha
Lead Software Engineer
M: +91 8447784271 T: +91 124-415-6069 EXT: 12369
Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
On Tue, Aug 30, 2016 at 5:10 PM, Joseph Tech  wrote:

Hi Patrick,
The nodetool status shows all nodes up and normal now. From OpsCenter "Event 
Log" , there are some nodes reported as being down/up etc. during the timeframe 
of timeout, but these are Search workload nodes from the remote (non-local) DC. 
The RF is 3 and there are 9 nodes per DC.
Thanks,Joseph
On Mon, Aug 29, 2016 at 11:07 PM, Patrick McFadin  wrote:

You aren't achieving quorum on your reads, as the error explains. That means
you either have

Re: nodetool repair with -pr and -dc

2016-08-19 Thread Romain Hardouin
Hi Jérôme,
The code in 2.2.6 allows -local and -pr:
https://github.com/apache/cassandra/blob/cassandra-2.2.6/src/java/org/apache/cassandra/service/StorageService.java#L2899
But... the options validation introduced in CASSANDRA-6455 seems to break this
feature!
https://github.com/apache/cassandra/blob/cassandra-2.2.6/src/java/org/apache/cassandra/repair/messages/RepairOption.java#L211
I suggest opening a ticket: https://issues.apache.org/jira/browse/cassandra/
Best,
Romain 

On Friday, August 19, 2016 at 11:47, Jérôme Mainaud wrote:
 

 Hello,

I've got a repair command with both -pr and -local rejected on a 2.2.6 cluster.
The exact command was: nodetool repair --full -par -pr -local -j 4
The message is “You need to run primary range repair on all nodes in the 
cluster”.

Reading the code and previously cited CASSANDRA-7450, it should have been 
accepted.

Has anyone met this error before?

Thanks


-- 
Jérôme Mainaud
jer...@mainaud.com

2016-08-12 1:14 GMT+02:00 kurt Greaves :

-D does not do what you think it does. I've quoted the relevant documentation 
from the README:


Multiple Datacenters
If you have multiple datacenters in your ring, then you MUST specify the name 
of the datacenter containing the node you are repairing as part of the 
command-line options (--datacenter=DCNAME). Failure to do so will result in 
only a subset of your data being repaired (approximately 
data/number-of-datacenters). This is because nodetool has no way to determine 
the relevant DC on its own, which in turn means it will use the tokens from 
every ring member in every datacenter.


On 11 August 2016 at 12:24, Paulo Motta  wrote:

> if we want to use -pr option ( which i suppose we should to prevent duplicate 
> checks) in 2.0 then if we run the repair on all nodes in a single DC then it 
> should be sufficient and we should not need to run it on all nodes across 
> DC's?

No, because the primary ranges of the nodes in the other DCs will not be 
repaired, so you should either run with -pr on all nodes in all DCs, or restrict 
repair to a specific DC with -local (and accept the duplicate checks). Combined -pr 
and -local are only supported on 2.1.
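A toy model of why -pr restricted to one DC leaves ranges unrepaired. The tokens and node names below are hypothetical, and the ring is simplified to a 0-100 token space; the point is only that primary ranges owned by DC2 nodes are untouched when -pr runs in DC1 alone:

```python
# Toy ring on a 0-100 token space: each node's primary range ends at its
# token. Running repair -pr only on DC1's nodes repairs only the ranges
# whose primary owner sits in DC1; DC2-owned ranges are never touched.
ring = {  # token -> (node, dc), hypothetical layout
    0: ("n1", "DC1"),
    25: ("n2", "DC2"),
    50: ("n3", "DC1"),
    75: ("n4", "DC2"),
}

def repaired_fraction(dcs: set) -> float:
    tokens = sorted(ring)
    total = repaired = 0
    for i, tok in enumerate(tokens):
        start = tokens[i - 1] if i else tokens[-1] - 100  # wrap-around
        width = tok - start
        total += width
        if ring[tok][1] in dcs:
            repaired += width
    return repaired / total

print(repaired_fraction({"DC1"}))          # 0.5 -> half the ring missed
print(repaired_fraction({"DC1", "DC2"}))   # 1.0 -> full coverage
```

This is also why the range_repair README quoted earlier warns about repairing only roughly data/number-of-datacenters when the DC is not specified.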


2016-08-11 1:29 GMT-03:00 Anishek Agarwal :

ok thanks, so if we want to use -pr option ( which i suppose we should to 
prevent duplicate checks) in 2.0 then if we run the repair on all nodes in a 
single DC then it should be sufficient and we should not need to run it on all 
nodes across DC's ?


On Wed, Aug 10, 2016 at 5:01 PM, Paulo Motta  wrote:

On 2.0 the repair -pr option is not supported together with -local, -hosts or -dc, 
since it assumes you need to repair all nodes in all DCs, and it will throw an 
error if you try to run it with nodetool, so perhaps there's something wrong with 
range_repair's option parsing.

On 2.1, support for simultaneous -pr and -local options was added in 
CASSANDRA-7450, so if you need that you can either upgrade to 2.1 or backport 
that to 2.0.

2016-08-10 5:20 GMT-03:00 Anishek Agarwal :

Hello,
We have a 2.0.17 Cassandra cluster (DC1) with a cross-DC setup with a smaller 
cluster (DC2). After reading various blogs about scheduling/running repairs, 
it looks like it's good to run them with the following:

- -pr for primary range only
- -st / -et for sub ranges
- -par for parallel
- -dc to make sure we can schedule repairs independently on each data centre we have

I have configured the above using the repair utility at 
https://github.com/BrianGallew/cassandra_range_repair.git
which leads to the following command:
./src/range_repair.py -k [keyspace] -c [columnfamily name] -v -H localhost -p -D DC1

but it looks like the Merkle tree is being calculated on nodes which are part of 
the other DC2.
Why does this happen? I thought it should only look at the nodes in the local 
cluster. However, on nodetool the -pr option cannot be used with -local 
according to the docs at 
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsRepair.html
so I may be missing something; can someone help explain this please?
Thanks,
Anishek









-- 
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com



  

Re: A question to updatesstables

2016-08-19 Thread Romain Hardouin
Ok... you said 2.0.10 in the original post ;-) You can't upgrade straight from 1.2 
to 2.1; 2.0.7 is the minimum. So upgrade to 2.0.17 (the latest 2.0.x) first, see 
https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt#L244
Best,
Romain 
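The hop constraint Romain describes (1.2 cannot jump straight to 2.1) can be expressed as a tiny check. The version list below is just the major lines discussed in this thread, not an official compatibility matrix:

```python
# Ordered major lines; a rolling upgrade should step through each line in
# between rather than jump (e.g. 1.2 -> 2.0.x -> 2.1.x, never 1.2 -> 2.1).
MAJOR_LINES = ["1.2", "2.0", "2.1", "2.2", "3.0"]

def upgrade_hops(src: str, dst: str) -> list:
    # Return the sequence of major lines to pass through, in order.
    i, j = MAJOR_LINES.index(src), MAJOR_LINES.index(dst)
    if j <= i:
        raise ValueError("not an upgrade")
    return MAJOR_LINES[i + 1 : j + 1]

print(upgrade_hops("1.2", "2.1"))  # ['2.0', '2.1'] -> go through 2.0 first
```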

On Friday, August 19, 2016 at 11:41, "Lu, Boying" wrote:
 

yes, we use Cassandra 2.1.11 in our latest release.

From: Romain Hardouin 
[mailto:romainh...@yahoo.fr]
Sent: August 19, 2016 17:36
To: user@cassandra.apache.org
Subject: Re: A question to updatesstables

ka is the 2.1 format... I don't understand. Did you install C* 2.1?

Romain

On Friday, August 19, 2016 at 11:32, "Lu, Boying" wrote:

Here is the error message in our log file:

java.lang.RuntimeException: Incompatible SSTable found. Current version ka is unable to read file: /data/db/1/data/StorageOS/RemoteDirectorGroup/StorageOS-RemoteDirectorGroup-ic-37. Please run upgradesstables.
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:517)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:494)
    at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:335)
    at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:275)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:121)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:98)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:328)
    at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:479)
  From: Ryan Svihla [mailto:r...@foundev.pro]
Sent: August 19, 2016 17:26
To: user@cassandra.apache.org
Subject: Re: A question to updatesstables

The actual error message could be very useful to diagnose the reason. There are 
warnings about incompatible formats which are safe to ignore (usually in the 
cache), and I have one time 

Re: A question to updatesstables

2016-08-19 Thread Romain Hardouin
ka is the 2.1 format... I don't understand. Did you install C* 2.1?
Romain 

On Friday, August 19, 2016 at 11:32, "Lu, Boying" wrote:
 

Here is the error message in our 
log file:

java.lang.RuntimeException: Incompatible SSTable found. Current version ka is unable to read file: /data/db/1/data/StorageOS/RemoteDirectorGroup/StorageOS-RemoteDirectorGroup-ic-37. Please run upgradesstables.
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:517)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:494)
    at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:335)
    at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:275)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:121)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:98)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:328)
    at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:479)
   From: Ryan Svihla [mailto:r...@foundev.pro]
Sent: August 19, 2016 17:26
To: user@cassandra.apache.org
Subject: Re: A question to updatesstables

The actual error message could be very useful to diagnose the reason. There are 
warnings about incompatible formats which are safe to ignore (usually in the 
cache), and I have one time seen an issue with commit log archiving preventing a 
startup during an upgrade. Usually there is something else broken and the version 
mismatch is a false signal.

Regards,

Ryan Svihla 
On Aug 18, 2016, at 10:18 PM, Lu, Boying  wrote: 
Thanks a lot.

I'm a little bit confused. If 'nodetool updatesstables' doesn't work without the 
Cassandra server running, and the Cassandra server fails to start due to the 
incompatible SSTable format, how do we resolve this dilemma?
  From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: August 18, 2016 18:44
To: user@cassandra.apache.org
Subject: Re: A question to updatesstables

Replies inline.

Carlos Alonso | Software Engineer | @calonso

On 18 August 2016 at 11:56, Lu, Boying wrote:

Hi, All,

We use Cassandra in our product. In our early release we used Cassandra 1.2.10, whose 
SSTables are in the 'ic' format. We upgraded Cassandra to 2.0.10 in our product release. 
But the Cassandra server failed to start due to the incompatible SSTable format, and 
the log message told us to use 'nodetool updatesstables' to upgrade the SSTable files.

To make sure there is no negative impact on our data, I want to confirm the following 
things about this command before trying it:

1. Does it work without the Cassandra server running?
No, it won't.

2. Will it cause data loss?
It shouldn't if you followed the upgrade instructions properly.

3. What's the best practice to avoid this error occurring again (e.g. when 
upgrading Cassandra next time)?
Upgrading SSTables is required or not depending on the upgrade you're running: 
basically, if the SSTable layout changes you'll need to run it, and not 
otherwise, so there's nothing you can do to avoid it.

Thanks

Boying
  


  

Re: A question to updatesstables

2016-08-19 Thread Romain Hardouin
Hi,
There are two ways to upgrade SSTables:
- online (C* must be UP): nodetool upgradesstables
- offline (when C* is stopped): using the tool called "sstableupgrade". It's located 
in the bin directory of Cassandra, so depending on how you installed Cassandra it may 
be on the path. See 
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/ToolsSSTableupgrade_t.html

A few questions:
- Did you check you are not hitting 
https://github.com/apache/cassandra/blob/cassandra-2.0/NEWS.txt#L162 ? 
i.e. are you sure that all your data are in the "ic" format?
- Why did you choose 2.0.10? (The latest 2.0 release being 2.0.17.)

Best,

Romain 

On Friday, August 19, 2016 at 5:18, "Lu, Boying" wrote:
 

Thanks a lot.

I'm a little bit confused. If 'nodetool updatesstables' doesn't work without the 
Cassandra server running, and the Cassandra server fails to start due to the 
incompatible SSTable format, how do we resolve this dilemma?

From: 
Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: August 18, 2016 18:44
To: user@cassandra.apache.org
Subject: Re: A question to updatesstables

Replies inline.

Carlos Alonso | Software Engineer | @calonso

On 18 August 2016 at 11:56, Lu, Boying wrote:

Hi, All,

We use Cassandra in our product. In our early release we used Cassandra 1.2.10, whose 
SSTables are in the 'ic' format. We upgraded Cassandra to 2.0.10 in our product release. 
But the Cassandra server failed to start due to the incompatible SSTable format, and 
the log message told us to use 'nodetool updatesstables' to upgrade the SSTable files.

To make sure there is no negative impact on our data, I want to confirm the following 
things about this command before trying it:

1. Does it work without the Cassandra server running?
No, it won't.

2. Will it cause data loss?
It shouldn't if you followed the upgrade instructions properly.

3. What's the best practice to avoid this error occurring again (e.g. when 
upgrading Cassandra next time)?
Upgrading SSTables is required or not depending on the upgrade you're running: 
basically, if the SSTable layout changes you'll need to run it, and not 
otherwise, so there's nothing you can do to avoid it.

Thanks

Boying
   

  

Re: upgradesstables throws error when migrating from 2.0.14 to 2.1.13

2016-08-12 Thread Romain Hardouin
Hi,
Try this and check the yaml file path:

strace -f -e open nodetool upgradesstables 2>&1 | grep cassandra.yaml

How is C* installed (package, tarball)? Do other nodetool commands run fine? Also, 
did you try an offline SSTable upgrade with the sstableupgrade tool?
Best,
Romain 

On Friday, August 12, 2016 at 15:31, Amit Singh F wrote:
 

Hi All,

We are in the process of migrating from 2.0.14 to 2.1.13, and we were able to 
successfully install the binaries and get Cassandra 2.1.13 up and running fine. 
But an issue comes up when we try to run nodetool upgradesstables: it finishes 
in a few seconds, which means it does not find any old SSTables that need to be 
upgraded, but when I locate the SSTables on disk, I can see they are still in the 
old format. Also, when I try running the sstableupgrade command, the error below 
is thrown:

org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file:/// for remote files. Aborting. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
    at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:73) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.config.YamlConfigurationLoader.loadConfig(YamlConfigurationLoader.java:84) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:161) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:136) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.tools.StandaloneUpgrader.main(StandaloneUpgrader.java:52) [apache-cassandra-2.1.13.jar:2.1.13]
Expecting URI in variable: [cassandra.config]. Please prefix the file with file:/// for local files or file:/// for remote files. Aborting. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
Fatal configuration error; unable to start. See log for stacktrace.

I also debugged the code a little, and this error is due to an invalid path for 
cassandra.yaml, but I can skip this since my Cassandra node is in the UN state. 
So can anybody provide me some pointers to look into this?

Regards,
Amit Chowdhery
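The "Expecting URI in variable: [cassandra.config]" error means the standalone tool wants a file:// URI rather than a bare path, e.g. via -Dcassandra.config=file:///etc/cassandra/cassandra.yaml. In Python terms the conversion it expects looks like this (a sketch; the yaml path shown is hypothetical):

```python
from pathlib import Path

def to_config_uri(path: str) -> str:
    # The tool rejects plain filesystem paths; it wants an absolute
    # file:///... URI. as_uri() raises ValueError for relative paths.
    return Path(path).as_uri()

print(to_config_uri("/etc/cassandra/cassandra.yaml"))
```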

  

Re: Question nodetool status

2016-08-12 Thread Romain Hardouin
It's a bit more involved than that. C* uses a "Phi accrual failure detector":
https://docs.datastax.com/en/cassandra/3.x/cassandra/architecture/archDataDistributeFailDetect.html
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L878
See also https://dspace.jaist.ac.jp/dspace/bitstream/10119/4784/1/IS-RR-2004-010.pdf

Best,
Romain 
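To give a feel for the phi accrual idea, here is a deliberately simplified sketch that assumes exponentially distributed heartbeat intervals (the real detector estimates the interval distribution from a window of recent arrivals; Cassandra convicts at phi_convict_threshold, 8 by default):

```python
import math

# Default conviction threshold in cassandra.yaml (phi_convict_threshold).
PHI_CONVICT_THRESHOLD = 8

def phi(silence: float, mean_interval: float) -> float:
    # Suspicion level: -log10 of the probability that the peer is alive
    # but we simply haven't heard a heartbeat yet (exponential assumption).
    p_alive = math.exp(-silence / mean_interval)
    return -math.log10(p_alive)

# With a 1s mean heartbeat interval, suspicion grows with silence and the
# node is convicted (marked DN) once phi exceeds the threshold.
for t in (1, 5, 10, 20):
    print(t, round(phi(t, 1.0), 2), phi(t, 1.0) > PHI_CONVICT_THRESHOLD)
```

The point of phi over a fixed timeout is that the "x seconds" adapts to the observed heartbeat history instead of being a hard cutoff.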

On Thursday, August 11, 2016 at 17:02, jean paul wrote:
 

 Hi, thanks a lot for answer :)

Gossip is a peer-to-peer communication protocol in which nodes periodically 
exchange state information about themselves and about other nodes they know 
about. 

unreachableNodes = probe.getUnreachableNodes();   --->  i.e. if a node doesn't 
publish heartbeats within x seconds (using the gossip protocol), it's therefore 
marked 'DN: down'?
Is that it? 




2016-08-11 13:51 GMT+01:00 Romain Hardouin :

Hi Jean Paul,
Yes, the gossiper is used. Example with down nodes:
1. The status command retrieves unreachable nodes from a NodeProbe instance: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/nodetool/Status.java#L64
2. The NodeProbe list comes from a StorageService proxy: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/NodeProbe.java#L438
3. The proxy calls the Gossiper singleton: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L2681 
Best,
Romain

On Thursday, August 11, 2016 at 14:16, jean paul wrote:
 

 Hi all, 

$nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  83.05 KB   256     100.0%            460ddcd9-1ee8-48b8-a618-c076056aad07  rack1

The nodetool command shows the status of the node (UN=up,DN=down):
Please, I'd like to know how this command works, and whether it is based on the 
gossip protocol or not.

Thank you so much for the explanations.
Best regards. 




   



   

Re: Question nodetool status

2016-08-11 Thread Romain Hardouin
Hi Jean Paul,
Yes, the gossiper is used. Example with down nodes:
1. The status command retrieves unreachable nodes from a NodeProbe instance: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/nodetool/Status.java#L64
2. The NodeProbe list comes from a StorageService proxy: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/NodeProbe.java#L438
3. The proxy calls the Gossiper singleton: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L2681
 
Best,
Romain
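Conceptually, the three-step Java call chain above boils down to something like this Python caricature (purely illustrative class names mirroring Gossiper and NodeProbe, not Cassandra's actual API):

```python
class Gossiper:
    # Stands in for the Gossiper singleton: tracks which endpoints are live.
    def __init__(self):
        self.live, self.unreachable = set(), set()

    def mark(self, endpoint, up):
        (self.live if up else self.unreachable).add(endpoint)

class NodeProbe:
    # Stands in for NodeProbe: nodetool's client-side facade.
    def __init__(self, gossiper):
        self._gossiper = gossiper

    def get_unreachable_nodes(self):
        return sorted(self._gossiper.unreachable)

gossiper = Gossiper()
gossiper.mark("10.0.0.1", up=True)
gossiper.mark("10.0.0.2", up=False)
probe = NodeProbe(gossiper)
for node in ("10.0.0.1", "10.0.0.2"):
    status = "DN" if node in probe.get_unreachable_nodes() else "UN"
    print(status, node)
```

In the real code the "mark" step is driven by the gossip protocol and the phi accrual failure detector, and nodetool talks to NodeProbe over JMX.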

On Thursday, August 11, 2016 at 14:16, jean paul wrote:
 

 Hi all, 

$nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  83.05 KB   256     100.0%            460ddcd9-1ee8-48b8-a618-c076056aad07  rack1

The nodetool command shows the status of the node (UN=up,DN=down):
Please, I'd like to know how this command works, and whether it is based on the 
gossip protocol or not.

Thank you so much for the explanations.
Best regards. 




  

Re: Re : Default values in Cassandra YAML file

2016-08-10 Thread Romain Hardouin
Yes. You can even see that some caution is taken in the code 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/Config.java#L131
 (But if I were you I would not rely on this. It's always better to be 
explicit.)
Best,
Romain
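The pattern in Config.java, fields initialized with defaults that are only overwritten by keys present in the YAML, can be mimicked like so (illustrative field names and default values, not Cassandra's actual ones):

```python
from dataclasses import dataclass, fields

@dataclass
class Config:
    # Defaults used when the corresponding key is absent from the YAML,
    # mirroring the field initializers in Cassandra's Config.java.
    num_tokens: int = 1
    hinted_handoff_enabled: bool = True
    concurrent_reads: int = 32

def load(partial_yaml: dict) -> Config:
    # Only known keys present in the YAML override the defaults.
    known = {f.name for f in fields(Config)}
    return Config(**{k: v for k, v in partial_yaml.items() if k in known})

cfg = load({"num_tokens": 256})  # other attributes keep their defaults
print(cfg.num_tokens, cfg.concurrent_reads)
```

As Romain notes, being explicit in the YAML is still the safer habit.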

On Wednesday, August 10, 2016 at 17:50, sai krishnam raju potturi wrote:
 

Hi,
If there are any missing attributes in the YAML file, will Cassandra pick 
up default values for those attributes?
Thanks


  

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Romain Hardouin
> Curious why the 2.2 to 3.x upgrade path is risky at best.

I guess that the upgrade from 2.2 is less tested by DataStax QA because DSE 4 used 
C* 2.1, not 2.2. I would say the safest upgrade is 2.1 to 3.0.x.
Best,
Romain 


Re: unreachable nodes mystery in describecluster output

2016-08-03 Thread Romain Hardouin
That's good news if describecluster shows the same version on each node. Try 
with a high timeout like 120 seconds to see if it works. Is there a VPN between 
the DCs? Is there room for improvement at the network level? TCP tuning, etc. I'm 
not saying you won't have unreachable nodes, but it's worth trying if you can.
Romain
On Wednesday, August 3, 2016 at 15:02, Aleksandr Ivanov wrote:
 

 
The latency is high...
It is, but is it really causing the problem? Latency is high but constant, and 
not higher than ~200ms.

Regarding the ALTER, did you try to increase the timeout with "cqlsh 
--request-timeout=REQUEST_TIMEOUT"? Because the default is 10 seconds.

I use 25sec timeout (--request-timeout 25)

Apart from the unreachable nodes, do you know if all nodes have the same schema 
version? 
"nodetool gossipinfo" shows same scheme version on all nodes.

Best,
Romain


  

Re: unreachable nodes mystery in describecluster output

2016-08-03 Thread Romain Hardouin
Hi,
The latency is high...
Regarding the ALTER, did you try to increase the timeout with "cqlsh 
--request-timeout=REQUEST_TIMEOUT"? Because the default is 10 seconds. Apart 
from the unreachable nodes, do you know if all nodes have the same schema version? 
Best,
Romain

Re: duplicate values for primary key

2016-08-02 Thread Romain Hardouin
Just to know: did you get any errors during nodetool upgradesstables? 
Romain

On Tuesday, August 2, 2016 at 8:40, Julien Anguenot wrote:
 

 Hey Oskar,
I would comment and add all possible information to that Jira issue…

J. 
--
Julien Anguenot (@anguenot)

On Aug 2, 2016, at 8:36 AM, Oskar Kjellin  wrote:

Hi,
Ran into the same issue when going to 3.5. Completely killed our cluster. Only 
way was to restore a backup.
/Oskar

On 2 aug. 2016, at 07:54, Julien Anguenot  wrote:



Hey Jesse, 
You might wanna check and comment against that issue:
  https://issues.apache.org/jira/browse/CASSANDRA-11887
J. 
--
Julien Anguenot (@anguenot)

On Aug 2, 2016, at 3:16 AM, Jesse Hodges  wrote:
Hi, I've got a bit of a conundrum. Recently I upgraded from 2.2.3 to 3.7.0 (DDC 
distribution). Following the upgrade (though this condition may have existed 
prior to the upgrade):
I have a table with a simple partition key and multiple clustering keys, and I 
have duplicates of many of the primary keys!
I've tried various repair options and lots of searching, but nothing's really 
helping. I'm unsure how to troubleshoot further or how to consolidate 
these keys. This seems like a bug, but hopefully it's something simple I 
missed. I'm also willing to troubleshoot further as needed, but I could use a 
few getting-started pointers.
Example output: the primary key is 
(partition_id,alarm_id,tenant_id,account_id,source,metric)

cqlsh:alarms> select * from alarms.last_seen_state where partition_id=10 and alarm_id='59893';

 partition_id | alarm_id | tenant_id                            | account_id | source | metric | last_seen                   | value
--------------+----------+--------------------------------------+------------+--------+--------+-----------------------------+-------
           10 |    59893 | f50f8413-57bb-4eb5-a37c-7482a63ea9a5 |      10303 | PORTAL |    CPU | 2016-08-01 15:27:37.00+ |     1
           10 |    59893 | f50f8413-57bb-4eb5-a37c-7482a63ea9a5 |      10303 | PORTAL |    CPU | 2016-08-01 15:07:15.00+ |     1

Thanks,
Jesse
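Independent of the root cause (CASSANDRA-11887 looks likely), one client-side way to reason about the duplicates is to group rows by the full primary key and keep the newest last_seen. A sketch on plain dicts, with column values abbreviated from the output above:

```python
PRIMARY_KEY = ("partition_id", "alarm_id", "tenant_id",
               "account_id", "source", "metric")

def dedupe(rows):
    # Keep only the row with the latest last_seen per full primary key.
    best = {}
    for row in rows:
        key = tuple(row[c] for c in PRIMARY_KEY)
        if key not in best or row["last_seen"] > best[key]["last_seen"]:
            best[key] = row
    return list(best.values())

rows = [
    {"partition_id": 10, "alarm_id": "59893", "tenant_id": "f50f8413",
     "account_id": 10303, "source": "PORTAL", "metric": "CPU",
     "last_seen": "2016-08-01 15:27:37", "value": 1},
    {"partition_id": 10, "alarm_id": "59893", "tenant_id": "f50f8413",
     "account_id": 10303, "source": "PORTAL", "metric": "CPU",
     "last_seen": "2016-08-01 15:07:15", "value": 1},
]
print(len(dedupe(rows)))  # 1, the 15:27:37 row wins
```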




  

Re: (C)* stable version after 3.5

2016-07-14 Thread Romain Hardouin
DSE 4.8 uses C* 2.1 and DSE 5.0 uses C* 3.0. So I would say that 2.1->3.0 is 
more tested by DataStax than 2.2->3.0. 

On Thursday, July 14, 2016 at 11:37, Stefano Ortolani wrote:
 

FWIW, I've recently upgraded from 2.1 to 3.0 without issues of any sort, but 
admittedly I haven't been using anything too fancy.
Cheers,
Stefano
On Wed, Jul 13, 2016 at 10:28 PM, Alain RODRIGUEZ  wrote:

Hi Anuj
From 
https://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgrdBestPractCassandra.html:

   
   - Employ a continual upgrade strategy for each year. Upgrades are impacted 
by the version you are upgrading from and the version you are upgrading to. The 
greater the gap between the current version and the target version, the more 
complex the upgrade.


And historically I am quite sure it was explicitly recommended not to skip a 
major version (for a rolling upgrade), even if I could not find the reference. 
Anyway, it is clear that the bigger the gap is, the more careful we 
need to be.
On the other hand, I see 2.2 as 2.1 plus some features but no real breaking 
changes (as 3.0 was already in the pipe), and doing a 2.2 release was decided because 
3.0 was taking a long time to be released and some features had been ready for a 
while.
I might be wrong on some stuff above, but one can only speak with his knowledge 
and from his point of view. So I ended up saying:

Also I am not sure if the 2.2 major version is something you can skip while 
upgrading through a rolling restart. I believe you can, but it is not what is 
recommended.


Note that "I am not sure", "I believe you can"... So it was more a thought, 
something to explore for Varun :-).

And I actually encouraged him to move forward. Now that Tyler Hobbs confirmed 
it works, you can put a lot more trust on the fact that this upgrade will work 
:-). I would still encourage people to test it (for client compatibility, 
corner cases due to models, ...).
I hope I am more clear now,
C*heers,
---
Alain Rodriguez - alain@thelastpickle.com
France
The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
2016-07-13 18:39 GMT+02:00 Tyler Hobbs :


On Wed, Jul 13, 2016 at 11:32 AM, Anuj Wadehra  wrote:

Why do you think that skipping 2.2 is not recommended when NEWS.txt suggests 
otherwise? Can you elaborate?

We test upgrading from 2.1 -> 3.x and upgrading from 2.2 -> 3.x equivalently.  
There should not be a difference in terms of how well the upgrade is supported.


-- 
Tyler Hobbs
DataStax






  

Re: Open source equivalents of OpsCenter

2016-07-14 Thread Romain Hardouin
Hi Juho,
Out of curiosity, which stack did you use to make your dashboard?  
Romain
On Thursday, July 14, 2016 at 10:43, Juho Mäkinen wrote:
 

I'm doing some work on replacing OpsCenter in our setup. I ended up creating a 
Docker container which contains the following:

- Cassandra 2.2.7
- MX4J (a JMX-to-REST bridge) as a java-agent
- metrics-graphite-3.1.0.jar (exports some but not all JMX metrics to Graphite)
- a custom Ruby script which uses MX4J to export to Graphite some JMX metrics we 
don't otherwise get

With this I get all our Cassandra instances and their JMX-exposed data into 
Graphite, which allows us to use Grafana and Graphite to draw pretty dashboards.

In addition, I started writing some code which currently provides the following 
features:
- A dashboard which provides a ring view similar to what OpsCenter does, with 
onMouseOver features to display more info on each node.
- A simple HTTP GET/POST based API to:
  - Set up a new non-vnode based cluster
  - Get a JSON blob of cluster information: all its tokens, machines and so on
  - Let new cluster instances get a token slot from the ring when they boot
  - Kill a dead node and mark its slot for replacement, so the new booting node 
can use the cassandra.replace_address option.

The code is not yet packaged in any way for distribution and some parts depend 
on our Chef installation, but if there's interest I can publish at least some 
parts of it.
 - Garo
On Thu, Jul 14, 2016 at 10:54 AM, Romain Hardouin  wrote:

Do you run C* on physical machines or in the cloud? If the topology doesn't 
change too often you can have a look at Zabbix. The downside is that you have to 
set up all the JMX metrics yourself... but that's also a good point because you 
can have custom metrics. If you want nice graphs/dashboards you can use Grafana 
to plot the Zabbix data. (We're also using a SaaS but that's not open source.) For the 
rolling restarts and other admin stuff we're using Rundeck. It's a great tool 
when working in a team.
(I think it's time to implement an open source alternative to OpsCenter. If 
some people are interested, I'm in.)
Best,
Romain

 

On Thursday, July 14, 2016 at 0:01, Ranjib Dey wrote:
 

we use datadog (metrics emitted as raw statsd) for the dashboard. All repair & 
compaction is done via blender & serf [1].
[1] https://github.com/pagerduty/blender 

On Wed, Jul 13, 2016 at 2:42 PM, Kevin O'Connor  wrote:

Now that OpsCenter doesn't work with open source installs, are there any runs 
at an open source equivalent? I'd be more interested in looking at metrics of a 
running cluster and doing other tasks like managing repairs/rolling restarts 
more so than historical data.



   



  

Re: Open source equivalents of OpsCenter

2016-07-14 Thread Romain Hardouin
Do you run C* on physical machines or in the cloud? If the topology doesn't 
change too often you can have a look at Zabbix. The downside is that you have to 
set up all the JMX metrics yourself... but that's also a good point because you 
can have custom metrics. If you want nice graphs/dashboards you can use Grafana 
to plot Zabbix data. (We're also using SaaS but that's not open source.) For the 
rolling restart and other admin stuff we're using Rundeck. It's a great tool 
when working in a team.
(I think it's time to implement an open source alternative to OpsCenter. If 
some guys are interested I'm in.)
Best,
Romain

 

On Thursday, July 14, 2016 at 00:01, Ranjib Dey wrote:
 

We use Datadog (metrics emitted as raw statsd) for the dashboard. All repair & 
compaction is done via blender & serf [1].

[1] https://github.com/pagerduty/blender

On Wed, Jul 13, 2016 at 2:42 PM, Kevin O'Connor  wrote:

Now that OpsCenter doesn't work with open source installs, are there any runs 
at an open source equivalent? I'd be more interested in looking at metrics of a 
running cluster and doing other tasks like managing repairs/rolling restarts 
more so than historical data.



  

Re: CPU high load

2016-07-13 Thread Romain Hardouin
Did you upgrade from a previous version? Did you make some schema changes like 
compaction strategy, compression, bloom filter, etc.? What about the R/W 
requests? SharedPool workers are... shared ;-) Put logs in debug to see some 
examples of what services are using this pool (many actually).

Best,
Romain 
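For reference, raising the log level as Romain suggests can be done persistently in conf/logback.xml (Cassandra 2.1 and later, which DSE 4.8.x is based on). A minimal sketch; the logger name below is an assumption, adjust it to whichever packages you want to trace:

```xml
<!-- conf/logback.xml: surface which services submit tasks to the shared pool -->
<logger name="org.apache.cassandra.concurrent" level="DEBUG"/>
```

The same effect can be had without a restart via `nodetool setlogginglevel <logger> DEBUG`.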

On Wednesday, July 13, 2016 at 18:15, Patrick McFadin wrote:

Might be clearer looking at nodetool tpstats.
From there you can see all the thread pools and if there are any blocks. Could 
be something subtle like network.
On Tue, Jul 12, 2016 at 3:23 PM, Aoi Kadoya  wrote:

Hi,

I am running a 6 node vnode cluster with DSE 4.8.1, and since a few weeks
ago all of the cluster nodes have been hitting an avg. 15-20 CPU load.
These nodes are running on VMs (VMware vSphere) that have 8 vCPU
(1 core/socket) and 16 GB vRAM. (JVM options: -Xms8G -Xmx8G -Xmn800M)

At first I thought this was because of CPU iowait; however, iowait is
constantly low (in fact it's 0 almost all the time), and CPU steal time is
also 0%.

When I took a thread dump, I found some of "SharedPool-Worker" threads
are consuming CPU and those threads seem to be waiting for something
so I assume this is the cause of cpu load.

"SharedPool-Worker-1" #240 daemon prio=5 os_prio=0
tid=0x7fabf459e000 nid=0x39b3 waiting on condition
[0x7faad7f02000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:85)
at java.lang.Thread.run(Thread.java:745)

The thread dump looks like this, but I am not sure what this
SharedPool worker is waiting for.
Would you please help me with further troubleshooting?
I am also reading the thread posted by Yuan as the situation is very
similar to mine, but I didn't get any blocked, dropped or pending counts
in my tpstats result.

Thanks,
Aoi




  

Re: Is my cluster normal?

2016-07-13 Thread Romain Hardouin
Same behavior here with a very different setup. After an upgrade to 2.1.14 (from 
2.0.17) I see a high load and many NTR "all time blocked". Off-heap memtables 
lowered the blocked NTR for me; I put a comment on CASSANDRA-11363.
Best,
Romain
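The off-heap memtable change Romain mentions is a cassandra.yaml setting in the 2.1 line (shown here with the off-heap objects variant; heap_buffers is the default):

```yaml
# cassandra.yaml (2.1+): move memtable contents off the JVM heap
memtable_allocation_type: offheap_objects
```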

On Wednesday, July 13, 2016 at 20:18, Yuan Fang wrote:
 

 Sometimes, the Pending can change from 128 to 129, 125 etc.

On Wed, Jul 13, 2016 at 10:32 AM, Yuan Fang  wrote:

$ nodetool tpstats
...
Pool Name                      Active   Pending    Completed   Blocked   All time blocked
Native-Transport-Requests         128       128   1420623949         1          142821509
...


What is this? Is it normal?
On Tue, Jul 12, 2016 at 3:03 PM, Yuan Fang  wrote:

Hi Jonathan,
Here is the result:
ubuntu@ip-172-31-44-250:~$ iostat -dmx 2 10
Linux 3.13.0-74-generic (ip-172-31-44-250)  07/12/2016  _x86_64_  (4 CPU)

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.01    2.13    0.74    1.55   0.01   0.02     27.77      0.00   0.74     0.89     0.66   0.43   0.10
xvdf       0.01    0.58  237.41   52.50  12.90   6.21    135.02      2.32   8.01     3.65    27.72   0.57  16.63

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    7.50    0.00    2.50   0.00   0.04     32.00      0.00   1.60     0.00     1.60   1.60   0.40
xvdf       0.00    0.00  353.50    0.00  24.12   0.00    139.75      0.49   1.37     1.37     0.00   0.58  20.60

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    0.00    0.00    1.00   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
xvdf       0.00    2.00  463.50   35.00  30.69   2.86    137.84      0.88   1.77     1.29     8.17   0.60  30.00

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    0.00    0.00    1.00   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
xvdf       0.00    0.00   99.50   36.00   8.54   4.40    195.62      1.55   3.88     1.45    10.61   1.06  14.40

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    5.00    0.00    1.50   0.00   0.03     34.67      0.00   1.33     0.00     1.33   1.33   0.20
xvdf       0.00    1.50  703.00  195.00  48.83  23.76    165.57      6.49   8.36     1.66    32.51   0.55  49.80

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    0.00    0.00    1.00   0.00   0.04     72.00      0.00   0.00     0.00     0.00   0.00   0.00
xvdf       0.00    2.50  149.50   69.50  10.12   6.68    157.14      0.74   3.42     1.18     8.23   0.51  11.20

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    5.00    0.00    2.50   0.00   0.03     24.00      0.00   0.00     0.00     0.00   0.00   0.00
xvdf       0.00    0.00   61.50   22.50   5.36   2.75    197.64      0.33   3.93     1.50    10.58   0.88   7.40

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    0.00    0.00    0.50   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
xvdf       0.00    0.00  375.00    0.00  24.84   0.00    135.64      0.45   1.20     1.20     0.00   0.57  21.20

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    1.00    0.00    6.00   0.00   0.03      9.33      0.00   0.00     0.00     0.00   0.00   0.00
xvdf       0.00    0.00  542.50   23.50  35.08   2.83    137.16      0.80   1.41     1.15     7.23   0.49  28.00

Device:  rrqm/s  wrqm/s     r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       0.00    3.50    0.50    1.50   0.00   0.02     24.00      0.00   0.00     0.00     0.00   0.00   0.00
xvdf       0.00    1.50  272.00  153.50  16.18  18.67    167.73     14.32  33.66     1.39    90.84   0.81  34.60


On Tue, Jul 12, 2016 at 12:34 PM, Jonathan Haddad  wrote:

When you have high system load it means your CPU is waiting for *something*, 
and in my experience it's usually slow disk.  A disk connected over network has 
been a culprit for me many times.
On Tue, Jul 12, 2016 at 12:33 PM Jonathan Haddad  wrote:

Can you do:
iostat -dmx 2 10


On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang  wrote:

Hi Jeff,
The read being low is because we do not

Re: NoHostAvailableException coming up on our server

2016-07-13 Thread Romain Hardouin
Put the driver logs in debug mode to see what happens. Btw, I am surprised by 
the few requests per connection in your setup:

.setConnectionsPerHost(HostDistance.LOCAL, 20, 20)
.setMaxRequestsPerConnection(HostDistance.LOCAL, 128)

It looks like protocol v2 settings (Cassandra 2.0) because that protocol was 
limited to 128 requests per connection. You're using C* 3.3, so protocol v4. You 
can go up to 32K since protocol v3. As a first step I would try to open only 2 
connections with 16K in MaxRequestsPerConnection, then fine-tune.
Best,
Romain
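The capacity arithmetic behind that suggestion can be sanity-checked in a few lines (a sketch; 128 is the protocol v2 per-connection ceiling, 32768 the v3+ one; the helper is illustrative, not driver API):

```python
# In-flight request capacity per host = connections x max requests per connection
def capacity(connections: int, max_requests_per_connection: int) -> int:
    return connections * max_requests_per_connection

v2_style = capacity(20, 128)     # the settings quoted in the thread
v3_style = capacity(2, 16_384)   # Romain's suggestion: fewer sockets, deeper pipelining
print(v2_style, v3_style)        # 2560 32768
```

So two connections at the v3 limit carry over 12x the in-flight requests of twenty v2-sized connections, with far fewer sockets to manage.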

On Tuesday, July 12, 2016 at 23:57, Abhinav Solan wrote:

I am using driver version 3.0.0 with apache-cassandra-3.3.
On Tue, Jul 12, 2016 at 2:37 PM Riccardo Ferrari  wrote:

What driver version are you using?
You can look at the LoggingRetryPolicy to have more meaningful messages in your 
logs.
best,
On Tue, Jul 12, 2016 at 9:02 PM, Abhinav Solan  wrote:

Thanks, Johnny. Actually, they were running... it went through a series of reads 
and writes, and recovered after the error. Are there any settings I can specify 
when preparing the Session at the Java client driver level? Here are my current 
settings:

PoolingOptions poolingOptions = new PoolingOptions()
    .setConnectionsPerHost(HostDistance.LOCAL, 20, 20)
    .setMaxRequestsPerConnection(HostDistance.LOCAL, 128)
    .setNewConnectionThreshold(HostDistance.LOCAL, 100);

Cluster.Builder builder = Cluster.builder()
    .addContactPoints(cp)
    .withPoolingOptions(poolingOptions)
    .withProtocolVersion(ProtocolVersion.NEWEST_SUPPORTED)
    .withPort(port);

On Tue, Jul 12, 2016 at 11:47 AM Johnny Miller  
wrote:

Abhinav - you're getting that because the driver isn't finding any hosts up for 
your query. You probably need to check whether all the nodes in your cluster are 
running.
See: 
http://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/exceptions/NoHostAvailableException.html

Johnny

On 12 Jul 2016, at 18:46, Abhinav Solan  wrote:
Hi Everyone,
I am getting this error on our server; it comes and goes, and it seems the 
connection drops and comes back after a while:

Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (tried: :9042 
(com.datastax.driver.core.exceptions.ConnectionException: [] 
Pool is CLOSING))
at 
com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:218)
at 
com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:43)
at 
com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:284)
at 
com.datastax.driver.core.RequestHandler.startNewExecution(RequestHandler.java:115)
at 
com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:91)
at 
com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:129)

Can anyone suggest what can be done to handle this error?
Thanks,
Abhinav







  

Re: Changing a cluster name

2016-06-30 Thread Romain Hardouin
Indeed, when you want to flush the system keyspace you need to specify it. The 
flush without argument filters out the system keyspace. This behavior is still 
the same in trunk. If you dig into the sources, look at 
"nodeProbe.getNonSystemKeyspaces()" when "cmdArgs" is empty:
- https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/NodeTool.java#L329
- https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/NodeTool.java#L337
Best,
Romain 

On Wednesday, June 29, 2016 at 19:03, Paul Fife wrote:
 

Thanks Dominik - I was doing a "nodetool flush" like the instructions said, but 
it wasn't actually flushing the system keyspace. Using "nodetool flush system" 
made it work as expected!
Thanks, Paul Fife
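Putting the whole thread together, the sequence that worked can be sketched as follows (a sketch for the 2.x system schema; run on every node, and keep cassandra.yaml's cluster_name in sync before restarting):

```
cqlsh> UPDATE system.local SET cluster_name = 'New Cluster Name' WHERE key = 'local';
$ nodetool flush system     # flush the system keyspace explicitly
$ # set cluster_name: 'New Cluster Name' in cassandra.yaml, then restart Cassandra
```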
On Wed, Jun 29, 2016 at 7:37 AM, Dominik Keil  
wrote:

  Also you might want to explicitly do "nodetool flush system". I've recently 
done this in C* 2.2.6 and just "nodetool flush" would not have flushed the 
system keyspace, leading to the change in cluster name not being persisted 
across restarts.
 
 Cheers
 
 Am 29.06.2016 um 03:36 schrieb Surbhi Gupta:
  
 system.local uses local strategy . You need to update on all nodes . 
   
 On 28 June 2016 at 14:51, Tyler Hobbs  wrote:
 
  First, make sure that you call nodetool flush after modifying the system 
table.  That's probably why it's not surviving the restart.
 
  Second, I believe you will have to do this across all nodes and restart them 
at the same time.  Otherwise, cluster name mismatches will prevent the nodes 
from communicating with each other.

 On Fri, Jun 24, 2016 at 3:51 PM, Paul Fife  wrote:
 
 I am following the instructions here to attempt to change the name of a 
cluster: https://wiki.apache.org/cassandra/FAQ#clustername_mismatch or at least 
the more up to date advice: 
http://stackoverflow.com/questions/22006887/cassandra-saved-cluster-name-test-cluster-configured-name
 
  I am able to query the system.local to verify the clusterName is modified, 
but when I restart Cassandra it fails, and the value is back at the original 
cluster name. Is this still possible, or are there changes preventing this from 
working anymore?  
  I have attempted this several times and it did actually work the first time, 
but when I moved around to the other nodes it no longer worked. 
  Thanks, Paul Fife 
   
  
 
 
   -- 
 Tyler Hobbs
 DataStax
   
  
  
 
 -- 
  Dominik Keil



  

Re: Lightweight Transactions during datacenter outage

2016-06-08 Thread Romain Hardouin
> Would you know why the driver doesn't automatically change to LOCAL_SERIAL 
> during a DC outage ?
I would say because *you* decide, not the driver ;-) This kind of fallback 
could be achieved with a custom downgrading policy 
(DowngradingConsistencyRetryPolicy [*] doesn't handle ConsistencyLevel.SERIAL / 
LOCAL_SERIAL )
* 
https://github.com/datastax/python-driver/blob/2.7.2-cassandra-2.1/cassandra/policies.py#L747
Best,
Romain
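A client-side fallback along those lines can be sketched in a few lines of Python. This is illustrative only: the names below (`UnavailableError`, `session`, `serial_consistency_level` as a plain attribute) stand in for the real python-driver objects, and the policy decision to degrade stays with the application, as Romain points out:

```python
# Sketch: retry a conditional (LWT) write with LOCAL_SERIAL when SERIAL
# cannot reach a cross-DC Paxos quorum. Not the driver API.
SERIAL = "SERIAL"
LOCAL_SERIAL = "LOCAL_SERIAL"

class UnavailableError(Exception):
    """Raised when not enough replicas are reachable (stand-in type)."""

def execute_lwt(session, statement):
    statement.serial_consistency_level = SERIAL
    try:
        return session.execute(statement)
    except UnavailableError:
        # One DC is down: degrade to a Paxos quorum in the local DC only.
        statement.serial_consistency_level = LOCAL_SERIAL
        return session.execute(statement)
```

Note the trade-off: after the fallback, linearizability is only guaranteed within the local DC, so concurrent LWTs on the same key from the other DC are no longer serialized against yours.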
 

On Wednesday, June 8, 2016 at 15:41, Jeronimo de A. Barros wrote:
 

 Tyler,
Thank you, it's working now:

self.query['online'] = SimpleStatement(
    "UPDATE table USING TTL %s SET l = True WHERE k2 = %s IF l = False;",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    serial_consistency_level=ConsistencyLevel.LOCAL_SERIAL)
Would you know why the driver doesn't automatically change to LOCAL_SERIAL 
during a DC outage? Or does the driver already have an option to make this 
change from SERIAL to LOCAL_SERIAL?
Again, thank you very much, the bill for the beers is on me in September during 
the Cassandra Summit. ;-)
Best regards, Jero

On Tue, Jun 7, 2016 at 6:39 PM, Tyler Hobbs  wrote:

You can set the serial_consistency_level to LOCAL_SERIAL to tolerate a DC 
failure: 
http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level.
  It defaults to SERIAL, which ignores DCs.

On Tue, Jun 7, 2016 at 12:26 PM, Jeronimo de A. Barros 
 wrote:

Hi,
I have a cluster spread across 2 datacenters (DC1 and DC2), two servers in each 
DC, and I have a keyspace with NetworkTopologyStrategy (DC1:2 and DC2:2) with 
the following table:

CREATE TABLE test (
    k1 int,
    k2 timeuuid,
    PRIMARY KEY ((k1), k2)
) WITH CLUSTERING ORDER BY (k2 DESC)
During a datacenter outage, as soon as a datacenter goes offline, I get this 
error during a lightweight transaction:

cqlsh:devtest> insert into test (k1,k2) values(1,now()) if not exists;
Request did not complete within rpc_timeout.

And a short time after the online DC verifies that the second DC is offline:

cqlsh:devtest> insert into test (k1,k2) values(1,now()) if not exists;
Unable to complete request: one or more nodes were unavailable.
So, my question is: Is there any way to keep lightweight transactions working 
during a datacenter outage using the C* Python driver 2.7.2 ?
I was thinking about catch the exception and do a simple insert (without "IF") 
when the error occur, but having the lightweight transactions working even 
during a DC outage/split would be nice.
Thanks in advance for any help/hints.
Best regards, Jero



-- 
Tyler Hobbs
DataStax




  

Re: Nodetool repair inconsistencies

2016-06-08 Thread Romain Hardouin
Hi Jason,
It's difficult for the community to help you if you don't share the error ;-)
What did the logs say when you ran the major compaction? (i.e. the first error 
you encountered)
Best,
Romain

On Wednesday, June 8, 2016 at 03:34, Jason Kania wrote:
 

 I am running a 3 node cluster of 3.0.6 instances and encountered an error when 
running nodetool compact. I then ran nodetool repair. No errors were returned.
I then attempted to run nodetool compact again, but received the same error so 
the repair made no correction and reported no errors.
After that, I moved the problematic files out of the directory, restarted 
cassandra and attempted the repair again. The repair again completed without 
errors, however, no files were added to the directory that had contained the 
corrupt files. So nodetool repair does not seem to be making actual repairs.
I started looking around and numerous directories have vastly different amounts 
of content across the 3 nodes. There are 3 replicas so I would expect to find 
similar amounts of content in the same data directory on the different nodes.

Is there any way to dig deeper into this? I don't want to be caught out because 
replication/repair is silently failing. I noticed that there is always a "some 
repair failed" message amongst the repair output, but that is completely 
unhelpful and has always been present.

Thanks,
Jason


  

Re: How to remove 'compact storage' attribute?

2016-06-08 Thread Romain Hardouin
 
Hi,
You can't yet, see https://issues.apache.org/jira/browse/CASSANDRA-10857
Note that secondary indexes don't scale; be aware of their limitations. If you 
want to change the data model of a CF, a Spark job can do the trick.
Best,
Romain

On Tuesday, June 7, 2016 at 10:51, "Lu, Boying" wrote:
 

Hi All,

Since the Astyanax client has been EOL'd, we are considering migrating to the 
DataStax Java client in our product. One thing I notice is that the CFs created 
by Astyanax have the 'compact storage' attribute, which prevents us from using 
some new features provided by CQL such as secondary indexes.

Does anyone know how to remove this attribute? "ALTER TABLE" doesn't seem to 
work according to the CQL document.

Thanks

Boying

  

Re: Inconsistent Reads after Restoring Snapshot

2016-04-26 Thread Romain Hardouin
Yes, the "Node restart method" with -Dcassandra.join_ring=false. Note that they 
advise to run a repair anyway. But thanks to join_ring=false the node will 
hibernate and not serve stale data. Tell me if I'm wrong: you assume that server 
A is still OK, therefore the system keyspace still exists? If not (disk KO) it's 
not the same procedure (hence the tokens in cassandra.yaml that I mentioned). 
Actually I'm not sure what you mean by "node A crashes". You should try on a 
test cluster or with CCM (https://github.com/pcmanus/ccm) in order to 
familiarize yourself with the procedure.
Romain

On Tuesday, April 26, 2016 at 11:02, Anuj Wadehra wrote:
 

Thanks Romain !! So just to clarify, you are suggesting the following steps:

10 AM: Daily snapshot taken of node A and moved to backup location
11 AM: A record is inserted such that nodes A and B insert the record but there 
is a mutation drop on node C.
1 PM: Node A crashes.
1 PM: Follow these steps to restore the 10 AM snapshot on node A:
      1. Restore the data as mentioned in 
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html
         with ONE EXCEPTION: start node A with -Dcassandra.join_ring=false.
      2. Run repair.
      3. Restart node A with -Dcassandra.join_ring=true.

Please confirm.

I was not aware that join_ring can also be used during a normal reboot. I 
thought it was only an option during autobootstrap :)

Thanks
Anuj



----
On Tue, 26/4/16, Romain Hardouin  wrote:

 Subject: Re: Inconsistent Reads after Restoring Snapshot
 To: "user@cassandra.apache.org" 
 Date: Tuesday, 26 April, 2016, 12:47 PM
 
You can make a restore on the new node A (don't forget to set the token(s) in 
cassandra.yaml), start the node with -Dcassandra.join_ring=false and then run a 
repair on it. Have a look at 
https://issues.apache.org/jira/browse/CASSANDRA-6961
Best,
Romain

On Tuesday, April 26, 2016 at 04:26, Anuj Wadehra wrote:

Hi,
We have 2.0.14. We use RF=3 and read/write at QUORUM. Moreover, we don't use 
incremental backups. As per the documentation at , if I need to restore a 
snapshot on a SINGLE node in a cluster, I would run repair at the end. But 
while the repair is going on, reads may get inconsistent.

Consider the following scenario:
10 AM: Daily snapshot taken of node A and moved to backup location
11 AM: A record is inserted such that nodes A and B insert the record but there 
is a mutation drop on node C.
1 PM: Node A crashes and data is restored from the latest 10 AM snapshot. Now, 
only node B has the record.

Now, my question is:
Till the repair is completed on node A, a read at QUORUM may return 
inconsistent results based on the nodes from which data is read. If data is 
read from nodes A and C, nothing is returned; if data is read from nodes A and 
B, the record is returned. This is a vital point which is not highlighted 
anywhere.

Please confirm my understanding. If my understanding is right, how to make sure 
that my reads are not inconsistent while a node is being repaired after 
restoring a snapshot?
I think autobootstrapping the node without joining the ring till the repair is 
completed is an alternative option. But snapshots save a lot of streaming as 
compared to bootstrap.
Will incremental backups guarantee that
Thanks
Anuj

Sent from Yahoo Mail on Android

  

Re: Inconsistent Reads after Restoring Snapshot

2016-04-26 Thread Romain Hardouin
You can make a restore on the new node A (don't forget to set the token(s) in 
cassandra.yaml), start the node with -Dcassandra.join_ring=false and then run a 
repair on it. Have a look at 
https://issues.apache.org/jira/browse/CASSANDRA-6961
Best,
Romain 

On Tuesday, April 26, 2016 at 04:26, Anuj Wadehra wrote:
 

Hi,
We have 2.0.14. We use RF=3 and read/write at QUORUM. Moreover, we don't use 
incremental backups. As per the documentation at 
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_backup_snapshot_restore_t.html
, if I need to restore a snapshot on a SINGLE node in a cluster, I would run 
repair at the end. But while the repair is going on, reads may get inconsistent.

Consider the following scenario:
10 AM: Daily snapshot taken of node A and moved to backup location
11 AM: A record is inserted such that nodes A and B insert the record but there 
is a mutation drop on node C.
1 PM: Node A crashes and data is restored from the latest 10 AM snapshot. Now, 
only node B has the record.

Now, my question is:
Till the repair is completed on node A, a read at QUORUM may return 
inconsistent results based on the nodes from which data is read. If data is 
read from nodes A and C, nothing is returned; if data is read from nodes A and 
B, the record is returned. This is a vital point which is not highlighted 
anywhere.

Please confirm my understanding. If my understanding is right, how to make sure 
that my reads are not inconsistent while a node is being repaired after 
restoring a snapshot?
I think autobootstrapping the node without joining the ring till the repair is 
completed is an alternative option. But snapshots save a lot of streaming as 
compared to bootstrap.
Will incremental backups guarantee that
Thanks
Anuj

Sent from Yahoo Mail on Android
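The inconsistency window described in this thread can be made concrete with a little quorum arithmetic (a sketch: after node A's restore, the record lives only on B; RF=3, so QUORUM reads touch 2 of 3 replicas):

```python
from itertools import combinations

def quorum(rf: int) -> int:
    # Quorum is a strict majority of the replicas
    return rf // 2 + 1

# True = replica still holds the record after A's restore from the old snapshot
replicas = {"A": False, "B": True, "C": False}
rf = len(replicas)  # RF=3 -> quorum of 2

# A QUORUM read sees the record only if the chosen replica pair includes B;
# the pair {A, C} returns nothing.
for nodes in combinations(replicas, quorum(rf)):
    found = any(replicas[n] for n in nodes)
    print(sorted(nodes), "->", "record" if found else "empty")
```

One of the three possible quorums comes back empty, which is exactly the stale read the thread worries about until repair (or a hibernating join_ring=false start) closes the gap.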

  

Re: Ops Centre Read Requests / TBL: Local Read Requests

2016-02-16 Thread Romain Hardouin
Yes you are right Anishek. If you write with LOCAL_ONE, values will be the same.


Re: Keyspaces not found in cqlsh

2016-02-11 Thread Romain Hardouin
Would you mind pasting the output for both nodes in a gist/paste/whatever? 
https://gist.github.com or http://paste.debian.net



On Thursday, February 11, 2016 at 11:57, kedar wrote:
Thanks for the reply.

ls -l cassandra/data/* lists various *.db files

This problem is on both nodes.

Thanks,
Kedar Parikh

Ext : 2224
Dir : +91 22 61782224
Mob : +91 9819634734
Email : kedar.par...@netcore.co.in
Web : www.netcore.co.in


Re: Keyspaces not found in cqlsh

2016-02-11 Thread Romain Hardouin
What is the output on both nodes of the following command?
ls -l /var/lib/cassandra/data/system/*
If one node seems odd you can try "nodetool resetlocalschema", but the other 
node must be in a clean state.

Best,
Romain


On Thursday, February 11, 2016 at 11:10, kedar wrote:
I am using cqlsh 5.0.1 | Cassandra 2.1.2. Recently we have been unable to 
see/desc keyspaces and query tables through cqlsh on either of the two nodes:

cqlsh> desc keyspaces



cqlsh> use user_index;
cqlsh:user_index> desc table list_1_10;

Keyspace 'user_index' not found.
cqlsh:user_index>
cqlsh>  select * from system.schema_keyspaces;
Keyspace 'system' not found.
cqlsh>
We are running a 2 node cluster. The Python/Django app that inserts data is 
running without any failure and system logs show nothing abnormal.

"./nodetool repair" on one node hasn't helped. "./nodetool cfstats" shows all 
the tables too.

-- 
Thanks,
Kedar Parikh

Ext : 2224
Dir : +91 22 61782224
Mob : +91 9819634734
Email : kedar.par...@netcore.co.in
Web : www.netcore.co.in


Re: reducing disk space consumption

2016-02-11 Thread Romain Hardouin
As Mohammed said, "nodetool clearsnapshot" will do the trick.
Cassandra takes a snapshot by default before keyspace/table dropping or 
truncation.
You can disable this feature if it's a dev node (see auto_snapshot in 
cassandra.yaml), but on a production node it's a good thing to keep auto 
snapshots enabled.

Best,

Romain


Re: missing rows while importing data using sstable loader

2016-02-05 Thread Romain Hardouin
> What is the best practise to create sstables?

When you run a "nodetool flush", Cassandra persists all the memtables on disk, 
i.e. it produces sstables.
(You can create sstables by yourself thanks to CQLSSTableWriter, but I don't 
think that was the point of your question.)


Re: missing rows while importing data using sstable loader

2016-02-01 Thread Romain Hardouin
Did you run "nodetool flush" on the source node? If not, the missing rows could 
be in memtables.


Re: missing rows while importing data using sstable loader

2016-01-29 Thread Romain Hardouin
Hi,
I assume RF > 1, right? What consistency level did you use? cqlsh uses ONE by 
default. Try:
cqlsh> CONSISTENCY ALL
And run your query again.
Best,
Romain

On Friday, January 29, 2016 at 13:45, Arindam Choudhury wrote:
 

 Hi Kai,

The table schema is:

CREATE TABLE mordor.things_values_meta (
    thing_id text,
    key text,
    bucket_timestamp timestamp,
    total_rows counter,
    PRIMARY KEY ((thing_id, key), bucket_timestamp)
) WITH CLUSTERING ORDER BY (bucket_timestamp ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32'}
    AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';


I am just running "select count(*) from things_values_meta ;" to get the count.

Regards,
Arindam

On 29 January 2016 at 13:39, Kai Wang  wrote:

Arindam,

what's the table schema and what does your query to retrieve the rows look like?
 
On Fri, Jan 29, 2016 at 7:33 AM, Arindam Choudhury 
 wrote:

Hi,

I am importing data into a new cassandra cluster using sstableloader. The 
sstableloader runs without any warning or error, but I am missing around 1000 
rows.

Any feedback will be highly appreciated. 

Kind Regards,
Arindam Choudhury






  

Re: About cassandra's reblance when adding one or more nodes into the existed cluster?

2016-01-26 Thread Romain Hardouin


Hi Dillon,
CMIIW, I suspect that you use vnodes and you want to "move one of the 256 
tokens to another node". If yes, that's not possible. "nodetool move" is not 
allowed with vnodes:
https://github.com/apache/cassandra/blob/cassandra-2.1.11/src/java/org/apache/cassandra/service/StorageService.java#L3488
*But* if you try "nodetool move" with a token that is already owned by a node, 
the check is done *before* the vnodes check:
https://github.com/apache/cassandra/blob/cassandra-2.1.11/src/java/org/apache/cassandra/service/StorageService.java#L3479
If you use a single token, it seems you are trying to replace one node with 
another... Maybe you could explain the problem that leads you to do a nodetool 
move (along with the nodetool ring output, as Alain suggested).
Best,
Romain

Re: Strategy tools for taking snapshots to load in another cluster instance

2015-11-25 Thread Romain Hardouin
My previous answer (sstableloader) allows you to move from a larger to a 
smaller cluster.

Sent from Yahoo Mail on Android 
 
  On Tue, Nov 24, 2015 at 11:30, Anishek Agarwal wrote:   
Peer,
That talks about having a similar-sized cluster; I was wondering if there is a 
way to move from a larger to a smaller cluster. I will try a few things as soon 
as I get time and update here.
On Thu, Nov 19, 2015 at 5:48 PM, Peer, Oded  wrote:


Have you read the DataStax documentation?

http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html

 

 

From: Romain Hardouin [mailto:romainh...@yahoo.fr]
Sent: Wednesday, November 18, 2015 3:59 PM
To: user@cassandra.apache.org
Subject: Re: Strategy tools for taking snapshots to load in another cluster 
instance

 

You can take a snapshot via nodetool then load sstables on your test cluster 
with sstableloader: 
docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html
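A hedged sketch of those two steps (keyspace, table, host names, and paths are illustrative; the nodetool/sstableloader invocations are left as comments because they need live clusters):

```shell
KS=prod_ks
TBL=events
TAG=copy_to_test
# 1. On every node of the source cluster, snapshot the keyspace
#    (snapshots are hard links, so this is cheap):
#      nodetool snapshot -t "$TAG" "$KS"
# 2. Feed the snapshotted sstables to the target cluster; sstableloader
#    expects a directory path ending in <keyspace>/<table>:
SNAP_DIR="/var/lib/cassandra/data/$KS/$TBL/snapshots/$TAG"
echo "sstableloader -d test-node-1,test-node-2 $SNAP_DIR"
```

Because sstableloader streams according to the target cluster's own topology, the destination does not need the same node count as the source.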





From:"Anishek Agarwal" 
Date:Wed, Nov 18, 2015 at 11:24
Subject:Strategy tools for taking snapshots to load in another cluster instance

Hello

 

We have a 5-node prod cluster and a 3-node test cluster. Is there a way I can 
take a snapshot of a table in prod and load it into the test cluster? The 
Cassandra versions are the same. 

 

Even if there is a tool that can help with this it will be great.

 

If not, how do people handle scenarios where data in prod is required in 
staging/test clusters for testing to make sure things are correct? Does the 
cluster size have to be the same to allow copying of relevant snapshot data etc? 

 

 

thanks

anishek


 


  


Re: Strategy tools for taking snapshots to load in another cluster instance

2015-11-18 Thread Romain Hardouin
You can take a snapshot via nodetool then load sstables on your test cluster 
with sstableloader: 
docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html



From:"Anishek Agarwal" 
Date:Wed, Nov 18, 2015 at 11:24
Subject:Strategy tools for taking snapshots to load in another cluster instance

Hello


We have a 5-node prod cluster and a 3-node test cluster. Is there a way I can 
take a snapshot of a table in prod and load it into the test cluster? The 
Cassandra versions are the same. 


Even if there is a tool that can help with this it will be great.


If not, how do people handle scenarios where data in prod is required in 
staging/test clusters for testing to make sure things are correct? Does the 
cluster size have to be the same to allow copying of relevant snapshot data etc? 

 


thanks

anishek



Re: keyspace with hundreds of columnfamilies

2014-07-04 Thread Romain HARDOUIN
Cassandra can handle many more columns (e.g. time series).
So 100 columns is OK.

Best,
Romain



tommaso barbugli wrote on 03/07/2014 21:55:18:

> From: tommaso barbugli 
> To: user@cassandra.apache.org
> Date: 03/07/2014 21:55
> Subject: Re: keyspace with hundreds of columnfamilies
> 
> thank you for the replies; I am rethinking the schema design, one 
> possible solution is to "implode" one dimension and get N times fewer CFs.
> With this approach I would come up with (cql) tables with up to 100 
> columns; would that be a problem?
> 
> Thank You,
> Tommaso
> 


Re: keyspace with hundreds of columnfamilies

2014-07-02 Thread Romain HARDOUIN
Arena allocation is an improvement, not a limitation. 
It was introduced in Cassandra 1.0 in order to lower memory fragmentation 
(and therefore promotion failures).
AFAIK it's not intended to be tweaked, so it might not be a good idea to 
change it.

Best,
Romain

tommaso barbugli wrote on 02/07/2014 17:40:18:

> From: tommaso barbugli 
> To: user@cassandra.apache.org
> Date: 02/07/2014 17:40
> Subject: Re: keyspace with hundreds of columnfamilies
> 
> 1MB per column family sounds pretty bad to me; is this something I 
> can tweak/workaround somehow?
> 
> Thanks
> Tommaso
> 

> 2014-07-02 17:21 GMT+02:00 Romain HARDOUIN :
> The trap is that each CF will consume 1 MB of memory due to arena allocation. 
> This might seem harmless, but if you plan on thousands of CFs it means 
> thousands of megabytes... 
> Up to 1,000 CFs I think it could be doable, but not 10,000. 
> 
> Best, 
> 
> Romain 
> 
> 
> tommaso barbugli wrote on 02/07/2014 10:13:41:
> 
> > From: tommaso barbugli 
> > To: user@cassandra.apache.org
> > Date: 02/07/2014 10:14
> > Subject: keyspace with hundreds of columnfamilies
> > 
> > Hi, 
> > Are there any known issues or shortcomings with organising data in 
> > hundreds of column families? 
> > At present I am running with 300 column families but I expect 
> > that to get to a couple of thousand. 
> > Is this something discouraged / unsupported (I am using Cassandra 2.0)?
> > 
> > Thanks 
> > Tommaso

RE: keyspace with hundreds of columnfamilies

2014-07-02 Thread Romain HARDOUIN
The trap is that each CF will consume 1 MB of memory due to arena 
allocation. 
This might seem harmless, but if you plan on thousands of CFs it means 
thousands of megabytes...
Up to 1,000 CFs I think it could be doable, but not 10,000.
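The arithmetic above can be checked quickly (the ~1 MB-per-CF arena figure is the one quoted in this thread):

```shell
# Back-of-envelope memory overhead from memtable arena allocation:
ARENA_MB=1
for ncf in 300 1000 10000; do
  echo "$ncf CFs -> ~$((ncf * ARENA_MB)) MB of arena memory"
done
```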

Best,

Romain


tommaso barbugli wrote on 02/07/2014 10:13:41:

> From: tommaso barbugli 
> To: user@cassandra.apache.org
> Date: 02/07/2014 10:14
> Subject: keyspace with hundreds of columnfamilies
> 
> Hi,
> Are there any known issues or shortcomings with organising data in 
> hundreds of column families?
> At present I am running with 300 column families but I expect 
> that to get to a couple of thousand.
> Is this something discouraged / unsupported (I am using Cassandra 2.0)?
> 
> Thanks
> Tommaso

RE: Backup Cassandra to

2014-06-12 Thread Romain HARDOUIN
So you have to install a backup client on each Cassandra node. If the 
NetBackup client behaves like EMC NetWorker, beware of the resource 
utilization (data deduplication, compression): you may have to boost 
the CPUs and RAM (+2 GB) of each node.

Try with one node: make a snapshot with nodetool and configure NetBackup 
so the backup client sends data to your tape library or virtual tape 
library. And of course, try to restore ;-)
The tricky part is not Cassandra itself; it's following NetBackup (or 
equivalent) best practices.
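The Cassandra side of that procedure might look like the sketch below (the snapshot tag, data path, and clearsnapshot cleanup step are assumptions; the NetBackup configuration itself is site-specific and left as comments):

```shell
TAG="nb_$(date +%Y%m%d)"
DATA_DIR=/var/lib/cassandra/data
# 1. Snapshot on the node (creates hard links under .../snapshots/$TAG):
#      nodetool snapshot -t "$TAG"
# 2. Point the backup client at the snapshot directories only,
#    not at the live sstables:
SNAP_GLOB="$DATA_DIR/*/*/snapshots/$TAG"
echo "backup selection: $SNAP_GLOB"
# 3. Once the tape job is verified, reclaim the space:
#      nodetool clearsnapshot -t "$TAG"
```

Backing up the snapshot rather than the live data directory avoids the backup client racing with compaction.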


"Camacho, Maria (NSN - FI/Espoo)" wrote on 12/06/2014 13:12:18:

> From: "Camacho, Maria (NSN - FI/Espoo)" 
> To: "user@cassandra.apache.org"
> Date: 12/06/2014 13:12
> Subject: RE: Backup Cassandra to
> 
> Hi,
> Thanks for the quick response Romain.
> 
> We would like to avoid using extra disk space, so no DAS/SAN.
> We are more interested in achieving something like what is now being
> done with Oracle – Symantec’s NetBackup is used to back up directly 
> to tape, no intermediate storage is needed. 
> 
> It could be NetBackup or whatever product supported by Cassandra 
> that writes the backup on tape without storing it on disk first. 
> 
> Regards,
> Maria


RE: Backup Cassandra to

2014-06-12 Thread Romain HARDOUIN
Hi Maria,

It depends which backup software and hardware you plan to use. Do you 
store your data on DAS or SAN?
A hint regarding Cassandra: either drain the node before backing it up, or 
take a Cassandra snapshot and then back up that snapshot.
We backup our data on tape but we also store our data on SAN, so it's 
pretty vendor specific.

Best,

Romain


"Camacho, Maria (NSN - FI/Espoo)" wrote on 12/06/2014 10:57:06:

> From: "Camacho, Maria (NSN - FI/Espoo)" 
> To: "user@cassandra.apache.org"
> Date: 12/06/2014 10:57
> Subject: Backup Cassandra to
> 
> Hi there,
> 
> I'm trying to find information/instructions about backing up and 
> restoring a Cassandra DB to and from a tape unit.
> 
> I was hopping someone in this forum could help me with this since I 
> could not find anything useful in Google :(
> 
> Thanks in advance,
> Maria
> 

RE: Memory issue

2014-05-20 Thread Romain HARDOUIN
Well... you have already changed the limits ;-)
Keep in mind that changes in the limits.conf file will not affect 
processes that are already running.

opensaf dev wrote on 21/05/2014 06:59:05:

> From: opensaf dev 
> To: user@cassandra.apache.org
> Date: 21/05/2014 07:00
> Subject: Memory issue
> 
> Hi guys,
> 
> I am trying to run Cassandra on CentOS as a user X other than root 
> or cassandra. When I run as user cassandra, it starts and runs fine.
> But when I run as user X, I get the below error once 
> Cassandra starts, and the system freezes totally.
> 
> Insufficient memlock settings:
> WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable 
> to lock JVM memory (ENOMEM).
> This can result in part of the JVM being swapped out, especially 
> with mmapped I/O enabled.
> Increase RLIMIT_MEMLOCK or run Cassandra as root.
> 
> 
> I have tried the tips available online to change the memlock and 
> other limits, both for users cassandra and X, but it did not solve the 
> problem.
> 

> What else should I consider when running Cassandra as a user other 
> than cassandra/root?
> 
> 
> Any help is much appreciated.
> 
> 
> Thanks
> Dev
> 


RE: Memory issue

2014-05-20 Thread Romain HARDOUIN
Hi,

You have to define limits for the user. 
Here is an example for the user cassandra:

# cat /etc/security/limits.d/cassandra.conf 
cassandra   -   memlock unlimited
cassandra   -   nofile  10
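A quick way to verify the limits in a fresh session of that user (a sketch; the printed values depend on your configuration):

```shell
# Limits are applied at login; processes already running keep their old
# values, so re-login (or restart the service) after editing the file.
ulimit -l   # max locked memory: expect "unlimited" once the memlock line applies
ulimit -n   # max open files
```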

best,

Romain

opensaf dev wrote on 21/05/2014 06:59:05:

> From: opensaf dev 
> To: user@cassandra.apache.org
> Date: 21/05/2014 07:00
> Subject: Memory issue
> 
> Hi guys,
> 
> I am trying to run Cassandra on CentOS as a user X other than root 
> or cassandra. When I run as user cassandra, it starts and runs fine.
> But when I run as user X, I get the below error once 
> Cassandra starts, and the system freezes totally.
> 
> Insufficient memlock settings:
> WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable 
> to lock JVM memory (ENOMEM).
> This can result in part of the JVM being swapped out, especially 
> with mmapped I/O enabled.
> Increase RLIMIT_MEMLOCK or run Cassandra as root.
> 
> 
> I have tried the tips available online to change the memlock and 
> other limits, both for users cassandra and X, but it did not solve the 
> problem.
> 

> What else should I consider when running Cassandra as a user other 
> than cassandra/root?
> 
> 
> Any help is much appreciated.
> 
> 
> Thanks
> Dev
> 


RE: Datacenter understanding question

2014-05-13 Thread Romain HARDOUIN
RF=1 means no replication.
You have to set RF=2 in order to set up mirroring.
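Expressed as a keyspace definition, the mirrored setup might look like this (keyspace and datacenter names are made up; one replica per DC gives a total RF of 2, i.e. a full copy in each datacenter):

```shell
# With SimpleStrategy and RF=1 there is only a single copy cluster-wide.
# One replica per DC mirrors the data across the two datacenters:
CQL="CREATE KEYSPACE mirrored WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 1};"
echo "$CQL"
# would be applied with: cqlsh -e "$CQL"
```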

-Romain

ng wrote on 13/05/2014 19:37:08:

> From: ng 
> To: "user@cassandra.apache.org"
> Date: 14/05/2014 04:37
> Subject: Datacenter understanding question
> 
> If I have a configuration of two data centers with one node each,
> and the replication factor is also 1,
> will these 2 nodes be mirrored/replicated?

RE: Cassandra Disk storage capacity

2014-04-07 Thread Romain HARDOUIN
Hi,

See data_file_directories and commitlog_directory in the settings file 
cassandra.yaml.
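For example, a cassandra.yaml sketch with multiple data directories (paths are illustrative):

```yaml
# Multiple data directories let sstables span several mount points;
# Cassandra distributes data across all of them.
data_file_directories:
    - /var/lib/cassandra/data
    - /mnt/disk2/cassandra/data
# Keep the commit log on a separate device when possible:
commitlog_directory: /var/lib/cassandra/commitlog
```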

Cheers,

Romain

Hari Rajendhran wrote on 07/04/2014 12:56:37:

> From: Hari Rajendhran 
> To: user@cassandra.apache.org
> Date: 07/04/2014 12:58
> Subject: Cassandra Disk storage capacity
> 
> Hi Team,
> 
> We have a 3-node Apache Cassandra 2.0.4 setup installed in our lab.
> We have set the data directory to /var/lib/cassandra/data. What 
> would be the maximum disk storage that will be used for Cassandra data?
> 
> Note: the /var partition has a storage capacity of 40 GB.
> 
> My question is: will Cassandra use the entire / directory for 
> data storage?
> If not, how do I specify multiple directories for data storage?
> 
> 
> 
> 
> 
> Best Regards
> Hari Krishnan Rajendhran
> Hadoop Admin
> DESS-ABIM ,Chennai BIGDATA Galaxy
> Tata Consultancy Services
> Cell:- 9677985515
> Mailto: hari.rajendh...@tcs.com
> Website: http://www.tcs.com
> 
> Experience certainty. IT Services
> Business Solutions
> Consulting
> 

RE: Question about rpms from datastax

2014-03-28 Thread Romain HARDOUIN
cassandra*.noarch.rpm -> installs Cassandra only.
dsc*.noarch.rpm -> DSC stands for DataStax Community; installs Cassandra + 
OpsCenter.

Donald Smith wrote on 27/03/2014 20:36:57:

> From: Donald Smith 
> To: "'user@cassandra.apache.org'"
> Date: 27/03/2014 20:37
> Subject: Question about rpms from datastax
> 
> On http://rpm.riptano.com/community/noarch/ what’s the difference 
between 
> 
> cassandra20-2.0.6-1.noarch.rpm  and  dsc20-2.0.6-1.noarch.rpm ?
> 
> Thanks, Don
> 
> Donald A. Smith | Senior Software Engineer 
> P: 425.201.3900 x 3866
> C: (206) 819-5965
> F: (646) 443-2333
> dona...@audiencescience.com


RE: [ANN] pithos is cassandra-backed S3 compatible object store

2014-03-27 Thread Romain HARDOUIN
It looks like MagnetoDB for CloudStack.
Nice Clojure project.


Pierre-Yves Ritschard wrote on 27/03/2014 08:12:15:

> From: Pierre-Yves Ritschard 
> To: user
> Date: 27/03/2014 08:12
> Subject: [ANN] pithos is cassandra-backed S3 compatible object store
> 
> Hi,
> 
> If you're already using cassandra for storing your data, you might 
> be interested in http://pithos.io which provides s3 compatibility. 
> The underlying schema splits files into several blocks, themselves 
> split into chunks. 
> 
> I'm looking forward to all your comments on the schema, code and of 
> course pull-requests :-)
> 
>   - pyr (https://twitter.com/pyr)

Re: Kernel keeps killing cassandra process - OOM

2014-03-24 Thread Romain HARDOUIN
4 GB is OK for a test cluster. 
In the past we encountered a similar issue due to VMWare ESX's memory 
overcommit (memory ballooning).
When you talk about overcommit, do you mean Linux (vm.overcommit_*) or 
the hypervisor (like ESX)?



prem yadav wrote on 24/03/2014 12:11:31:

> From: prem yadav 
> To: user@cassandra.apache.org
> Date: 24/03/2014 12:12
> Subject: Re: Kernel keeps killing cassandra process - OOM
> 
> the nodes die without being under any load. Completely idle.
> And 4 GB of system memory is not low... or is it?
> I have tried tweaking the overcommit memory. Tried disabling it, 
> under-committing and over-committing. 
> I also reduced rpc threads min and max. Will try other setting from 
> that link Michael has given. 


Re: Kernel keeps killing cassandra process - OOM

2014-03-24 Thread Romain HARDOUIN
You have to tune Cassandra in order to run it in a low-memory 
environment. 
Many settings must be adjusted; the link that Michael mentions provides a 
quick start.
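As a starting point, commonly cited low-memory settings for clusters of that era look roughly like this (values are illustrative assumptions, not taken from the linked article):

```yaml
# cassandra.yaml — shrink per-operation concurrency and memtable space:
concurrent_reads: 8
concurrent_writes: 8
memtable_total_space_in_mb: 128
# and in cassandra-env.sh:
#   MAX_HEAP_SIZE="1G"
#   HEAP_NEWSIZE="200M"
```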

There is a point that I haven't understood: *when* did your nodes die? 
Under load? Or can they be killed by the OOM killer even when they are not 
loaded?
If the nodes are VMs, you have to pay attention to hypervisor memory 
overcommit.


"Laing, Michael" wrote on 22/03/2014 22:25:30:

> From: "Laing, Michael" 
> To: user@cassandra.apache.org
> Date: 22/03/2014 22:26
> Subject: Re: Kernel keeps killing cassandra process - OOM
> 
> You might want to look at:
> 
> http://www.opensourceconnections.com/2013/08/31/building-the-
> perfect-cassandra-test-environment/
