Re: sstablesplit - status
Hi again, and thanks for the input. I don't think it's tombstoned data; rather, over a very long time many rows are inserted over and over again, with significant pauses between the inserts. I found examples where a specific row (for example pk=xyz, value=123) exists in more than one sstable, with exactly the same content but different timestamps. The largest sstables, compacted a while ago, are now 300-400 GB in size on some nodes, and it is very unlikely they will be compacted any time soon, as there are only one or two sstables of that size on a single node. I think I will try re-bootstrapping a node to see if that helps.

sstablesplit exists in 2.x - but as far as I know it is deprecated, and in my 3.6 test cluster it was gone. I tried sstabledump to take a deeper look, but that says "pre-3.0 SSTable is not supported" (fair enough, I am on a 2.2.8 cluster). Jan
sstablesplit - status
Hi all, I have a problem with some really large sstables which don't get compacted anymore, and I know there are many duplicated rows in them. Splitting the tables into smaller ones to get them compacted again would help, I thought, so I tried sstablesplit, but:

cassandra@cassandra01 /tmp/cassandra $ ./apache-cassandra-3.10/tools/bin/sstablesplit lb-388151-big-Data.db
Skipping non sstable file lb-388151-big-Data.db
No valid sstables to split
cassandra@cassandra01 /tmp/cassandra $ sstablesplit lb-388151-big-Data.db
Skipping non sstable file lb-388151-big-Data.db
No valid sstables to split

It seems that sstablesplit can't handle the "new" filename pattern anymore (actually running 2.2.8 on those nodes). Any hints or other suggestions to split those sstables or get rid of them? Thanks in advance, Jan
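P.S.: What I plan to try next, as a sketch only - run the sstablesplit that ships with the cluster's own 2.2.8 installation against the file inside its proper data directory, with the node stopped (that the 3.10 tool skipped the file because it was outside a data directory is an assumption on my part; keyspace/table paths are placeholders, and 50 MB is just the tool's default target size):

    nodetool drain && sudo service cassandra stop   # sstablesplit must run offline
    sstablesplit --no-snapshot --size 50 \
        /var/lib/cassandra/data/mykeyspace/mytable-*/lb-388151-big-Data.db
    sudo service cassandra start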
Re: Read after Write inconsistent at times
Hi, are your nodes under high load? Are there any dropped messages (nodetool tpstats) on any node? Also have a look at your system clocks. C* needs them in tight sync - via ntp for example. Side hint: if you use ntp, use the same set of upstreams on all of your nodes - ideally your own. Using pool.ntp.org might lead to minimal drifts in time across your cluster. Another thing that could help you out is using client-side timestamps: https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/ (of course only when you are using a single client or all clients are in sync via ntp).

On 24.02.2017 at 07:29, Charulata Sharma (charshar) wrote:
Hi All, in my application I sometimes cannot read data that just got inserted. This happens very intermittently. Both write and read use LOCAL_QUORUM. We have a cluster of 12 nodes which spans 2 data centers, with an RF of 3. Has anyone encountered this problem, and if yes, what steps have you taken to solve it? Thanks, Charu

-- Jan Kesten, j.kes...@enercast.de, enercast GmbH
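P.S.: A quick clock sanity check I like to run - a sketch only, the host names are placeholders and it assumes ssh access plus ntpq on every node:

    for h in node1 node2 node3; do
        ssh "$h" 'hostname; ntpq -p | head -5; date -u +%s.%N'
    done
    # all nodes should list the same upstream servers with low offset/jitter,
    # and the printed timestamps should agree to within a few milliseconds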
Re: Count(*) is not working
Hi, did you get a result in the end? Those messages are simply warnings telling you that C* had to read many tombstones while processing your query - rows that are deleted but not yet garbage collected/compacted. This warning gives you some explanation why things might be much slower than expected: for every 100 live rows counted, C* had to read about 15 times as many rows that were already deleted. Apart from that, count(*) is almost always slow - and there is a default limit of 10,000 rows in a result. Do you really need the actual live count? To get an idea you can always look at nodetool cfstats (but those numbers also include deleted rows).

On 16.02.2017 at 13:18, Selvam Raman wrote:
Hi, I want to know the total record count of a table. I fired the query: select count(*) from tablename; and I got the output below:

Read 100 live rows and 1423 tombstone cells for query SELECT * FROM keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052) LIMIT 100 (see tombstone_warn_threshold)
Read 100 live rows and 1435 tombstone cells for query SELECT * FROM keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see tombstone_warn_threshold)
Read 96 live rows and 1385 tombstone cells for query SELECT * FROM keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see tombstone_warn_threshold)

Can you please help me to get the total count of the table? -- Selvam Raman "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
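P.S.: For a ballpark figure without a scan, a sketch - keyspace/table names are placeholders:

    nodetool cfstats mykeyspace.mytable | grep -i 'number of keys'
    # "Number of keys (estimate)" is a per-node partition estimate, so sum it
    # across nodes and divide by the replication factor for a rough total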
Re: Cluster scaling
Hi Branislav, what is it you would expect? Some thoughts:

Batches are often misunderstood; they work well only if they contain a single partition key - think of a batch of different sensor data for one key. If you group batches with many partition keys and/or make the batches large, this puts high load on the coordinator node, which then itself needs to talk to the nodes holding the partitions. This could explain the scaling you see in your second run without batches. Keep in mind that the driver supports executeAsync and ResultSetFutures.

Second, put the commitlog and data directories on separate disks when using spindles.

Third, have you monitored I/O and CPU stats while running your tests?

Cheers, Jan

On 08.02.2017 at 16:39, Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) wrote:
Hi all, I have a cluster of three nodes and would like to ask some questions about the performance. I wrote a small benchmarking tool in Java that mirrors (read, write) operations that we do in the real project. The problem is that it is not scaling like it should. The program runs two tests: one using a batch statement and one without the batch. The operation sequence is: optional select, insert, update, insert. I run the tool on my server with 128 threads (the number of threads has no influence on the performance), creating usually 100K resources for testing purposes.

The average results (operations per second) with the use of a batch statement are:

Replication factor = 1    with reading    without reading
1-node cluster            37K             46K
2-node cluster            37K             47K
3-node cluster            39K             70K

Replication factor = 2    with reading    without reading
2-node cluster            21K             40K
3-node cluster            30K             48K

The average results (operations per second) without the use of a batch statement are:

Replication factor = 1    with reading    without reading
1-node cluster            31K             20K
2-node cluster            38K             39K
3-node cluster            45K             87K

Replication factor = 2    with reading    without reading
2-node cluster            19K             22K
3-node cluster            26K             36K

The Cassandra VM specs are: 16 CPUs, 16 GB and two 32 GB of RAM, at least 30 GB of disk space for each node. Non-SSD, each VM on a separate physical server. The code is available at https://github.com/bjanosik/CassandraBenchTool.git . It can be built with Maven and then run with java -jar target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar . Thank you for any help.

-- Jan Kesten, j.kes...@enercast.de, enercast GmbH
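P.S.: What I typically watch during such a benchmark - a sketch, assuming the sysstat tools are installed on the nodes:

    iostat -x 5          # %util and await per device - commitlog vs. data disks
    mpstat -P ALL 5      # per-core CPU; watch for one saturated core (coordinator)
    nodetool tpstats     # pending/blocked/dropped stages after the run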
Re: Hotspots / Load on Cassandra node
Hi, can you check the size of the data directories on that machine and compare them with the other nodes? Look for snapshot directories which could still be there from a former table or keyspace. Regards, Jan

On 26 October 2016 at 06:53:03 MESZ, Harikrishnan A wrote:
>Hello,
>When I issue nodetool status, I see the load (in GB) on one of the nodes is high compared to the other nodes in my ring.
>I do not see any issues with the data modeling, and it looks like the partition sizes are almost evenly sized and distributed across the nodes. Repairs are running properly.
>How do I approach and fix this issue?
>
>Thanks & Regards, Hari
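P.S.: Concretely, a sketch assuming the default data directory layout:

    du -sh /var/lib/cassandra/data/*                          # per keyspace
    du -sh /var/lib/cassandra/data/*/*/snapshots 2>/dev/null  # leftover snapshots
    nodetool listsnapshots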
Re: Thousands of SSTables generated in only one node
Hi Lahiru, 2.1.0 is also quite old (Sep 2014) - and just from memory I remember an issue we had with cold_reads_to_omit:

http://grokbase.com/t/cassandra/user/1523sm4y0r/how-to-deal-with-too-many-sstables
https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1

Those are just random Google hits, but maybe they help too. I had ended up with a few thousand sstables smaller than 1 MB in size. However, I would suggest upgrading to a newer version of Cassandra - maybe 2.1.16 or 2.2.8 - before diving too deep into this, as chances are really good your problems will be gone after that. Regards, Jan
Re: Thousands of SSTables generated in only one node
Hi Lahiru, maybe your node was running out of memory before. I have seen this behaviour when available heap is low, forcing memtables to be flushed out to sstables quite often. If that is what is hitting you, you should see that the sstables are really small. To clean up, nodetool compact would do the job - but if you do not need the data from one of the keyspaces at all, just drop and recreate it (but check your data directory for leftover snapshots). To prevent this in the future, keep a close eye on heap consumption and maybe give the node more memory. HTH, Jan
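P.S.: Two quick checks, as a sketch - keyspace/table names are placeholders, and cold_reads_to_omit only exists in the 2.0/2.1 line:

    # how many sstables does the table actually have on this node?
    ls /var/lib/cassandra/data/mykeyspace/mytable-*/ | grep -c -- '-Data.db'
    # make STCS consider cold sstables again
    cqlsh -e "ALTER TABLE mykeyspace.mytable WITH compaction =
              {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.0};"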
Re: Combining two clusters/keyspaces into single cluster
Hi, one way I think might work (but not tested in any way by me, and there will be some lag / stale data):

- create keyspace2 (the schema from cluster2) on cluster1
- use nodetool flush and snapshot on cluster2, remember the timestamp
- use sstableloader to write all sstables from the cluster2 snapshot to cluster1 (copy step sketched below)
- you can repeat the last two steps, using sstableloader only on sstables with mtime > timestamp, to add the differences to cluster1
- shut down cluster2 when done

Of course, data written by old clients to cluster2 won't be available in cluster1 until that data has been loaded into it. Just my 2 cents :) Jan

On 22.04.2016 at 01:15, Arlington Albertson wrote:
Hey Folks, I've been looking through various documentations, but I'm either overlooking something obvious or not wording it correctly. The gist of my problem is this: I have two Cassandra clusters, with two separate keyspaces, on EC2. We'll call them as follows:

cluster1 (DC name, cluster name, etc...) / keyspace1 (only exists on cluster1)
cluster2 (DC name, cluster name, etc...) / keyspace2 (only exists on cluster2)

I need to perform the following:
- take keyspace2 and add it to cluster1 so that all nodes can serve the traffic
- it needs to happen "live" so that I can repoint new instances to the cluster1 endpoints and they'll just start working, no longer directly using cluster2
- eventually, tear down cluster2 (easy with a `nodetool decommission` after verifying all seeds have been changed, etc...)

This doc seems to be the closest I've found thus far: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html Is that the appropriate guide for this and I'm just overthinking it? Or is there something else I should be looking at? Also, this is DSC C* 2.1.13. TIA! -AA
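P.S.: The flush/snapshot/load sequence as a sketch - keyspace, table, paths and host names are all placeholders, and sstableloader wants a keyspace/table directory layout, hence the staging copy:

    # on a cluster2 node
    nodetool flush mykeyspace
    nodetool snapshot -t migrate mykeyspace
    mkdir -p /tmp/load/mykeyspace/mytable
    cp /var/lib/cassandra/data/mykeyspace/mytable-*/snapshots/migrate/* \
       /tmp/load/mykeyspace/mytable/
    sstableloader -d cluster1-node1,cluster1-node2 /tmp/load/mykeyspace/mytable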
Re: Fwd: Cassandra Load spike
Hi, you should check the "snapshot" directories on your nodes - it is very likely there are some old ones from failed operations taking up space.

On 15.04.2016 at 01:28, kavya wrote:
Hi, we are running a 6-node Cassandra 2.2.4 cluster and we are seeing a spike in the disk load as per the 'nodetool status' command that does not correspond with the actual disk usage. The load reported by nodetool was as high as 3 times the actual disk usage on certain nodes. We noticed that the periodic repair failed with the below error when running 'nodetool repair -pr':

ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 RepairRunnable.java:243 - Repair session 64b54d50-0100-11e6-b46e-a511fd37b526 for range (-3814318684016904396,-3810689996127667017] failed with error [….] Validation failed in /
org.apache.cassandra.exceptions.RepairException: [….] Validation failed in
at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

We restarted all nodes in the cluster and ran a full repair, which completed successfully without any validation errors; however, we still see the load spike on the same nodes after a while. Please advise. Thanks!
Re: Large primary keys
Hi Robert, why do you need the actual text as a key? It sounds a bit unnatural, at least to me. Keep in mind that you cannot do "like" queries on keys in Cassandra. For performance, and to keep things more readable, I would prefer hashing your text and using the hash as the key. You should also consider storing the keys (hashes) in a separate table per day / hour or something like that, so you can quickly get all keys for a time range. A query without the partition key may be very slow. Jan

On 11.04.2016 at 23:43, Robert Wille wrote:
I have a need to be able to use the text of a document as the primary key in a table. These texts are usually less than 1K, but can sometimes be 10's of K's in size. Would it be better to use a digest of the text as the key? I have a background process that will occasionally need to do a full table scan and retrieve all of the texts, so using the digest doesn't eliminate the need to store the text. Anyway, is it better to keep primary keys small, or is C* okay with large primary keys? Robert
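P.S.: A minimal sketch of the hashing idea - that SHA-256 is the right digest here is my assumption, and $document_text is a placeholder:

    key=$(printf '%s' "$document_text" | sha256sum | awk '{print $1}')
    # store the full text under this fixed-size 64-hex-char key; collisions are
    # practically impossible, and the key stays small regardless of text size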
Re: NTP Synchronization Setup Changes
Hi Mickey, I would strongly suggest setting up an NTP server on your site - this is not really a big deal and quickly done with one of the tutorials on the net. Then configure your Cassandra nodes (and all the rest if you like) to use your NTP server instead of public ones. As I have learned the hard way, Cassandra is not really happy when nodes have different times in some cases. The benefit is that your nodes will keep time in sync even without a connection to the internet. Of course "your time" may drift without a proper time source or connection, but all nodes will have the same drift and so no problems with consistency. When your NTP server syncs again, your nodes will be adjusted smoothly.

Pro(?) solution (what I did before): attach a GPS mouse to your NTP server and use that as the time source. That way you have synchronized _and_ accurate time without any connection to public NTP servers, as the GPS satellites are flying atomic clocks :)

Just my 2 cents, Jan

> On 31.03.2016 at 03:07, Mukil Kesavan wrote:
>
> Hi,
>
> We run a 3-server Cassandra cluster that is initially NTP-synced to a single physical server over LAN. This server does not have connectivity to the internet for a few hours to sometimes even days. In this state we perform some schema operations and reads/writes with QUORUM consistency.
>
> Later on, the physical server has connectivity to the internet and we synchronize its time to an external NTP server on the internet.
>
> Are there any issues if this causes a huge time correction on the Cassandra cluster? I know that NTP gradually corrects the time on all the servers. I just wanted to understand if there are any corner cases that will cause us to lose data/schema updates when this happens. In particular, we seem to be having some issues around missing secondary indices at the moment (not all but some).
>
> Also, for our situation where we have to work with Cassandra for a while without internet connectivity, what is the preferred NTP architecture / what steps have worked for you in the field?
>
> Thanks,
> Micky
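P.S.: The client side of this is a one-liner in /etc/ntp.conf on every Cassandra node - a sketch, with 10.0.0.1 as a placeholder for your local NTP server:

    server 10.0.0.1 iburst
    # and on the NTP server itself, the undisciplined local clock as a
    # last-resort fallback while the internet link is down:
    # server 127.127.1.0
    # fudge  127.127.1.0 stratum 10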
Thrift composite partition key to cql migration
Hi, while migrating the remainder of the Thrift operations in my application I came across a point where I can't find a good hint. In our old code we used a composite with two strings as the row / partition key and a similar composite as the column key, like this:

public Composite rowKey() {
    final Composite composite = new Composite();
    composite.addComponent(key1, StringSerializer.get());
    composite.addComponent(key2, StringSerializer.get());
    return composite;
}

public Composite columnKey() {
    final Composite composite = new Composite();
    composite.addComponent(key3, StringSerializer.get());
    composite.addComponent(key4, StringSerializer.get());
    return composite;
}

In CQL this column family looks like this:

CREATE TABLE foo.bar (
    key blob,
    column1 text,
    column2 text,
    value blob,
    PRIMARY KEY (key, column1, column2)
)

The columns key3 and key4 became column1 and column2 - but the old row key is presented as a blob (I can put it into a hex editor and see that the key1 and key2 values are in there). Any pointers on handling this, or is this a known issue? I am now using the DataStax Java driver for CQL; the old connector used Thrift. Is there any way to get key1 and key2 back, apart from completely rewriting the table? This is what I had expected it to be:

CREATE TABLE foo.bar (
    key1 text,
    key2 text,
    column1 text,
    column2 text,
    value blob,
    PRIMARY KEY ((key1, key2), column1, column2)
)

Cheers, Jan
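P.S.: The blob should at least be decodable by hand. From memory - so treat this layout as an assumption to verify - CompositeType serializes each component as a 2-byte big-endian length, the raw bytes, and one end-of-component byte (0x00). For key1="abc", key2="de" the row key blob would look like:

    # 00 03 61 62 63 00   -> length 3, "abc", end-of-component byte
    # 00 02 64 65 00      -> length 2, "de",  end-of-component byte
    echo '0003616263000002646500' | xxd -r -p | xxd   # round-trip to inspect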
Re: Cassandra nodes reduce disks per node
Hi Branton, two cents from me - I didn't look through the script, but for the rsyncs I do pretty much the same when moving data (sketch below). Since sstables are immutable I do a first sync to the new location while everything is up and running, which runs really long. Meanwhile new sstables are created, and I sync again online - much fewer files to copy now. After that I shut down the node, and my last rsync has to copy only a few files, which is quite fast, so the downtime for that node is within minutes. Jan

> On 18.02.2016 at 22:12, Branton Davis wrote:
>
> Alain, thanks for sharing! I'm confused why you do so many repetitive rsyncs. Just being cautious, or is there another reason? Also, why do you have --delete-before when you're copying data to a temp (assumed empty) directory?
>
>> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ wrote:
>> I did the process a few weeks ago and ended up writing a runbook and a script. I have anonymised it and share it FWIW.
>>
>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>
>> It is basic bash. I tried to have the shortest downtime possible, making this a bit more complex, but it allows you to do a lot in parallel and just do a fast operation sequentially, reducing overall operation time.
>>
>> This worked fine for me, yet I might have made some errors while making it configurable through variables. Be sure to be around if you decide to run this. Also I automated this more by using knife (Chef); I hate to repeat ops, this is something you might want to consider.
>>
>> Hope this is useful,
>>
>> C*heers,
>> Alain Rodriguez, France - The Last Pickle, http://www.thelastpickle.com
>>
>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal:
>>> Hey Branton,
>>> Please do let us know if you face any problems doing this.
>>> Thanks, anishek
>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis wrote: We're about to do the same thing. It shouldn't be necessary to shut down the entire cluster, right?
>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli wrote:
>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal wrote: To accomplish this can I just copy the data from disk1 to disk2 within the relevant Cassandra home location folders, change the cassandra.yaml configuration and restart the node? Before starting I will shut down the cluster.
>>>> Yes.
>>>> =Rob
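P.S.: The two-pass pattern as a sketch - mount points are placeholders:

    rsync -a /var/lib/cassandra/data/ /mnt/newdisk/data/   # node still running
    rsync -a /var/lib/cassandra/data/ /mnt/newdisk/data/   # catch-up, much faster
    nodetool drain && sudo service cassandra stop
    rsync -a --delete /var/lib/cassandra/data/ /mnt/newdisk/data/   # final pass
    # swap mounts / adjust data_file_directories, then start the node again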
Re: Forming a cluster of embedded Cassandra instances
Hi, the embedded Cassandra to speed up entering the project may well work for developers; we used it for JUnit. But a simple clone and Maven build - I guess it will end in a single-node Cassandra cluster. Remember Cassandra is a distributed database: one will need more than one node to get performance and fault tolerance. I would also not recommend adding and removing cluster nodes at high frequency with application start-stop cycles.

To help get things up and running, provide a small readme for downloading and starting Cassandra. For Mac and Linux, unpacking the tar.gz and running bin/cassandra is not too complicated. Or add a hint to the DataStax Community Edition installers. Apart from installing Java, that is a five-minute stop to a single-node "Test Cluster". Configuring a distributed setup is a bit more or a lot more difficult and definitely needs more understanding and planning.

Just as a hint, and off-topic: I saw people using Cassandra as application glue for interprocess communication, where every app server started a node (for communication, sessions, queues and so on). If that is eventually a use case - have a look at Hazelcast. Jan

> On 14.02.2016 at 23:26, John Sanda wrote:
>
> The motivation was to make it easy for someone to get up and running quickly with the project. Clone the git repo, run the Maven build, and then you are all set. It definitely lowers the learning curve for someone just getting started with a project who is not really thinking about Cassandra. It is also convenient for non-devs who need to quickly get the project up and running. For development, we have people working on Linux, Mac OS X, and Windows. I am not a Windows user and not even sure if ccm works on Windows, so ccm can't be the de facto standard for development.
>
>> On Sun, Feb 14, 2016 at 2:52 PM, Jack Krupansky wrote:
>> What motivated the use of an embedded instance for development - as opposed to simply spawning a process for Cassandra?
>> -- Jack Krupansky
>>
>>> On Sun, Feb 14, 2016 at 2:05 PM, John Sanda wrote:
>>> The project I work on day to day uses an embedded instance of Cassandra, but it is intended primarily for development. We embed Cassandra in a WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I personally do not do this. I use and recommend ccm for development. If you do use WildFly, there is also wildfly-cassandra, which deploys Cassandra as a custom WildFly extension. In other words it is deployed in WildFly like other subsystems (EJB, web, etc.), not like an application. There isn't a whole lot of active development on it, but it could be another option.
>>> For production, we have to support single-node clusters (not embedded though), and it has been challenging for pretty much all the reasons you find people saying not to do so.
>>> As for failure detection and cluster membership changes, are you using the DataStax driver? You can register an event listener with the driver to receive notifications for those things.
>>>
>>>> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad wrote: +1 to what Jack said. Don't mess with embedded till you understand the basics of the db. You're not making your system any less complex; I'd say you're most likely going to shoot yourself in the foot.
>>>>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky wrote:
>>>>> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain can be avoided. Two nodes would not support HA. You need to be able to reach a quorum, which is defined as n/2+1 where n is the number of replicas. IOW, you cannot update the data if a quorum cannot be reached. The data on any given node needs to be replicated on at least two other nodes.
>>>>> Embedded Cassandra is only for extremely sophisticated developers - not those who are new to Cassandra with a "superficial understanding".
>>>>> As a general proposition, you should not be running application code on Cassandra nodes.
>>>>> That said, if any of the senior Cassandra developers wish to personally support your efforts towards embedded clusters, they are certainly free to do so. We'll see if any of them step forward.
>>>>> -- Jack Krupansky
>>>>>
>>>>>> On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas wrote:
>>>>>> Hi all,
>>>>>> TL;DR: I have a very superficial understanding of Cassandra and am currently evaluating it for a project.
>>>>>> * Can Cassandra be embedded into another JVM application?
>>>>>> * Can such embedded instances form a cluster?
>>>>>> * Can the application use the failure detection and cluster membership dissemination infrastructure of embedded Cassandra?
Re: Sudden disk usage
Hi, what kind of compaction strategy do you use? What you are seeing is most likely a compaction - think of 4 sstables of 50 GB each: compacting those can take up an extra 200 GB while the new sstable is being rewritten. After that the old ones are deleted and the space is freed again. If you use SizeTieredCompaction you can end up with very large sstables, as I do (>250 GB each). In the worst case you could need twice the space - a reason why I set my disk monitoring threshold at 45% usage. Just my 2 cents. Jan

> On 13.02.2016 at 08:48, Branton Davis wrote:
>
> One of our clusters had a strange thing happen tonight. It's a 3-node cluster, running 2.1.10. The primary keyspace has RF 3, vnodes with 256 tokens.
>
> This evening, over the course of about 6 hours, disk usage increased from around 700GB to around 900GB on only one node. I was at a loss as to what was happening and, on a whim, decided to run nodetool cleanup on the instance. I had no reason to believe that it was necessary, as no nodes were added or tokens moved (not intentionally, anyhow). But it immediately cleared up that extra space.
>
> I'm pretty lost as to what would have happened here. Any ideas where to look?
>
> Thanks!
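P.S.: Quick checks when usage jumps like that, as a sketch:

    nodetool compactionstats          # any large compaction in flight?
    nodetool listsnapshots            # leftover snapshots pinning disk space?
    df -h /var/lib/cassandra/data     # headroom - STCS can briefly need ~2x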
Re: Cassandra is consuming a lot of disk space
Hi Rahul, it should work as you would expect - simply copy the sstables from your extra disk back to the original one. To minimize downtime of the node you can do something like this:

- rsync the files to the original location while the node is still running (sstables are immutable), to copy most of the data
- edit cassandra.yaml to remove the additional data dir
- shut down the node
- rsync again (just in case a new sstable got written while the first pass was running)
- restart

HTH, Jan

On 14.01.2016 at 08:38, Rahul Ramesh wrote:
> One update. I cleared the snapshots using the nodetool clearsnapshot command. Disk space is recovered now.
>
> Because of this issue, I have mounted one more drive on the server and there are some data files there. How can I migrate the data so that I can decommission the drive? Will it work if I just copy all the contents of the table directory to one of the drives?
>
> Thanks for all the help.
>
> Regards, Rahul
>
> On Thursday 14 January 2016, Rahul Ramesh wrote:
> Hi Jan, I checked it. There are no old keyspaces or tables. Thanks for your pointer, I started looking inside the directories. I see a lot of snapshot directories inside the table directories. These directories are consuming the space.
>
> However, these snapshots are not shown when I issue listsnapshots:
> ./bin/nodetool listsnapshots
> Snapshot Details:
> There are no snapshots
>
> Can I safely delete those snapshots? Why is listsnapshots not showing them? Also, in the future, how can we find out if there are snapshots?
>
> Thanks, Rahul
>
> On Thu, Jan 14, 2016 at 12:50 PM, Jan Kesten wrote:
>> Hi Rahul, just an idea, did you have a look at the data directories on disk (/var/lib/cassandra/data)? It could be that there are some from old keyspaces that have been deleted and snapshotted before. Try something like "du -sh /var/lib/cassandra/data/*" to verify which keyspace is consuming your space. Jan
>>
>> On 14.01.2016 at 07:25, Rahul Ramesh wrote:
>>> Thanks for your suggestion. Compaction was happening on one of the large tables. The disk space did not decrease much after the compaction, so I ran an external compaction. The disk space decreased by around 10%. However, it is still consuming close to 750 GB for a load of 250 GB.
>>> I even restarted Cassandra, thinking there may be some open files. However, it didn't help much.
>>> Is there any way to find out why so much data is being consumed? I checked for open files using lsof. There are no open files.
>>> Recovery: just a wild thought - I am using a replication factor of 2 and I have two nodes. If I delete the complete data on one of the nodes, will I be able to recover all the data from the active node? I don't want to pursue this path, as I want to find out the root cause of the issue!
>>> Any help will be greatly appreciated. Thank you, Rahul
>>>
>>> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo wrote:
>>>> You can check if the snapshot exists in the snapshot folder. Repairs stream sstables over, which can temporarily increase disk space. But I think Carlos Alonso might be correct - running compactions might be the issue.
>>>> Regards, Carlos Juzarte Rolo, Cassandra Consultant, Pythian
>>>>
>>>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso wrote:
>>>>> I'd have a look also at possible running compactions.
Re: Cassandra is consuming a lot of disk space
Hi Rahul, just an idea, did you have a look at the data directories on disk (/var/lib/cassandra/data)? It could be that there are some from old keyspaces that have been deleted and snapshotted before. Try something like "du -sh /var/lib/cassandra/data/*" to verify which keyspace is consuming your space. Jan

> On 14.01.2016 at 07:25, Rahul Ramesh wrote:
>
> Thanks for your suggestion.
>
> Compaction was happening on one of the large tables. The disk space did not decrease much after the compaction, so I ran an external compaction. The disk space decreased by around 10%. However, it is still consuming close to 750 GB for a load of 250 GB.
>
> I even restarted Cassandra, thinking there may be some open files. However, it didn't help much.
>
> Is there any way to find out why so much data is being consumed?
>
> I checked for open files using lsof. There are no open files.
>
> Recovery: just a wild thought - I am using a replication factor of 2 and I have two nodes. If I delete the complete data on one of the nodes, will I be able to recover all the data from the active node? I don't want to pursue this path, as I want to find out the root cause of the issue!
>
> Any help will be greatly appreciated.
>
> Thank you, Rahul
>
>> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo wrote:
>> You can check if the snapshot exists in the snapshot folder.
>> Repairs stream sstables over, which can temporarily increase disk space. But I think Carlos Alonso might be correct - running compactions might be the issue.
>> Regards, Carlos Juzarte Rolo, Cassandra Consultant, Pythian
>>
>>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso wrote:
>>> I'd have a look also at possible running compactions.
>>> If you have big column families with STCS then large compactions may be happening. Check it with nodetool compactionstats.
>>> Carlos Alonso | Software Engineer | @calonso
>>>
>>>> On 13 January 2016 at 05:22, Kevin O'Connor wrote: Have you tried restarting? It's possible there are open file handles to sstables that have been compacted away. You can verify by doing lsof and grepping for DEL or deleted. If it's not that, you can run nodetool cleanup on each node to scan all of the sstables on disk and remove anything it's not responsible for. Generally this would only work if you added nodes recently.
>>>>
>>>>> On Tuesday, January 12, 2016, Rahul Ramesh wrote: We have a 2-node Cassandra cluster with a replication factor of 2. The load factor on the nodes is around 350 GB:
>>>>>
>>>>> Datacenter: Cassandra
>>>>> Address      Rack   Status  State   Load      Owns     Token
>>>>>                                                        -5072018636360415943
>>>>> 172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%  -7068746880841807701
>>>>> 172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%  -5072018636360415943
>>>>>
>>>>> However, if I use df -h:
>>>>>
>>>>> /dev/xvdf  252G  223G  17G  94%  /HDD1
>>>>> /dev/xvdg  493G  456G  12G  98%  /HDD2
>>>>> /dev/xvdh  197G  167G  21G  90%  /HDD3
>>>>>
>>>>> HDD1,2,3 contain only Cassandra data. It amounts to close to 1 TB on one of the machines and close to 650 GB on the other.
>>>>> I started a repair 2 days ago; after running the repair, the disk space consumption has actually increased.
>>>>> I also checked if this is because of snapshots. nodetool listsnapshots intermittently lists a snapshot, but it goes away after some time.
>>>>> Can somebody please help me understand:
>>>>> 1. Why is so much disk space consumed?
>>>>> 2. Why did it increase after repair?
>>>>> 3. Is there any way to recover from this state?
>>>>> Thanks, Rahul
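P.S.: To check for space held by deleted-but-still-open sstables, a sketch (assumes a single Cassandra process and lsof installed):

    lsof -p "$(pgrep -f CassandraDaemon)" | grep -i deleted
    # any hits are compacted-away files still pinned by open file handles;
    # a node restart releases them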
ClosedChannelExcption while nodetool repair
Hi, I have had some problems on my Cassandra cluster recently. I am running 12 nodes with 2.2.4, and the errors occur while repairing with a plain "nodetool repair". In system.log I can find, on one node:

ERROR [STREAM-IN-/172.17.2.233] 2016-01-08 08:32:38,327 StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef] Streaming error occurred
java.nio.channels.ClosedChannelException: null

and at the same time, on the node mentioned in the log:

INFO [STREAM-IN-/172.17.2.223] 2016-01-08 08:32:38,073 StreamResultFuture.java:168 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef ID#0] Prepare completed. Receiving 2 files(46708049 bytes), sending 2 files(1856721742 bytes)
ERROR [STREAM-OUT-/172.17.2.223] 2016-01-08 08:32:38,325 StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef] Streaming error occurred
org.apache.cassandra.io.FSReadError: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144) ~[apache-cassandra-2.2.4.jar:2.2.4]

More complete logs can be found here:
http://pastebin.com/n6DjCCed
http://pastebin.com/6rD5XNwU

I already did a nodetool scrub. Any suggestions what is causing this? Thanks in advance, Jan
Strange Sizes after 2.1.3 upgrade
Hi, I found something strange this morning on our secondary cluster. I upgraded to 2.1.3 recently - hoping for incremental repairs to work - and this morning OpsCenter showed me very unequal disk usage. Most irritating: some nodes show data sizes of more than 3 TB, but they only have 3 TB drives. I made a screenshot: https://www.dropbox.com/s/0qhbpm1znwd07rj/strange_sizes.png?dl=0 Did this occur anywhere else? Maybe it is totally unrelated to the 2.1.3 upgrade. Thanks for any pointers, Jan
Re: Node stuck in joining the ring
Hi Batranut, apart from the other suggestions - do you have NTP running on all your cluster nodes, and are the times in sync? Jan
Re: Node joining take a long time
Hi, a short hint for those upgrading: If you upgrade to 2.1.3 - there is a bug in the config builder when rpc_interface is used. If you use rpc_address in your cassandra.yaml you will be fine - I ran into it this morning and filed an issue for it. https://issues.apache.org/jira/browse/CASSANDRA-8839 Jan
Re: Many really small SSTables
Hi Eric and all, I almost expected this kind of answer. I did a nodetool compactionstats already to see if those sstables are being compacted, but on all nodes there are 0 outstanding compactions (right now in the morning, not running any tests on this cluster). The reported read latency is about 1-3 ms, also on nodes which have many sstables (the new high score is ~18k sstables). The 99% percentile is about 30-40 micros, with a cell count of about 80-90 (if I read the docs right, that is the number of sstables accessed; this changed from 2.0 to 2.1 I think, as I see it only on the testing cluster). It looks to me like compactions were simply not triggered. I tried a nodetool compact on one node overnight - but that crashed the entire node. Roland

On 15.01.2015 at 19:14, Eric Stevens wrote:
Yes, many sstables can have a huge negative impact on read performance, and will also create memory pressure on that node. There are a lot of things which can produce this effect, and it strongly suggests you're falling behind on compaction in general (check nodetool compactionstats; you should have <5 outstanding/pending, preferably 0-1). To see whether and how much it is impacting your read performance, check nodetool cfstats and nodetool cfhistograms.

On Thu, Jan 15, 2015 at 2:11 AM, Roland Etzenhammer wrote:
Hi, I'm testing around with Cassandra a fair bit, using 2.1.2, which I know has some major issues, but it is a test environment. After some bulk loading, testing with incremental repairs and running out of heap once, I found that I now have a quite large number of sstables which are really small:

    < 1k      0      0.0%
    < 10k   2780    76.8%
    < 100k  3392    93.7%
    < 1000k 3461    95.6%
    < 1M    3471    95.9%
    < 10M   3517    97.1%
    < 100M  3596    99.3%
    all     3621   100.0%

76.8% of all sstables in this particular column family are smaller than 10 kB, 93.7% are smaller than 100 kB. Just for my understanding - does that impact performance? And is there any way to reduce the number of sstables? A full run of nodetool compact runs for a really long time (more than 1 day). Thanks for any input, Roland

-- Jan Kesten, enercast GmbH
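P.S.: A quick way to count the tiny sstables per table, as a sketch - the keyspace/table path is a placeholder:

    find /var/lib/cassandra/data/mykeyspace/mytable-*/ -name '*-Data.db' | wc -l
    find /var/lib/cassandra/data/mykeyspace/mytable-*/ -name '*-Data.db' -size -100k | wc -l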
Re: Nodetool clearsnapshot
Hi,

> I have read that snapshots are basically symlinks and they do not take that much space. Why, if I run nodetool clearsnapshot, does it free a lot of space? I am seeing GBs freed...

Both observations together make sense. Creating a snapshot just creates links for all files under the snapshot directory. This is very fast and takes no space. But those links are hard links, not symbolic ones. After a while, your running cluster will compact some of its sstables, writing the result to a new sstable and deleting the old ones. For example, if you had SSTable1..4 and a snapshot with links to those four, after compaction you will have one active SSTable5, which is newly written and consumes space. The snapshot-linked ones are still there, still consuming their space. Only when the snapshot is cleared do you get your disk space back. HTH, Jan
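P.S.: You can see the hard-link mechanics directly, as a sketch - the keyspace, table and sstable names are placeholders:

    nodetool snapshot -t demo mykeyspace
    stat -c '%h %i %n' \
        /var/lib/cassandra/data/mykeyspace/mytable-*/lb-1-big-Data.db \
        /var/lib/cassandra/data/mykeyspace/mytable-*/snapshots/demo/lb-1-big-Data.db
    # both paths report the same inode with a link count of 2 - the bytes exist
    # only once until compaction removes the live copy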
Re: Replacing nodes disks
Hi, even if recovering it like a dead node would work - backup and restore (like my way with a USB docking station) will be much faster and produce less I/O and CPU impact on your cluster. Keep that in mind :-) Cheers, Jan

On 22.12.2014 at 10:58, Or Sher wrote:
Great. replace_address works great. For some reason I thought it wouldn't work with the same IP.

On Sun, Dec 21, 2014 at 5:14 PM, Ryan Svihla wrote:
Cassandra is designed to rebuild a node from other nodes. Whether a node is dead by your hand because you killed it, or by fate, is irrelevant; the process is the same. A "new node" can have the same hostname and IP or totally different ones.

On Sun, Dec 21, 2014 at 6:01 AM, Or Sher wrote:
If I use the replace_address parameter with the same IP address, would that do the job?

On Sun, Dec 21, 2014 at 11:20 AM, Or Sher wrote:
What I want to do is kind of replacing a dead node - http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html - but replacing it with a clean node with the same IP and hostname.

On Sun, Dec 21, 2014 at 9:53 AM, Or Sher wrote:
Thanks guys. I have to replace all data disks, so I don't have another large enough local disk to move the data to. If I have no choice, I will back up the data on some other node or something, but I'd like to avoid it. I would really love letting Cassandra do its thing and rebuild itself. Has anybody handled such cases that way (letting Cassandra rebuild its data)? Although there is no documented procedure for it, it should be possible, right?

On Fri, Dec 19, 2014 at 8:41 AM, Jan Kesten wrote:
Hi Or, I did some sort of this a while ago. If your machines have a free disk slot - just put another disk there and use it as another data_file_directory. If not - as in my case:
- grab a USB dock for disks
- put the new one in there, plug it in, format it, mount it to /mnt etc.
- do an online rsync from /var/lib/cassandra/data to /mnt
- after that, bring Cassandra down
- do another rsync from /var/lib/cassandra/data to /mnt (should be faster, as sstables do not change; minimizes downtime)
- adjust /etc/fstab if needed
- shut down the node
- swap disks
- power on the node
- everything should be fine ;-)
Of course you will need a replication factor > 1 for this to work ;-) Just my 2 cents, Jan

On 18.12.2014 at 16:17, Or Sher wrote:
Hi all, we have a situation where some of our nodes have smaller disks, and we would like to align all nodes by replacing the smaller disks with bigger ones without replacing nodes. We don't have enough space to put data on the / disk and copy it back to the bigger disks, so we would like to rebuild the nodes' data from other replicas. What do you think should be the procedure here? I'm guessing it should be something like this, but I'm pretty sure it's not enough:
1. Shut down the C* node and server.
2. Replace disks + create the same vg, lv, etc.
3. Start C* (normally?)
4. nodetool repair/rebuild?
*I think I might get some consistency issues for use cases relying on quorum reads and writes for strong consistency. What do you say?
Another question (and I know it depends on many factors, but I'd like to hear an experienced estimation): how much time would it take to rebuild a 250G data node?
Re: Replacing nodes disks
Hi Or, I did some sort of this a while ago. If your machines have a free disk slot - just put another disk there and use it as another data_file_directory. If not - as in my case:

- grab a USB dock for disks
- put the new one in there, plug it in, format it, mount it to /mnt etc.
- do an online rsync from /var/lib/cassandra/data to /mnt
- after that, bring Cassandra down
- do another rsync from /var/lib/cassandra/data to /mnt (should be faster, as sstables do not change; minimizes downtime)
- adjust /etc/fstab if needed
- shut down the node
- swap disks
- power on the node
- everything should be fine ;-)

Of course you will need a replication factor > 1 for this to work ;-) Just my 2 cents, Jan

On 18.12.2014 at 16:17, Or Sher wrote:
Hi all, we have a situation where some of our nodes have smaller disks, and we would like to align all nodes by replacing the smaller disks with bigger ones without replacing nodes. We don't have enough space to put data on the / disk and copy it back to the bigger disks, so we would like to rebuild the nodes' data from other replicas. What do you think should be the procedure here? I'm guessing it should be something like this, but I'm pretty sure it's not enough:
1. Shut down the C* node and server.
2. Replace disks + create the same vg, lv, etc.
3. Start C* (normally?)
4. nodetool repair/rebuild?
*I think I might get some consistency issues for use cases relying on quorum reads and writes for strong consistency. What do you say?
Another question (and I know it depends on many factors, but I'd like to hear an experienced estimation): how much time would it take to rebuild a 250G data node?
Thanks in advance, Or. -- Or Sher
sstablemetadata and sstablerepairedset not working with DSC on Debian
Hi, being curious about the new incremental repairs, I updated our cluster to C* version 2.1.2 via the Debian apt repository. Everything went quite well, but trying to start the tools sstablemetadata and sstablerepairedset led to the following error:

root@a01:/home/ifjke# sstablerepairedset
Error: Could not find or load main class org.apache.cassandra.tools.SSTableRepairedAtSetter
root@a01:/home/ifjke#

Looking at the scripts starting these tools, I found that the Java classpath is built via:

for jar in `dirname $0`/../../lib/*.jar; do
    CLASSPATH=$CLASSPATH:$jar
done

Because the scripts are located in /usr/bin/, this leads to a search for libs in /lib. Obviously there are no Java or Cassandra libraries there - nodetool instead uses a different approach:

if [ "x$CASSANDRA_INCLUDE" = "x" ]; then
    for include in "`dirname "$0"`/cassandra.in.sh" \
                   "$HOME/.cassandra.in.sh" \
                   /usr/share/cassandra/cassandra.in.sh \
                   /usr/local/share/cassandra/cassandra.in.sh \
                   /opt/cassandra/cassandra.in.sh; do
        if [ -r "$include" ]; then
            . "$include"
            break
        fi
    done
elif [ -r "$CASSANDRA_INCLUDE" ]; then
    . "$CASSANDRA_INCLUDE"
fi

I created a simple patch which makes both sstablemetadata and sstablerepairedset work for me; maybe it's worth sharing:

---SNIP---
--- sstablerepairedset    2014-11-11 15:50:02.000000000 +0000
+++ sstablerepairedset_new    2014-12-18 07:52:26.967368891 +0000
@@ -16,22 +16,19 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-if [ "x$CLASSPATH" = "x" ]; then
-
-    # execute from the build dir.
-    if [ -d `dirname $0`/../../build/classes ]; then
-        for directory in `dirname $0`/../../build/classes/*; do
-            CLASSPATH=$CLASSPATH:$directory
-        done
-    else
-        if [ -f `dirname $0`/../lib/stress.jar ]; then
-            CLASSPATH=`dirname $0`/../lib/stress.jar
+if [ "x$CASSANDRA_INCLUDE" = "x" ]; then
+    for include in "`dirname "$0"`/cassandra.in.sh" \
+                   "$HOME/.cassandra.in.sh" \
+                   /usr/share/cassandra/cassandra.in.sh \
+                   /usr/local/share/cassandra/cassandra.in.sh \
+                   /opt/cassandra/cassandra.in.sh; do
+        if [ -r "$include" ]; then
+            . "$include"
+            break
         fi
-    fi
-
-    for jar in `dirname $0`/../../lib/*.jar; do
-        CLASSPATH=$CLASSPATH:$jar
     done
+elif [ -r "$CASSANDRA_INCLUDE" ]; then
+    . "$CASSANDRA_INCLUDE"
 fi
---SNIP---

Worked for me on both tools. Jan
Re: Cassandra schema migrator
Hi Jens, maybe you should have a look at Mutagen for Cassandra: https://github.com/toddfast/mutagen-cassandra It has been a little quiet around that project for some months, but maybe it is still worth it. Cheers, Jan

On 25.11.2014 at 10:22, Jens Rantil wrote:
Hi, is anyone using, or could anyone recommend, a tool for versioning schemas / migrating in Cassandra? My list of requirements is:
* Support for adding tables.
* Support for versioning of table properties. All our tables are to be defaulted to LeveledCompactionStrategy.
* Support for adding non-existing columns.
* Optional: Support for removing columns.
* Optional: Support for removing tables.
We are preferably a Java shop, but could potentially integrate something non-Java. I understand I could write a tool that would make these decisions using system.schema_columnfamilies and system.schema_columns, but as always, reusing a proven tool would be preferable. So far I only know of Spring Data Cassandra, which handles creating tables and adding columns. However, it does not handle table properties in any way.
Thanks, Jens
——— Jens Rantil, Backend engineer, Tink AB. Email: jens.ran...@tink.se, Phone: +46 708 84 18 32, Web: www.tink.se
Re: repair -pr does not return
Hi Duncan,

> is it actually doing something or does it look like it got stuck? 2.0.7 has a fix for a getting-stuck problem.

It starts with sending merkle trees and streaming for some time (some hours in fact) and then just seems to hang. So I'll try to update and see if that solves the issue. Thanks for that hint! Cheers, Jan
repair -pr does not return
Hello together, I'm running a Cassandra cluster with 2.0.6 and 6 nodes. As far as I know, routine repairs are still mandatory for handling tombstones - even though I noticed that the cluster now does a "snapshot repair" by default. My cluster has been running for a while now and has a load of about 200 GB per node - running a "nodetool repair -pr" on one of the nodes seems to run forever; right now it has been running for 2 complete days and does not return. Any suggestions? Thanks in advance, Jan
Re: Cassandra Disk storage capacity
On 07.04.2014 at 13:24, Hari Rajendhran wrote:
1) I am confused why Cassandra uses the entire disk space (/ directory) even when we specify /var/lib/cassandra/data as the directory in cassandra.yaml
2) Is it only during compaction that Cassandra will use the entire disk space?
3) What is the best way to monitor Cassandra disk usage? Is there an open-source monitoring tool for this?

Hi, if your / and /var/lib/cassandra/data are on different disks (or partitions), only /var/lib/cassandra/data will get filled. Often this is not the case by default, and you will have to create these mountpoints yourself. Also keep commitlogs on a separate disk from data to improve performance.

The extra space is only needed during compaction - but Cassandra will fire up compactions by itself, so you must keep this free space available all the time. This is valid for SizeTieredCompaction; leveled or hybrid compaction strategies are "cheaper" on disk space.

For the last point - there are many tools to monitor the servers in your cluster. Nagios, Hyperic HQ and OpenNMS are some of them; you can define alerts which keep you up to date.

Cheers, Jan
Re: Cassandra Disk storage capacity
Hi Hari, C* will use your entire space - that is something one should monitor. Depending on your choice of compaction strategy, your data dir should not be filled up entirely: in the worst case, compaction needs as much extra space as the sstables already on disk, so 50% should be kept free. The parameters used for on-disk storage are commitlog_directory, data_file_directories and saved_caches_directory. The parameter data_file_directories is plural - you can easily put more than one directory here (and you should do this instead of using RAID; sketch below). Cheers, Jan

On 07.04.2014 at 12:56, Hari Rajendhran wrote:
Hi Team, we have a 3-node Apache Cassandra 2.0.4 setup installed in our lab. We have set the data directory to /var/lib/cassandra/data. What would be the maximum disk storage that will be used for Cassandra data storage? Note: the /var partition has a storage capacity of 40 GB. My question is whether Cassandra will use the entire / directory for data storage. If not, how do I specify multiple directories for data storage?

Best Regards, Hari Krishnan Rajendhran, Hadoop Admin, DESS-ABIM, Chennai BIGDATA Galaxy, Tata Consultancy Services
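P.S.: The relevant cassandra.yaml entries, as a sketch with placeholder mount points - one data directory per physical disk:

    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
    commitlog_directory: /mnt/disk3/cassandra/commitlog
    saved_caches_directory: /mnt/disk1/cassandra/saved_caches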
Re: Corrupted sstable and sstableloader
On 18.07.2013 19:19, Robert Coli wrote:
> Why not just determine which SSTable is corrupt, remove it from the restore set, then run a repair when you're done to be totally sure all data is on all nodes?

This is what I did in the end - it was quite some work, since sstableloader just stopped with an exception but gave no hint which file was affected. So I replayed the sstables one by one and finally found the corrupt one (a sketch of such a replay loop follows below).

Thanks to all,
Jan
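A sketch of such a replay loop - keyspace, table, target host and the staging-directory layout are all placeholders and assumptions, not the exact commands used:

    #!/bin/sh
    # Hypothetical replay loop: copy each sstable generation into a staging
    # directory named <keyspace>/<table> and load it on its own, so a
    # sstableloader failure can be pinned to a single file set.
    KS=mykeyspace; CF=mytable
    STAGE=/tmp/stage/$KS/$CF
    for data in /backup/snapshot/$KS/$CF/*-Data.db; do
        base=${data%-Data.db}
        rm -rf "$STAGE" && mkdir -p "$STAGE"
        cp "$base"-* "$STAGE/"      # Data, Index, Filter, ... components
        echo "Loading $base"
        sstableloader -d 172.17.2.216 "$STAGE" || echo "FAILED: $base"
    done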
Re: Corrupted sstable and sstableloader
Hi,

I think it might be corrupted due to a power outage. Apart from this issue, reading the data with consistency level quorum (I have three replicas) did not produce an error - only the import into a different cluster did. So, if I import all nodes except the one with the corrupted sstable - shouldn't I still import two of the three replicas, so that the data is complete?

Sent from my iPhone

On 18.07.2013 at 19:06, sankalp kohli wrote:
> The sstable might be corrupted due to a bad disk. In that case, replication does not matter.
>
> On Thu, Jul 18, 2013 at 8:52 AM, Jan Kesten wrote:
>> Hello together,
>> [... original message, quoted in full in the next post below ...]
Corrupted sstable and sstableloader
Hello together,

today I experienced a problem while loading a snapshot from our cassandra cluster into a test cluster. The cluster has six nodes; I took a snapshot from all nodes concurrently and tried to import them into the other cluster.

For 5 out of 6 nodes the import went well with no errors. But the snapshot of one node cannot be imported - I tried several times. I got the following while running sstableloader:

ERROR 09:13:06,084 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
        at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
        at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        ... 3 more
Exception in thread "Streaming to /172.17.2.216:1" java.lang.RuntimeException: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
        at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
        at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        ... 3 more

("Datenübergabe unterbrochen" is the German locale's rendering of the "broken pipe" error.)

I suspect that the sstable on the node is corrupted in some way - and a scrub and repair should fix that, I suppose.

Since the original cluster has a replication factor of 3 - shouldn't the import from 5 of 6 snapshots contain all data? Or is the sstableloader tool too clever and avoids importing duplicate data?

Thanks for hints,
Jan

--
Jan Kesten, j.kes...@enercast.de
enercast GmbH
Re: CorruptBlockException - recover?
Hi,

I tried to scrub the keyspace - but with no success either: the process threw an exception when hitting the corrupt block and then stopped. I will rebootstrap the node :-)

Thanks anyways,
Jan

On 03.07.2013 19:10, Glenn Thompson wrote:
> For what it's worth: I did this when I had this problem. It didn't work out for me. Perhaps I did something wrong.
>
> On Wed, Jul 3, 2013 at 11:06 AM, Robert Coli <rc...@eventbrite.com> wrote:
>> On Wed, Jul 3, 2013 at 7:04 AM, ifjke <j.kes...@enercast.de> wrote:
>>> I found that one of my cassandra nodes died recently (machine hangs). I restarted the node and ran a nodetool repair; while running, it threw an org.apache.cassandra.io.compress.CorruptBlockException. Is there any way to recover from this? Or would it be best to delete the node's contents and bootstrap it again?
>>
>> If you "scrub" this SSTable (either with the online or offline version of "scrub") it will remove the corrupt data and re-write the rest of the SSTable which isn't corrupt into a new SSTable. That is probably safer for your data than deleting the entire set of data on this replica. When that's done, restart the repair.
>>
>> =Rob
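For reference, the two scrub variants Rob mentions look like this - keyspace and table names are placeholders:

    # Online scrub through a running node (rewrites sstables, skipping corrupt rows)
    nodetool scrub mykeyspace mytable

    # Offline scrub with the bundled tool - stop the node before running it
    sstablescrub mykeyspace mytable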
Problem setting up encrypted communication
Hello together,

after my initial tests all is up and running; replacing a dead node was no problem at all. Now I tried to set up encryption between nodes. I set up keystores and a truststore as described in the docs. Every node has its own keystore with one private key, and a truststore with all public keys/certs imported (a keytool sketch of these steps follows below this message).

For my first node:
db02, Mar 13, 2013, PrivateKeyEntry, Certificate fingerprint (SHA1): D3:B1:37:8A:05:43:F1:7A:F9:70:7A:4C:91:6F:09:96:BF:75:21:81

For my second node:
db01, Mar 13, 2013, PrivateKeyEntry, Certificate fingerprint (SHA1): BA:E9:F4:06:15:AE:CC:79:18:8B:69:C0:70:EF:19:82:0E:81:76:E8

Shared truststore:
db02, Mar 13, 2013, trustedCertEntry, Certificate fingerprint (SHA1): D3:B1:37:8A:05:43:F1:7A:F9:70:7A:4C:91:6F:09:96:BF:75:21:81
db01, Mar 13, 2013, trustedCertEntry, Certificate fingerprint (SHA1): BA:E9:F4:06:15:AE:CC:79:18:8B:69:C0:70:EF:19:82:0E:81:76:E8

Relevant cassandra.yaml (db01 and db02 differ on the two nodes):

    server_encryption_options:
        internode_encryption: all
        keystore: /home/cassandra/certs/db01.keystore
        keystore_password: cassandra
        truststore: /home/cassandra/certs/.truststore
        truststore_password: cassandra

Now the question that puzzles me: if I disable encryption and start both nodes, they join each other and I have a working cluster. If I enable encryption they do not join any longer and I have two separate nodes. Any hints?

Thanks,
Jan
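For readers reproducing this setup, the stores above can be built roughly like this - the aliases and passwords are taken from the mail, while key algorithm, validity and file names are assumptions:

    # On db01: generate the node's keypair (repeat on db02 with -alias db02)
    keytool -genkeypair -alias db01 -keyalg RSA -validity 365 \
            -keystore db01.keystore -storepass cassandra

    # Export the node's public certificate
    keytool -exportcert -alias db01 -keystore db01.keystore \
            -storepass cassandra -file db01.cer

    # Import every node's certificate into the shared truststore
    keytool -importcert -noprompt -alias db01 -file db01.cer \
            -keystore .truststore -storepass cassandra
    keytool -importcert -noprompt -alias db02 -file db02.cer \
            -keystore .truststore -storepass cassandra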
Re: Replacing dead node when num_tokens is used
Hello Aaron,

thanks for your reply. I found it just an hour ago on my own - yesterday I had accidentally looked at the 1.0 docs. Right now my replacement node is streaming from the others - then more testing can follow.

Thanks again,
Jan
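PS, for anyone finding this thread later: with num_tokens (vnodes) the replacement is driven by the replace_address system property rather than by initial_token. A sketch - the IP is a placeholder for the dead node's address:

    # cassandra-env.sh on the fresh replacement node - IP is a placeholder
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.168.1.42"

Once the replacement node has finished streaming, remove the option again before the next restart.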
Replacing dead node when num_tokens is used
Hello,

while trying out cassandra I read about the steps necessary to replace a dead node. In my test cluster I used a setup with num_tokens instead of initial_token. How do I replace a dead node in this scenario?

Thanks,
Jan