Re: Java Client Driver for Cassandra 2.0.14
I started using the DataStax driver (coming from the Astyanax driver) recently. It is awesome! Use it :D https://github.com/datastax/java-driver Cheers, artur On 15/05/15 10:32, Rohit Naik wrote:
Nodetool removenode stuck
Hi, we have had an issue with one of our nodes today:

1. Due to a wrong setup the starting node failed to properly bootstrap. It was shown as UN in the cluster, however it did not contain any data, so we shut it down to fix our configuration issue.
2. We figured we needed to remove the node from the cluster before being able to restart it cleanly and have it bootstrap automatically. We used nodetool removenode UUID, which caused multiple nodes in our datacenter to be marked as DOWN for some reason (taken from the log) and a bunch of operations against our cluster to fail.

The nodes have come up again and, other than a slight heart attack, we are fine. However, the removenode operation is now stuck and won't continue. Can anyone recommend how to proceed safely from here? The node is marked as DL in our cluster. I found https://issues.apache.org/jira/browse/CASSANDRA-6542, however there is no hint on how to handle this properly. Is it safe to use the force option here? We don't want to risk the cluster going down for whatever reason again. Thank you! Artur
Re: How to prevent the removed DC comes back automactically?
Hey, not sure if that's what you're looking for, but you can set auto_bootstrap: false in your yaml file to prevent nodes from bootstrapping themselves on startup. The option no longer appears in the default yaml file (the default is true), but you can still add it to your configuration. There's a bit of documentation here: http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html -- artur On 14/08/14 06:48, Lu, Boying wrote: Hi, All, We are using Cassandra 2.0.7 in a multi-DC environment. If a connected DC is powered off, we use the 'nodetool removenode' command to remove it from the connected DCs. But we found that once the disconnected DC is powered on again, it will connect to the other DCs automatically. How can we prevent a disconnected DC from coming back automatically? Thanks a lot Boying
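For reference, this is all the yaml change amounts to; a minimal fragment (surrounding settings omitted, and the line has to be added by hand since it no longer ships in the default file):

```yaml
# cassandra.yaml -- keep this node from bootstrapping/streaming on startup.
# The implicit default is true, so the line must be added explicitly.
auto_bootstrap: false
```

Worth remembering to remove the line again once the node is a normal cluster member, since a replacement node started with it would come up without data.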
Re: Configuring all nodes as seeds
Hi, pretty sure we started out like that and have not seen any problems doing it. On a side note, that config may become inconsistent anyway after adding new nodes, because I think you'll need a restart of all your nodes if you add new seeds to the yaml file. (Though that's just an assumption.) On 18/06/14 09:09, Peer, Oded wrote: My intended Cassandra cluster will have 15 nodes per DC, with 2 DCs. I am considering using all the nodes as seed nodes. It looks like having all the nodes as seeds should actually reduce the gossip overhead (see the Gossiper implementation in http://wiki.apache.org/cassandra/ArchitectureGossip). Is there any reason not to do this?
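For anyone wondering where that list lives: the seeds are a plain comma-separated string inside the seed_provider block of cassandra.yaml, and as far as I know it is only read at startup, which is why the config can drift after adding nodes. A sketch with made-up addresses:

```yaml
# cassandra.yaml -- seed list (addresses hypothetical). Edits here only
# take effect on the next restart of each node.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"
```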
Re: repair -pr does not return
Hi, to be honest 2 days for 200 GB nodes doesn't sound too unreasonable to me (depending on your hardware, of course). We were running a ~20 GB cluster with regular hard drives (no SSD) and our first repair ran for a day as well, if I recall correctly. We have since improved our hardware and got it down to a couple of hours (~5h for all nodes triggering a -pr repair). As far as I know you can use nodetool compactionstats and nodetool netstats to check for activity on your repairs. There is a chance that it is hanging, but also that it just really takes quite a long time. Cheers, -- artur On 02/05/14 09:12, Jan Kesten wrote: Hi Duncan, is it actually doing something or does it look like it got stuck? 2.0.7 has a fix for a getting-stuck problem. It starts with sending merkle trees and streaming for some time (some hours, in fact) and then just seems to hang. So I'll try to update and see if that solves the issue. Thanks for that hint! Cheers, Jan
Backup procedure
Hi, we are running a 7-node cluster with an RF of 5. Each node holds about 70% of the data and we are now wondering about the backup process.

1. Is there a best-practice procedure or a tool that we can use to have one backup that holds 100% of the data, or is it necessary for us to take multiple backups?
2. If we have to use multiple backups, is there a way to combine them? We would like to be able to start up a 1-node cluster that holds 100% of the data if necessary. Can we just chuck all sstables into the data directory and Cassandra will figure out the rest?
3. How do we handle the commitlog files from all of our nodes? Given we'd like to restore to a certain point in time and we have all the commitlogs, can we have commitlogs from multiple locations in the commitlog folder and Cassandra will pick and execute the right thing?
4. If all of the above works, could we in case of emergency set up a massive 1-node cluster that holds 100% of the data and repair the rest of our cluster based off it? E.g. have the 1 node run with the correct data, then hook it into our existing cluster and call repair on it to restore data on the rest of our nodes?

Thanks for your help! Cheers, Artur
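Not an authoritative answer, but on question 2 the main mechanical hazard of chucking all sstables into one data directory is that every node numbers its sstable generations independently, so files from different nodes can collide and silently overwrite each other. A toy sketch of the problem with stand-in files (all names and paths hypothetical; real sstables have several components per generation, e.g. Data, Index and Filter, that would all need renumbering together, so sstableloader is likely the safer route):

```shell
# Stand-in for per-node snapshots: two nodes, each with a generation-1 sstable.
work=$(mktemp -d)
mkdir -p "$work/node1" "$work/node2" "$work/restore"
touch "$work/node1/ks-cf-ic-1-Data.db" "$work/node2/ks-cf-ic-1-Data.db"

# Naive merge: the second file silently overwrites the first.
cp "$work"/node*/ks-cf-ic-1-Data.db "$work/restore/"
ls "$work/restore" | wc -l      # -> 1

# Renumbering the generation while copying keeps both.
i=1
for f in "$work"/node*/*-Data.db; do
  cp "$f" "$work/restore/ks-cf-ic-$i-Data.db"
  i=$((i+1))
done
ls "$work/restore" | wc -l      # -> 2
```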
Re: How to extract information from commit log?
Hi, we did something similar. We utilized some cassandra code and wrote a custom commitlog reader that outputs our data in a readable form. You can look here: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.cassandra/cassandra-all/1.1.9/org/apache/cassandra/db/commitlog/CommitLogReplayer.java This code is used to replay commitlogs when starting up cassandra. It has the ability to deserialize and transform the data into what you'll need. -- artur On 18/03/14 19:32, Han,Meng wrote: Hi Jonathan, Thank you for the timely reply. I am doing this experiment on a continuous basis. To be more specific, I will issue a large number of read and write operations to a particular key in a short time interval. I'd like to know the order in which the write operations happen at each replica. Timestamps definitely help to determine order, but WRITETIME and sstable2json both look to me like they only return the timestamp of the latest update to the key at the moment WRITETIME/sstable2json is issued. It looks like a one-time thing to me. Or, put another way, if I want to get the write times for all write operations in that short interval, to determine a total order for writes on that replica, I would have to constantly issue WRITETIME against it? Correct me if I am wrong here. Light me up please! On Tue, 18 Mar 2014 15:05:07 -0400, Jonathan Lacefield wrote: Hello, Is this a one-time investigative item or are you looking to set something up to do this continuously? I don't recommend trying to read the commit log. You can always use the WRITETIME function in CQL, or look within SSTables via the sstable2json utility, to see write times for particular versions of partitions. Jonathan Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487 http://www.linkedin.com/in/jlacefield http://www.datastax.com/what-we-offer/products-services/training/virtual-training On Tue, Mar 18, 2014 at 2:25 PM, Han,Meng meng...@ufl.edu wrote: Hi Cassandra hackers!
I have a question regarding extracting useful information from the commit log. Since it is a binary log, how should I extract information such as timestamps and values from it? Does anyone know of any binary log reader that I can use directly to read the commit log? If there is no such reader, could someone give me some advice on how I could write one? In particular, I want to know the order in which write operations happen at each replica (Cassandra server node), along with their timestamps. Does anyone know other methods to get this information without instrumenting Cassandra code? Any help is appreciated! Cheers, Meng
Re: Question about node tool repair
About repairs: we encountered a similar problem with our setup, where repairs would take ages to complete. Based on your setup you can try loading data into the page cache before running repairs. Depending on how much data you can hold in cache, this can speed up your repairs massively. -- artur On 21/01/14 20:33, Logendran, Dharsan (Dharsan) wrote: Thanks Rob, Dharsan From: Robert Coli [mailto:rc...@eventbrite.com] Sent: January-21-14 2:26 PM To: user@cassandra.apache.org Subject: Re: Question about node tool repair On Mon, Jan 20, 2014 at 2:47 PM, Logendran, Dharsan (Dharsan) dharsan.logend...@alcatel-lucent.com wrote: We have a two-node cluster with a replication factor of 2. The db has more than 2500 column families (tables). A nodetool -pr repair on an empty database (one or two tables have a little data) takes about 30 hours to complete. We are using Cassandra version 2.0.4. Is there any way for us to speed this up? Cassandra 2.0.2 made aspects of repair serial and therefore logically much slower as a function of replication factor. Yours is not the first report I have heard of >= 2.0.2 era repair being unreasonably slow. https://issues.apache.org/jira/browse/CASSANDRA-5950 You can use -par (not at all confusingly named with -pr!) to get the old parallel behavior. Cassandra 2.1 has this ticket to improve repair with vnodes: https://issues.apache.org/jira/browse/CASSANDRA-5220 But really you should strongly consider how much you need to run repair, and at the very least probably increase gc_grace_seconds from the unreasonably low default of 10 days to 32 days, and then run your repair on the first of each month. https://issues.apache.org/jira/browse/CASSANDRA-5850 IMO it is just a complete and total error if repair of an actually empty database is anything but a no-op. I would file a JIRA ticket, were I you. =Rob
Re: Try to configure commitlog_archiving.properties
It's been a while since I tried that, but here are some things I can think of: * the .log.out extension seems wrong, unless your Cassandra commitlogs actually end in .log.out. I tried this locally with your script and my commitlogs get extracted to .log files. * I never tried the restore procedure on a cluster with multiple nodes. I imagine if you have a quorum defined, the replayed commitlog write may be ignored because it is older than the deletion, in which case the latter (nothing, in your case) will be returned. On 13/12/13 13:27, Bonnet Jonathan wrote: Hello, As I told you, I began to explore restore operations. See my config for archiving commit logs:

archive_command=/bin/bash /produits/cassandra/scripts/cassandra-archive.sh %path %name
restore_command=/bin/bash /produits/cassandra/scripts/cassandra-restore.sh %from %to
restore_directories=/produits/cassandra/cassandra_data/archived_commit
restore_point_in_time=2013:12:11 17:00:00

My 2 scripts. cassandra-archive.sh:

bzip2 --best -k $1
mv $1.bz2 /produits/cassandra/cassandra_data/archived_commit/$2.bz2

cassandra-restore.sh:

cp -f $1 $2
bzip2 -d $2

For example, at 2013:12:11 17:30:00 I truncated a table which belongs to a keyspace with no replication, on one node, and after that I did a nodetool flush. So when I restore to 2013:12:11 17:00:00 I expect to have my table filled up again. The node restarts with this config correctly and I see my archived commit logs come back to my commitlog directory; it seems bizarre to me that they end in *.out, like CommitLog-3-1386927339271.log.out, and not just .log. Is everything normal? When I query my table now, it is still empty. So my restore doesn't work and I wonder why. Do I have to run the restore on all nodes? My keyspace has no replication, but perhaps restore needs the same operation on all nodes. I'm missing something, I don't know. Thanks for your help.
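One guess at the cause of the .log.out names: bzip2 -d, when given a file whose name lacks a .bz2 suffix, refuses to decompress in place and writes <name>.out instead, and the cassandra-restore.sh above copies the archive to the bare %to name before decompressing. A sketch of a restore command body that sidesteps this (the function name and paths are assumptions, and this is untested against a live node):

```shell
# Sketch of the restore command body (would normally live in a script that
# receives %from as $1 and %to as $2).
restore_segment() {
  cp -f "$1" "$2.bz2"   # keep a .bz2 suffix so bzip2 can strip it
  bzip2 -d "$2.bz2"     # leaves exactly $2 behind, not $2.out
}

# Tiny self-contained demo with a throwaway "archived" segment:
tmp=$(mktemp -d)
echo 'fake segment' > "$tmp/CommitLog-3-1.log"
bzip2 "$tmp/CommitLog-3-1.log"    # produces CommitLog-3-1.log.bz2
restore_segment "$tmp/CommitLog-3-1.log.bz2" "$tmp/restored.log"
cat "$tmp/restored.log"           # -> fake segment
```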
Re: Try to configure commitlog_archiving.properties
hi Bonnet, that doesn't seem to be a problem with your archiving, rather with the restoring. What is your restore command? -- artur On 11/12/13 13:47, Bonnet Jonathan wrote: Thanks a lot, it works, I see commit logs being archived. I'll try the restore command tomorrow. Thanks again. Bonnet Jonathan. Hello, I restarted a node today and I get an error which seems to be related to commitlog archiving:

ERROR 14:39:00,435 Exception encountered during startup
java.lang.RuntimeException: java.io.IOException: Cannot run program : error=2, No such file or directory
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:172)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:104)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:305)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
Caused by: java.io.IOException: Cannot run program : error=2, No such file or directory
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.cassandra.utils.FBUtilities.exec(FBUtilities.java:588)
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.exec(CommitLogArchiver.java:182)
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:168)
    ... 4 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.init(Unknown Source)
    at java.lang.ProcessImpl.start(Unknown Source)
    ... 8 more

(The same trace is printed a second time.) No help on the net, and nothing has changed since the last edits to commitlog_archiving.properties. The first time I restarted, yesterday, there was no problem and my commitlogs were being archived fine. Can someone help me, please? Regards, Bonnet Jonathan.
Re: Try to configure commitlog_archiving.properties
So, looking at the code:

public void maybeRestoreArchive()
{
    if (Strings.isNullOrEmpty(restoreDirectories))
        return;

    for (String dir : restoreDirectories.split(","))
    {
        File[] files = new File(dir).listFiles();
        if (files == null)
        {
            throw new RuntimeException("Unable to list directory " + dir);
        }
        for (File fromFile : files)
        {
            File toFile = new File(DatabaseDescriptor.getCommitLogLocation(),
                                   new CommitLogDescriptor(CommitLogSegment.getNextId()).fileName());
            String command = restoreCommand.replace("%from", fromFile.getPath());
            command = command.replace("%to", toFile.getPath());
            try
            {
                exec(command);
            }
            catch (IOException e)
            {
                throw new RuntimeException(e);
            }
        }
    }
}

I would like someone to confirm this, but it might be a bug. It does the right thing for an empty restore directory, but it ignores the fact that the restore command could be empty. So, Jonathan, I reckon you have the restore directory set? You don't need that in order to archive (only if you want to restore). So set your restore_directories property to empty and you should get rid of those errors; the directory only needs to be set when you enable the restore command. On a second look, I am almost certain this is a bug, as maybeArchive does correctly check that the command is not empty or null; maybeRestoreArchive needs to do the same for restoreCommand. If someone confirms, I am happy to raise a bug. cheers, artur On 11/12/13 14:09, Bonnet Jonathan wrote: Artur Kronenberg artur.kronenberg at openmarket.com writes: hi Bonnet, that doesn't seem to be a problem with your archiving, rather with the restoring. What is your restore command? -- artur On 11/12/13 13:47, Bonnet Jonathan wrote: Thanks for answering so fast. I put nothing for restore; should I? Because I don't want to restore for the moment. Regards,
Re: Try to configure commitlog_archiving.properties
Hi, there are some docs on the internet for these operations; it is basically as presented in the commitlog_archiving.properties file itself. The way it works: the command is called automatically, with parameters that give you control over what you want to do. It looks to me like your copy command for archiving tries to copy every file itself, which is unnecessary: Cassandra gives you the arguments. E.g. the documentation for the archive command reads:

# Command to execute to archive a commitlog segment
# Parameters: %path = Fully qualified path of the segment to archive
#             %name = Name of the commit log.
# Example: archive_command=/bin/ln %path /backup/%name
#
# Limitation: *_command= expects one command with arguments. STDOUT
# and STDIN or multiple commands cannot be executed. You might want
# to script multiple commands and add a pointer here.

Argument 1 (%path) gives you the path of the file to copy, while argument 2 (%name) gives you its name. You can then create a command:

archive_command=/bin/bash /home/cassandra/scripts/cassandra-archive.sh %path %name

The above is an example from my setup. As the commands by default only execute one command, I have them point to a custom script that does what I want. My script then looks something like this:

#! /bin/bash
# use bzip2 to compress the file
bzip2 --best -k $1
# move to commit log archive
mv $1.bz2 $HOME/commitlog_restore/$2.bz2

I compress my commitlog and then move it somewhere else. Cassandra calls this operation first, then deletes the commitlog. You can apply similar behaviour to all of those commands. I would recommend setting up a cleanup script, otherwise your saved commitlog files will flood your hard drive. I would also test this locally first to make sure it works. For local testing, bear in mind that you may want to set your max commitlog file size to a very low value, as the command is only called when a commitlog segment is deleted.
So with commitlogs of 250 MB or bigger you need a lot of writes to make the event trigger. I hope I got that right and it helps. Cheers. FYI there was a bug with this, so you may want to be on the right version: https://issues.apache.org/jira/browse/CASSANDRA-5909 On 06/12/13 15:33, Vicky Kak wrote: Why? Can you give me a good example and the good way to configure archived commit logs? Take a look at the cassandra code ;) On Fri, Dec 6, 2013 at 3:34 PM, Bonnet Jonathan jonathan.bon...@externe.bnpparibas.com wrote: Hello, I am trying to configure commitlog_archiving.properties to take advantage of backup and restore to a point in time, but there are no resources on the internet for that, so I need some help. If I understand correctly I have 4 parameters:

archive_command=
restore_command=
restore_directories=
restore_point_in_time=

Forgetting the restore_point_in_time for the moment, should I put one command only for the archive_command? My wish is to copy all the commitlogs to another directory:

restore_directories=/produits/cassandra/cassandra_data/archived_logs
archive_command=/bin/cp -f /produits/cassandra/cassandra_commit/*.log /produits/cassandra/cassandra_data/archived_logs
restore_command=/bin/cp -f /produits/cassandra/cassandra_data/archived_logs/*.log /produits/cassandra/cassandra_commit

But it doesn't work: when I restart the node that interests me, it doesn't copy anything. Why? Can you give me a good example and the good way to configure archived commit logs? There's nothing on the net except the DataStax website. Thanks for your answer. Regards, Bonnet Jonathan.
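The cleanup script mentioned above can be as simple as a find run from cron; a sketch against a throwaway directory (the 7-day retention and the directory layout are assumptions):

```shell
# Drop archived commitlogs older than 7 days. Demoed here against a
# temporary directory; in cron you would point archive_dir at the real
# archive location instead.
archive_dir=$(mktemp -d)
touch -d '10 days ago' "$archive_dir/CommitLog-3-old.log.bz2"
touch "$archive_dir/CommitLog-3-new.log.bz2"

find "$archive_dir" -name '*.bz2' -mtime +7 -delete
ls "$archive_dir"    # only the recent segment remains
```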
Re: reads and compression
Hi John, I am trying again :) The way I understand it, compression trades I/O for CPU: the bottleneck for reads is usually the time needed to read the data from disk. As a figure, we got about 25 reads/s when reading from disk, and up to 3000 reads/s when we have all of it in cache. So having good compression reduces the amount you have to read from disk. You may spend a little more time decompressing data, but that data will be in cache anyway, so it won't matter. Cheers On 29/11/13 01:09, John Sanda wrote: This article [1] cites gains in read performance that can be achieved when compression is enabled. The more I thought about it, even after reading the DataStax docs about reads [2], I realized I do not understand how compression improves read performance. Can someone provide some details on this? Is the compression offsets map still used if compression is disabled for a table? If so, what is its rate of growth like compared to the growth of the map when compression is enabled? [1] http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression [2] http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=index#cassandra/dml/dml_about_reads_c.html Thanks - John
Re: Nodetool cleanup
Hi Julien, I hope I get this right :) A repair will trigger a major compaction on your node, which takes up a lot of CPU and I/O. It needs to do this to build up the data structure that is used for the repair; after the compaction this is streamed to the other nodes in order to repair them. If you trigger this on every node simultaneously, you basically take the performance away from your cluster. I would expect cassandra to still function, just way slower than before. Triggering it node after node leaves your cluster with more resources to handle incoming requests. Cheers, Artur On 25/11/13 15:12, Julien Campan wrote: Hi, I'm working with Cassandra 1.2.2 and I have a question about nodetool cleanup. The documentation says "Wait for cleanup to complete on one node before doing the next." I would like to know why we can't perform a lot of cleanups at the same time? Thanks
Re: Sorting keys for batch reads to minimize seeks
Hi, we did some testing and found that doing range queries is much quicker than querying data regularly. I am guessing that a range query seeks much more efficiently on disk. This is where the idea of sorting our tokens comes in. We have a batch request of, say, 1000 items, and instead of doing a multiget from cassandra, which involves a lot of random I/O seeks, we would like a way to seek over the range. It doesn't actually matter if the range is slightly bigger than the number of items we want to retrieve, as the time we lose filtering unneeded items in code is less than the cost of doing a multiget for 1000 items in the first place. Is there a way to base token ranges somewhat on a certain value in our schema? Say every row has a value A and a value B. A is just a random identifier and we can't rely on what it will be, but all our queries operate in a way that B is the same for all items in the query. If the tokens were still random, but generated based on the B value so that all items with the same B are close together in range, and therefore optimized for range queries rather than gets, that could speed up read performance significantly. Thanks! Artur On 21/10/13 16:58, Edward Capriolo wrote: I am not sure what you are working on will have an effect. You cannot actually control the way the operating system seeks data on disk; the I/O scheduling is done outside cassandra. You can try to write the code in an optimistic way, taking physical hardware into account, but then you have to consider that there are n concurrent requests on the I/O system. On Friday, October 18, 2013, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: Read latency depends on many factors, don't forget physics. If it meets your requirements, it is good.
-----Original Message----- From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com] Sent: Friday, October 18, 2013 1:03 PM To: user@cassandra.apache.org Subject: Re: Sorting keys for batch reads to minimize seeks Hi, thanks for your reply. Our latency currently is 23.618 ms. However, I simply read that off one node just now while it wasn't under a load test; I will be able to get a better number after the next test run. What is a good value for read latency? On 18/10/13 08:31, Viktor Jevdokimov wrote: The only thing you may win - avoid unnecessary network hops if: - request sorted keys (by token) from the appropriate replica with ConsistencyLevel.ONE and dynamic_snitch: false - nodes have the same load - replica is not doing GC, and GC pauses are much higher than internode communication. For a multiple-key request C* will do multiple single-key reads, except for range scan requests, where only the starting key and batch size are used in the request. Consider a multiple-key request a slow request by design; try to model your data for low-latency single-key requests. So, what latencies do you want to achieve? Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-03163 Vilnius, Lithuania Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

-----Original Message----- From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com] Sent: Thursday, October 17, 2013 7:40 PM To: user@cassandra.apache.org Subject: Sorting keys for batch reads to minimize seeks Hi, I am looking to somehow increase read performance on cassandra. We are still playing with configurations, but I was wondering if there are solutions in software that might help us speed up our reads. E.g. one idea, not sure how sane it is, was to sort read batches by row key before submitting them to cassandra. The idea is that the row keys should be closer together on the physical disk, and therefore this may minimize the amount of random seeks we have to do when querying, say, 1000 entries from cassandra. Does that make any sense? Is there anything else that we can do in software to improve performance? Like specific batch sizes for reads? We are using the astyanax library to access cassandra. Thanks!
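For what it's worth, the pre-sorting idea from the original mail is a one-liner on the client side; a trivial sketch with hypothetical keys (with RandomPartitioner you would sort by token rather than by raw key, since on-disk order follows the token):

```shell
# Sort the key batch before handing it to the client library, so the
# multiget walks keys in order rather than randomly (keys hypothetical).
printf '%s\n' 'user:9042' 'user:0017' 'user:4511' | sort
```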
Re: Sorting keys for batch reads to minimize seeks
Hi, thanks for your reply. Our latency currently is 23.618 ms. However, I simply read that off one node just now while it wasn't under a load test; I will be able to get a better number after the next test run. What is a good value for read latency? On 18/10/13 08:31, Viktor Jevdokimov wrote: The only thing you may win - avoid unnecessary network hops if: - request sorted keys (by token) from the appropriate replica with ConsistencyLevel.ONE and dynamic_snitch: false - nodes have the same load - replica is not doing GC, and GC pauses are much higher than internode communication. For a multiple-key request C* will do multiple single-key reads, except for range scan requests, where only the starting key and batch size are used in the request. Consider a multiple-key request a slow request by design; try to model your data for low-latency single-key requests. So, what latencies do you want to achieve? Best regards / Pagarbiai Viktor Jevdokimov Senior Developer, Adform -----Original Message----- From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com] Sent: Thursday, October 17, 2013 7:40 PM To: user@cassandra.apache.org Subject: Sorting keys for batch reads to minimize seeks Hi, I am looking to somehow increase read performance on cassandra. We are still playing with configurations, but I was wondering if there are solutions in software that might help us speed up our reads.
E.g. one idea, not sure how sane it is, was to sort read batches by row key before submitting them to cassandra. The idea is that the row keys should be closer together on the physical disk, and therefore this may minimize the amount of random seeks we have to do when querying, say, 1000 entries from cassandra. Does that make any sense? Is there anything else that we can do in software to improve performance? Like specific batch sizes for reads? We are using the astyanax library to access cassandra. Thanks!
Sorting keys for batch reads to minimize seeks
Hi, I am looking to somehow increase read performance on cassandra. We are still playing with configurations, but I was wondering if there are solutions in software that might help us speed up our reads. E.g. one idea, not sure how sane it is, was to sort read batches by row key before submitting them to cassandra. The idea is that the row keys should be closer together on the physical disk, and therefore this may minimize the amount of random seeks we have to do when querying, say, 1000 entries from cassandra. Does that make any sense? Is there anything else that we can do in software to improve performance? Like specific batch sizes for reads? We are using the astyanax library to access cassandra. Thanks!
Heap requirement for Off-heap space
Hi, I was playing around with cassandra's off-heap options. I configured 3 GB off-heap for my row cache and 2 GB of heap space for cassandra. After running a bunch of load tests against it, I saw the cache warm up. Doing a jmap histogram, I noticed a lot of off-heap key objects, at a point when my row cache was only 500 MB big. So I got worried about how this will affect heap space once I manage to fill the entire off-heap cache. Is there a formula that maps heap space to off-heap space? Or, more specifically, how much heap space does it require to manage 3 GB of off-heap space? thanks! artur
Rowcache and quorum reads cassandra
I was reading through configuration tips for cassandra and decided to use the row cache in order to optimize read performance on my cluster. I have a cluster of 10 nodes, each of them operating with 3 GB off-heap, using cassandra 2.4.1. I am doing local quorum reads, which means that I will hit 3 nodes out of 5, because I split my 10 nodes into two data centres. I was under the impression that, since each node gets a certain range of reads, my total amount of off-heap would be 10 * 3 GB = 30 GB. However, is this still correct with quorum reads? How does cassandra handle row-cache hits in combination with quorum reads? Thanks! -- artur
Re: Rowcache and quorum reads cassandra
Hi. That is basically our setup: we'll be holding all data on all nodes. My question was more about how the cache would behave. I thought it might go this way:

1. No cache hit: read from 3 nodes to verify the results are correct, then return and write the result into the row cache.
2. Cache hit: read from the cache directly and return.

If the value then gets updated, it would be found in the row cache and either invalidated (hence case 1 on the next read) or updated (hence case 2 on the next read). However, I couldn't find any information on this. If this were the case, each node would only have to hold 1/5 of my data in cache (you're right about the DC clone, so 1/5 of the data instead of 1/10). If, however, 3 nodes have to be read each time and all 3 fill up the row cache with the same data, that would make my cache requirements bigger. Thanks! Artur On 10/10/13 14:06, Ken Hancock wrote: If you're hitting 3/5 nodes, it sounds like you've set your replication factor to 5. Is that what you're doing so you can survive a 2-node outage? For a 5-node cluster with RF=5, each node will have 100% of your data (a second DC is just a clone), so with a 3 GB off-heap cache it means that 3 GB / total data size in GB would be cacheable in the row cache. On the other hand, if you're doing RF=3, each node will have 60% of your data instead of 100%, so the effective percentage of rows that are cacheable goes up by 66%. Great quick-and-dirty calculator: http://www.ecyrd.com/cassandracalculator/ On Thu, Oct 10, 2013 at 6:40 AM, Artur Kronenberg artur.kronenb...@openmarket.com wrote: I was reading through configuration tips for cassandra and decided to use the row cache in order to optimize the read performance on my cluster. I have a cluster of 10 nodes, each of them operating with 3 GB off-heap, using cassandra 2.4.1. I am doing local quorum reads, which means that I will hit 3 nodes out of 5 because I split my 10 nodes into two data centres.
I was under the impression that since each node gets a certain range of reads my total amount of off-heap would be 10 * 3 GB = 30 GB. However, is this still correct with quorum reads? How does cassandra handle row-cache hits in combination with quorum reads? Thanks! -- artur -- Ken Hancock | System Architect, Advanced Advertising, SeaChange International, Acton, Massachusetts
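Ken's coverage argument is easy to put into numbers; the figures below (data size, RF, node count) are made up purely for illustration:

```shell
# Back-of-envelope row-cache coverage per node (all figures hypothetical).
total_data_gb=100     # total data set in the DC
rf=5                  # replication factor within the DC
nodes=5               # nodes in the DC
cache_gb=3            # row cache configured per node

per_node_gb=$(( total_data_gb * rf / nodes ))      # data each node holds
coverage_pct=$(( cache_gb * 100 / per_node_gb ))   # fraction cacheable per node
echo "per-node data: ${per_node_gb} GB, cacheable: ${coverage_pct}%"
```

With RF=3 instead, per-node data drops to 60 GB and the same 3 GB cache covers 5% of it, which is the roughly 66% improvement Ken mentions.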