Re: Time series data and deletion
On 2013/03/11 14:42, aaron morton wrote:

> > I'm trying to understand what will happen when we start deleting the old data.
> Are you going to delete data or use the TTL?

We delete the data explicitly, since we might change our minds about the TTL after the data has been written.

> > With size tiered compaction, suppose we have one 160Gb sstable and some smaller tables totalling 40Gb.
> Not sure on that, it depends on the work load.

NVM, it was just a hypothesis.

> > My understanding is that, even if we start deleting, we will have to wait for 3 more 160Gb tables to appear, in order to have the first sstable compacted and the disk space freed.
> v1.2 will run compactions on single SSTables that have a high number of tombstones:
> https://issues.apache.org/jira/browse/CASSANDRA-3442
> https://issues.apache.org/jira/browse/CASSANDRA-4234

I did not know about these improvements in 1.2! We're still on 1.0.12, I'll push for an upgrade.

One more question. I read and reread your description of deletes [1], but I am still confused about tombstones and GCGraceSeconds, specifically where you say:

> If the deletion is before gcBefore it is totally ignored.

Suppose I delete something, but compaction of the tombstone and the deleted data does not happen within GCGraceSeconds. From what I understood, it looks like the tombstone will be ignored, and the data will resurrect... where am I wrong?

Cheers,
Flavio

[1] http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/#local_reads_for_local_queries
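For reference, gcBefore is just a cutoff computed from GCGraceSeconds: a tombstone becomes eligible for purging at compaction time only once its deletion timestamp falls before that cutoff. A minimal sketch of the arithmetic (hypothetical names, not Cassandra's actual code):

```java
// Sketch of the tombstone purge-eligibility rule, not Cassandra's actual code.
// A tombstone may be dropped during compaction only once it is older than
// gcBefore = now - gcGraceSeconds; before that it must be kept.
public class TombstonePurge {
    static boolean purgeable(long deletedAtSeconds, long nowSeconds, long gcGraceSeconds) {
        long gcBefore = nowSeconds - gcGraceSeconds;
        return deletedAtSeconds < gcBefore; // older than the grace window -> safe to drop
    }

    public static void main(String[] args) {
        long now = 1_300_000_000L;  // hypothetical "now" in epoch seconds
        long gcGrace = 864_000L;    // default gc_grace_seconds (10 days)
        // Deleted 1 day ago: still within grace, tombstone must be kept.
        System.out.println(purgeable(now - 86_400L, now, gcGrace));        // false
        // Deleted 20 days ago: past grace, compaction may purge it.
        System.out.println(purgeable(now - 20L * 86_400L, now, gcGrace));  // true
    }
}
```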
Time series data and deletion
Hello, we are using Cassandra for storing time series data. We never update, only append; we plan to store 1 year worth of data, occupying something around 200Gb.

I'm trying to understand what will happen when we start deleting the old data. With size tiered compaction, suppose we have one 160Gb sstable and some smaller tables totalling 40Gb. My understanding is that, even if we start deleting, we will have to wait for 3 more 160Gb tables to appear before the first sstable is compacted and the disk space freed. So although we only need to store 200Gb worth of data, we'll need something like 800Gb of disk space to be on the safe side, right?

What would happen instead with leveled compaction? And why is the default sstable size so small (5Mb)? If we need to store 200Gb, this means we will have 40k sstables; since each one consists of 5 files, we'll have 200k files in a single directory, which we're afraid will undermine the stability of the file system.

Thank you for your suggestions!
Flavio
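The size-tiered worst case above can be sketched as simple arithmetic: with the default min_compaction_threshold of 4, a tier is only merged once 4 similarly-sized sstables exist, and during the merge the inputs and the output coexist on disk. A back-of-the-envelope sketch (an assumed simplification that ignores the smaller tiers and any shrinkage from purged tombstones):

```java
// Back-of-the-envelope sketch of size-tiered compaction's transient disk use.
// Assumes min_compaction_threshold = 4: four similarly-sized sstables must
// accumulate before a merge, and the merged output is written out before the
// inputs are deleted, so both coexist on disk during the compaction.
public class SizeTieredMath {
    static long worstCaseBytes(long tierSstableBytes, int threshold) {
        long inputs = tierSstableBytes * threshold;  // e.g. 4 x 160Gb waiting to merge
        long output = inputs;                        // upper bound: nothing purged in the merge
        return inputs + output;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        // Four 160Gb sstables plus the merged output: up to ~1280Gb transiently.
        System.out.println(worstCaseBytes(160 * gb, 4) / gb); // 1280
    }
}
```

In practice the output is smaller whenever the merge purges tombstoned data, which is why "keep ~50% headroom" is the usual rule of thumb rather than this absolute worst case.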
Re: 1000's of column families
We had some serious trouble with dynamically adding CFs, although last time we tried we were using version 0.7, so maybe that's not an issue any more. Our problems were two:

- You are (were?) not supposed to add CFs concurrently. Since we had multiple servers talking to the same Cassandra cluster, we had to use distributed locks (Hazelcast) to avoid concurrency.
- You must be very careful when adding new CFs through different Cassandra nodes. If you do that fast enough, and the clocks of the two servers are skewed, you will severely compromise your schema (Cassandra will not understand in which order the updates must be applied).

As I said, this applied to version 0.7; maybe current versions have solved these problems.

Flavio

On 2012/09/27 16:11, Hiller, Dean wrote:

We have 1000's of different building devices and we stream data from these devices. The format and data from each one varies, so one device has temperature at timeX with some other variables, another device has CO2 percentage and other variables. Every device is unique and streams its own data. We dynamically discover devices and register them. Basically, one CF or table per thing really makes sense in this environment. While we could try to find out which devices are similar, this would really be a pain, and some devices add some new variable into the equation. Not only that, but researchers can register new datasets and upload them as well, and they do not necessarily want to share each dataset with other researchers, so we have security groups and each CF belongs to security groups. We dynamically create CFs on the fly as people register new datasets. On top of that, when the data sets get too large, we probably want to partition a single CF into time partitions.
We could create one CF, put in all the data, and have a partition per device, but then a time partition would contain data from multiple devices, meaning we would need to shrink our time partition size; whereas if we have a CF per device, the time partition can be larger, as it is only for that one device. Then, on top of that, we have a meta CF for these devices: some people want to query for streams that match criteria, which returns a CF name, and then query that CF name, so we almost need a query with variables like "select cfName from Meta where x = y" and then "select * from cfName where x". Which we can do today.

Dean

From: Marcelo Elias Del Valle <mvall...@gmail.com>
Date: Thursday, September 27, 2012 8:01 AM
To: user@cassandra.apache.org
Subject: Re: 1000's of column families

Out of curiosity, is it really necessary to have that amount of CFs? I am probably still used to relational databases, where you would use a new table just in case you need to store different kinds of data. As Cassandra stores anything in each CF, it might make sense to have a lot of CFs to store your data... But why wouldn't you use a single CF with partitions in this case? Wouldn't it be the same thing? I am asking because I might learn a new modeling technique with the answer.

[]s

2012/9/26 Hiller, Dean <dean.hil...@nrel.gov>

We are streaming data with 1 stream per CF and we have 1000's of CFs. When using the tools, they are all geared to analyzing ONE column family at a time :(. If I remember correctly, Cassandra supports as many CFs as you want, correct? Even though I am going to have tons of fun with limitations on the tools, correct?
(I may end up wrapping the node tool with my own aggregate calls if needed, to sum up multiple column families and such.)

Thanks,
Dean

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
Tiered compaction on two disks
Hi, I have a Cassandra installation where we plan to store 1Tb of data, split between two 1Tb disks. Tiered compaction should be better suited to our workload (append-only, deletion of old data, few reads). I know that tiered compaction needs 50% free disk space for the worst-case situation. How does this combine with the disk split? What happens if I have 500Gb of data on one disk and 500Gb on the other? Won't compaction try to build a single 1Tb file, failing since there are only 500Gb free on each disk?

Flavio
Re: Starting cassandra with -D option
The option must also include the name of the yaml file itself:

-Dcassandra.config=file:///Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf/cassandra.yaml

Flavio

On 6/21/2012 13:16, Roshni Rajagopal wrote:

Hi Folks,

We wanted to have a single Cassandra installation and use it to start Cassandra on other nodes by passing the Cassandra configuration directories as a parameter. The idea is to avoid having copies of the Cassandra code on each node and starting each node by getting into bin/cassandra of that node. As per http://www.datastax.com/docs/1.0/references/cassandra, there is a -D option where we can supply some parameters to Cassandra. Has anyone tried this? I'm getting an error as below.

walmarts-MacBook-Pro-2:Node1-Cassandra1.1.0 walmart$ bin/cassandra -Dcassandra.config=file:///Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf
walmarts-MacBook-Pro-2:Node1-Cassandra1.1.0 walmart$ INFO 15:38:01,763 Logging initialized
INFO 15:38:01,766 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_31
INFO 15:38:01,766 Heap size: 1052770304/1052770304
INFO 15:38:01,766 Classpath:
bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.1.0.jar:bin/../lib/apache-cassandra-clientutil-1.1.0.jar:bin/../lib/apache-cassandra-thrift-1.1.0.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/metrics-core-2.0.3.jar:bin/../lib/mx4j-tools-3.0.1.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/snaptree-0.1.jar:bin/../lib/jamm-0.2.5.jar
INFO 15:38:01,768 JNA not found. Native methods will be disabled.
INFO 15:38:01,826 Loading settings from file:/Users/walmart/Downloads/Cassandra/Node2-Cassandra1.1.0/conf
ERROR 15:38:01,873 Fatal configuration error
error Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=No single argument constructor found for class org.apache.cassandra.config.Config in reader, line 1, column 1: cassandra.yaml

The other option would be to modify cassandra.in.sh. Has anyone tried this?

Regards,
Roshni
Re: JMX APIs thru MX4J
Hi, I can't help you with your other questions, but the type [Ljava.lang.String; is an array of String objects. More info here: http://en.wikipedia.org/wiki/Java_Native_Interface#Mapping_types

Flavio

On 4/4/2012 10:04, Andrea Tuccia wrote:

Hello, I'm working on a fork of Sébastien Giroux's Cassandra Cluster Admin and I wish to contribute my changes back to the parent code. Here is my repository: https://github.com/atuccia/Cassandra-Cluster-Admin ...and Sébastien Giroux's: https://github.com/sebgiroux/Cassandra-Cluster-Admin

I want to add the same functionality as OpsCenter (or the common functions available through the command-line nodetool): cleanup, compact, repair, drain, decommission and so on... I'm stuck, having run into trouble with MX4J.

http://192.168.10.91/invoke?operation=forceTableRepair&objectname=org.apache.cassandra.db%3Atype%3DStorageService&value0=&type0=java.lang.String&type1=[Ljava.lang.String%3B
MBean operation: invoke method on MBean org.apache.cassandra.db:type=StorageService
Error during MBean operation invocation. Message: count of parameter types doesn't match count of parameter values

http://192.168.10.91/invoke?operation=forceTableRepair&objectname=org.apache.cassandra.db%3Atype%3DStorageService&value0=&type0=java.lang.String&type1=[Ljava.lang.String%3B&value1=
MBean operation: invoke method on MBean org.apache.cassandra.db:type=StorageService
Error during MBean operation invocation. Message: Parameter 1: cannot be converted to type [Ljava.lang.String;

http://192.168.10.91/invoke?operation=forceTableRepair&objectname=org.apache.cassandra.db%3Atype%3DStorageService
MBean operation: invoke method on MBean org.apache.cassandra.db:type=StorageService
Error during MBean operation invocation. Message: Operation singature has no match in the MBean

...so, how can I call APIs that (apparently) have optional parameters? What is that strange definition [Ljava.lang.String; as the type of some params?

Thanks in advance!
Andrea Tuccia
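To expand on the "strange definition": the bracket form is simply the JVM's runtime name for an array type, and JMX signatures use these runtime names. You can see the mapping directly:

```java
// "[Ljava.lang.String;" is the JVM's runtime name for String[].
// JMX signatures use runtime names, which is why forceTableRepair's
// String-array parameter shows up with this spelling.
public class JvmArrayNames {
    public static void main(String[] args) throws Exception {
        System.out.println(String[].class.getName());   // [Ljava.lang.String;
        // The mapping also works in reverse:
        Class<?> c = Class.forName("[Ljava.lang.String;");
        System.out.println(c == String[].class);        // true
    }
}
```

With a programmatic JMX client, my understanding is the same operation would be invoked with a signature array like `new String[]{"java.lang.String", "[Ljava.lang.String;"}` and an empty `String[]` as the second value; check the StorageServiceMBean of your Cassandra version before relying on this.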
List all keys with RandomPartitioner
I need to iterate over all the rows in a column family stored with RandomPartitioner. When I reach the end of a key slice, I need to find the token of the last key in order to ask for the next slice. I saw in an old email that the token for a specific key can be recovered through FBUtilities.hash(). That class, however, is inside the full Cassandra jar, not inside the client-specific part. Is there a way to iterate over all the keys which does not require the server-side Cassandra jar?

Thanks
Flavio
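To my understanding, RandomPartitioner's token is just the absolute value of the MD5 hash of the key, which FBUtilities.hash wraps; if only the token computation is needed client-side, it can be reproduced with standard JDK classes. A sketch of that scheme (please verify against the RandomPartitioner source of your Cassandra version before relying on it):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Client-side sketch of RandomPartitioner's token computation:
// token = abs(MD5(key)) interpreted as a BigInteger, mirroring what
// FBUtilities.hash does server-side (my understanding; verify against
// the Cassandra version you run).
public class RandomPartitionerToken {
    static BigInteger token(byte[] key) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return new BigInteger(md5.digest(key)).abs();
    }

    public static void main(String[] args) throws Exception {
        BigInteger t = token("myrow".getBytes(StandardCharsets.UTF_8));
        System.out.println(t); // a non-negative 128-bit token
    }
}
```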
Re: List all keys with RandomPartitioner
On 2/22/2012 12:24, Franc Carter wrote:

> On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti <f.baro...@list-group.com> wrote:
> > I need to iterate over all the rows in a column family stored with RandomPartitioner. When I reach the end of a key slice, I need to find the token of the last key in order to ask for the next slice. I saw in an old email that the token for a specific key can be recovered through FBUtilities.hash(). That class, however, is inside the full Cassandra jar, not inside the client-specific part. Is there a way to iterate over all the keys which does not require the server-side Cassandra jar?
>
> Does this help? http://wiki.apache.org/cassandra/FAQ#iter_world
>
> cheers

Looks good... I thought you were not supposed to iterate directly over row keys with a RandomPartitioner!

Thanks
Flavio
Re: Second Cassandra users survey
We are using Cassandra for time series storage.

Strong points: write performance.

Pain points: dynamically adding column families as new time series come in. This caused a lot of headaches, mismatches between nodes, etc. In the end we just put everything together in a single (huge) column family.

Wish list: a decent GUI to explore data kept in Cassandra would be very valuable. It should also be extensible to provide viewers for custom data.

On 11/1/2011 23:59, Jonathan Ellis wrote:

Hi all, Two years ago I asked for Cassandra use cases and feature requests. [1] The results [2] have been extremely useful in setting and prioritizing goals for Cassandra development. But with the release of 1.0 we've accomplished basically everything from our original wish list. [3] I'd love to hear from modern Cassandra users again, especially if you're usually a quiet lurker. What does Cassandra do well? What are your pain points? What's your feature wish list? As before, if you're in stealth mode or don't want to say anything in public, feel free to reply to me privately and I will keep it off the record.

[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
[2] http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
[3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
ConsistencyLevel and write availability
Suppose I have a cluster with 10 nodes and RF=5. Will every write succeed if one or two of my nodes are down and I use ConsistencyLevel=ALL? Or will some of the writes fail?
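The short answer is "some writes fail": with RF=5, each key maps to a specific set of 5 replicas, and CL=ALL fails any write whose replica set includes a down node; writes to ranges whose 5 replicas are all up still succeed. A small simulation on a simplified ring (replicas are the 5 consecutive nodes starting at the range's primary; ignores vnodes, snitches and rack-aware placement):

```java
import java.util.Set;

// Write availability at CL=ALL on a simplified 10-node ring with RF=5:
// the replicas of a range are the 5 consecutive nodes starting at its
// primary. Counts ranges whose replicas are all up.
public class ClAllAvailability {
    static int survivingRanges(int nodes, int rf, Set<Integer> down) {
        int ok = 0;
        for (int primary = 0; primary < nodes; primary++) {
            boolean allUp = true;
            for (int r = 0; r < rf; r++) {
                if (down.contains((primary + r) % nodes)) { allUp = false; break; }
            }
            if (allUp) ok++;
        }
        return ok;
    }

    public static void main(String[] args) {
        // Two adjacent nodes down: some ranges still have all 5 replicas up.
        System.out.println(survivingRanges(10, 5, Set.of(0, 1))); // 4
        // Two "opposite" nodes down: every 5-node window hits one of them.
        System.out.println(survivingRanges(10, 5, Set.of(0, 5))); // 0
    }
}
```

So at CL=ALL availability depends on which nodes are down, not just how many; a lower CL such as QUORUM (3 of 5) tolerates two down replicas per range.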
OOM recovering failed node with many CFs
I can't seem to recover a failed node on a database where I made many updates to the schema. I have a small cluster with 2 nodes, around 1000 CFs (I know it's a lot, but it can't be changed right now), and ReplicationFactor=2. I shut down a node and cleaned its data entirely, then tried to bring it back up. The node starts fetching schema updates from the live node, but the operation fails halfway with an OOME. After some investigation, what I found is that:

- I have a lot of schema updates (there are 2067 rows in the system.Schema CF).
- The live node loads migrations 1-1000 and sends them to the recovering node (Migration.getLocalMigrations()).
- Soon afterwards, the live node checks the schema version on the recovering node and finds it has moved by a little - say it has applied the first 3 migrations. It then loads migrations 3-1003 and sends them to the node.
- This process is repeated very quickly (it sends migrations 6-1006, 9-1009, etc.).

Analyzing the memory dump and the logs, it looks like each of these 1000-migration blocks is composed into a single message and sent to the OutboundTcpConnection queue. However, since the schema is big, the messages occupy a lot of space and are built faster than the connection can send them. Therefore, they accumulate in OutboundTcpConnection.queue until memory is completely filled.

Any suggestions? Can I change something to make this work, apart from reducing the number of CFs?

Flavio
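The failure mode described above is the classic unbounded-queue problem: when a producer builds messages faster than the network drains them, pending messages pile up until the heap is exhausted. A small illustration (not Cassandra code) of how a bounded queue would instead apply backpressure, with hypothetical sizes:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration (not Cassandra code) of bounding an outbound queue:
// put() blocks once the queue is full, capping memory at
// capacity * message size instead of growing without limit.
public class BoundedOutbound {
    static int run(int messages, int capacity) throws InterruptedException {
        BlockingQueue<byte[]> outbound = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) {
                    outbound.put(new byte[1024]); // blocks while `capacity` messages are pending
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        Thread slowSender = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) {
                    outbound.take();              // simulate a slow connection draining
                    Thread.sleep(1);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        producer.start(); slowSender.start();
        producer.join(); slowSender.join();
        return outbound.size(); // 0 once everything has been drained
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pending after drain: " + run(100, 8)); // 0
    }
}
```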
Re: OOM recovering failed node with many CFs
I tried the manual copy you suggested, but the SystemTable.checkHealth() function complains that it can't load the system files. The log follows; I will gather some more info and create a ticket as soon as possible.

INFO [main] 2011-05-26 18:25:36,147 AbstractCassandraDaemon.java Logging initialized
INFO [main] 2011-05-26 18:25:36,172 AbstractCassandraDaemon.java Heap size: 4277534720/4277534720
INFO [main] 2011-05-26 18:25:36,174 CLibrary.java JNA not found. Native methods will be disabled.
INFO [main] 2011-05-26 18:25:36,190 DatabaseDescriptor.java Loading settings from file:/C:/Cassandra/conf/hscassandra9170.yaml
INFO [main] 2011-05-26 18:25:36,344 DatabaseDescriptor.java DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2011-05-26 18:25:36,532 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2746
INFO [main] 2011-05-26 18:25:36,577 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2729
INFO [main] 2011-05-26 18:25:36,590 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2745
INFO [main] 2011-05-26 18:25:36,599 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-2167
INFO [main] 2011-05-26 18:25:36,600 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-2131
INFO [main] 2011-05-26 18:25:36,602 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-1041
INFO [main] 2011-05-26 18:25:36,603 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-1695
ERROR [main] 2011-05-26 18:25:36,634 AbstractCassandraDaemon.java Fatal exception during initialization
org.apache.cassandra.config.ConfigurationException: Found system table files, but they couldn't be loaded. Did you change the partitioner?
    at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:236)
    at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:127)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)

On 5/26/2011 6:04 PM, Jonathan Ellis wrote:

Sounds like a legitimate bug, although looking through the code I'm not sure what would cause a tight retry loop on migration announce/rectify. Can you create a ticket at https://issues.apache.org/jira/browse/CASSANDRA ? As a workaround, I would try manually copying the Migrations and Schema sstable files from the system keyspace of the live node, then restarting the recovering one.

On Thu, May 26, 2011 at 9:27 AM, Flavio Baronti <f.baro...@list-group.com> wrote:

> I can't seem to recover a failed node on a database where I made many updates to the schema. I have a small cluster with 2 nodes, around 1000 CFs (I know it's a lot, but it can't be changed right now), and ReplicationFactor=2. I shut down a node and cleaned its data entirely, then tried to bring it back up. The node starts fetching schema updates from the live node, but the operation fails halfway with an OOME. After some investigation, what I found is that:
>
> - I have a lot of schema updates (there are 2067 rows in the system.Schema CF).
> - The live node loads migrations 1-1000 and sends them to the recovering node (Migration.getLocalMigrations()).
> - Soon afterwards, the live node checks the schema version on the recovering node and finds it has moved by a little - say it has applied the first 3 migrations. It then loads migrations 3-1003 and sends them to the node.
> - This process is repeated very quickly (it sends migrations 6-1006, 9-1009, etc.).
>
> Analyzing the memory dump and the logs, it looks like each of these 1000-migration blocks is composed into a single message and sent to the OutboundTcpConnection queue. However, since the schema is big, the messages occupy a lot of space and are built faster than the connection can send them. Therefore, they accumulate in OutboundTcpConnection.queue until memory is completely filled.
>
> Any suggestions? Can I change something to make this work, apart from reducing the number of CFs?
>
> Flavio