Re: 1.2.2 as primary storage?
Michael Kjellman writes:
> How big will each mutation be roughly? 1MB, 5MB, 16MB?

On the small end. Say 1MB.

Cheers,
Chris Dean
1.2.2 as primary storage?
I've been away from Cassandra for a while and wondered what the
consensus is on using 1.2.2 as a primary data store? Our app has a
typical OLTP workload, but we have high availability requirements. The
data set is just under 1TB and I don't see us growing to more than a
small Cassandra cluster. I have run 0.7.0 on a 3 node cluster in
production and that was fine, but it was a different sort of
application. Thanks!

(FWIW, our second choice would be to run PG and shard at the app level.)

Cheers,
Chris Dean
Re: What happens if there is a collision?
Peter Schuller writes:
>> The timestamp is an ever increasing clock so I wouldn't expect two API
>> calls from the same machine in the same thread to have the same
>> timestamp. It is perfectly allowed behavior for the read value to not
>> agree with the write value.
>
> In the *particular* case of a single instantiation of a client I would
> tend to expect it to actually guarantee strictly increasing time just
> as a matter of thread-local consistency so that a single flow of
> control can assume that writes will happen in the order in which they
> are executed. (Is this actually the case for current high-level
> clients?)
>
> But of course, there is no such guarantee in the distributed sense
> either way.

The point is in reply to this message:

Jérôme Verstrynge writes:
> You are making my point (lol). No matter what an application writes,
> it should re-read its own writes for determinism for a given timestamp
> when other application instances are writing in the same 'table'.

There is no such situation in Cassandra. An application may read things
differently than it writes. You may not hold the timestamp constant and
use that as a sort of locking mechanism. The timestamp is an ever
increasing clock.

Cheers,
Chris Dean
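To make the "read may not agree with write" point concrete, the conflict
resolution Cassandra applies can be sketched as last-write-wins on the
client-supplied timestamp. This is a minimal illustration, not
Cassandra's actual code; the `Column` tuple and `resolve` helper are
hypothetical, and the value-comparison tie-break shown is the reason a
writer reusing a fixed timestamp may read back someone else's value.

```python
from collections import namedtuple

# Hypothetical stand-in for a Cassandra column: name, value, client timestamp.
Column = namedtuple("Column", ["name", "value", "timestamp"])

def resolve(a, b):
    """Last-write-wins: the column with the higher timestamp survives.

    On a timestamp tie the comparison falls back to the values, so the
    outcome is deterministic across replicas, but it is not necessarily
    the value this particular writer wrote.
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value >= b.value else b

# Two writers used the same timestamp for the same column:
mine = Column("state", b"open", 1000)
theirs = Column("state", b"shut", 1000)
winner = resolve(mine, theirs)  # my own write loses the tie-break
```

Holding the timestamp constant therefore buys no locking: every replica
converges on the same winner, independent of who wrote last in wall-clock
terms.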
Re: What happens if there is a collision?
Jérôme Verstrynge writes:
> You are making my point (lol). No matter what an application writes,
> it should re-read its own writes for determinism for a given timestamp
> when other application instances are writing in the same 'table'.

The timestamp is an ever increasing clock, so I wouldn't expect two API
calls from the same machine in the same thread to have the same
timestamp. It is perfectly allowed behavior for the read value to not
agree with the write value.

Cheers,
Chris Dean
Re: ec2 tests
> @Chris, Did you get any bench you could share with us?

We're still working on it. It's a lower priority task, so it will take a
while to finish.

So far we've run in all the AWS data centers in the US and used several
different setups. We also did a test on Rackspace with one setup, and on
some whitebox servers we had in the office. (The whitebox servers are
still running, I believe.)

I don't have the numbers here, but the fastest by far is the
non-virtualized whitebox servers. No real surprise. Rackspace was faster
than AWS US-West, and US-West was faster than US-East. We always use 3
Cassandra servers and one or two machines to run stress.py.

I don't think we're seeing the 7500 writes/sec, so maybe our config is
wrong. You'll have to be patient until my colleague writes this all up.

Cheers,
Chris Dean
Re: ec2 tests
Mark Greene writes:
> If you give us an objective of the test that will help. Trying to get max
> write throughput? Read throughput? Weak consistency?

I would like reads to be as fast as I can get them. My real-world
problem is write heavy, but the latency requirements are minimal on that
side. If there are any particular config settings that would help with
the slow EC2 IO, that would be great to know.

Cheers,
Chris Dean
ec2 tests
I'm interested in performing some simple performance tests on EC2. I was
thinking of using py_stress and Cassandra deployed on 3 servers, with
one separate machine to run py_stress.

Are there any particular configuration settings I should use? I was
planning on changing the JVM heap size to reflect the Large Instances
we're using. Thanks!

Cheers,
Chris Dean
Getting all the keys from a ColumnFamily ?
I have a ColumnFamily with a small number of keys, but each key has a
large number of columns. What's the best way to get just the keys back?
I don't want to load all the columns if I don't have to. There also
aren't necessarily any column names in common between the different
rows.

Cheers,
Chris Dean
Using get_range_slices
I'd like to use get_range_slices to pull all the keys from a small CF
with 10,000 keys. I'd also like to get them in chunks of 100 at a time.
Is there a way to do that? I thought I could set start_token and
end_token in KeyRange, but I can't figure out what the initial
start_token should be.

Cheers,
Chris Dean
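The paging pattern usually suggested for this is to page by key rather
than token: request page_size + 1 keys, then use the last key returned
as the start of the next request and drop it there, since the range
bounds are inclusive. The sketch below simulates that loop in plain
Python; `fetch_page` and `ALL_KEYS` are hypothetical stand-ins for a
get_range_slices call with a KeyRange(start_key=..., count=...), and it
assumes key order is the partitioner order.

```python
# Hypothetical stand-in for the cluster's data: 10,000 keys in
# partitioner order (as with an order-preserving partitioner).
ALL_KEYS = sorted("key%05d" % i for i in range(10000))

def fetch_page(start_key, count):
    """Simulate get_range_slices: up to `count` keys >= start_key."""
    matching = [k for k in ALL_KEYS if k >= start_key]
    return matching[:count]

def iter_all_keys(page_size=100):
    """Yield every key, paging page_size keys at a time.

    After the first page, ask for one extra key: the last key of the
    previous page is re-sent as start_key (the bound is inclusive) and
    dropped from the new page so nothing is yielded twice.
    """
    start = ""           # empty start_key means "from the beginning"
    drop_first = False
    while True:
        page = fetch_page(start, page_size + (1 if drop_first else 0))
        if drop_first:
            page = page[1:]   # already yielded on the previous page
        if not page:
            return
        for key in page:
            yield key
        if len(page) < page_size:
            return            # short page: we've reached the end
        start = page[-1]
        drop_first = True

keys = list(iter_all_keys())
```

With the random partitioner the same loop applies but the pages come
back in token order rather than key order; either way the empty
start_key avoids having to guess an initial start_token.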
Re: get_range_slices in hector
Ok, thanks.

Cheers,
Chris Dean

Nathan McCall writes:
> Not yet. If you wanted to provide a patch that would be much
> appreciated. A fork and pull request would be best logistically, but
> whatever works.
>
> -Nate
>
> On Mon, Apr 19, 2010 at 5:10 PM, Chris Dean wrote:
>> Is there a version of hector that has an interface to get_range_slices,
>> or should I provide a patch?
>>
>> Cheers,
>> Chris Dean
get_range_slices in hector
Is there a version of hector that has an interface to get_range_slices,
or should I provide a patch?

Cheers,
Chris Dean