Re: 1.2.2 as primary storage?

2013-02-25 Thread Chris Dean
Michael Kjellman  writes:
> How big will each mutation be roughly? 1MB, 5MB, 16MB?

On the small end.  Say 1MB.

Cheers,
Chris Dean


1.2.2 as primary storage?

2013-02-25 Thread Chris Dean
I've been away from Cassandra for a while and wondered what the
consensus is on using 1.2.2 as a primary data store?  

Our app has a typical OLTP workload but we have high availability
requirements.  The data set is just under 1TB and I don't see us growing
to more than a small Cassandra cluster.

I have run 0.7.0 on a 3 node cluster in production and that was fine,
but it was a different sort of application.

Thanks!

(FWIW, our second choice would be to run PG and shard at the app level.)

Cheers,
Chris Dean


Re: What happens if there is a collision?

2010-10-25 Thread Chris Dean
Peter Schuller  writes:
>> The timestamp is an ever increasing clock so I wouldn't expect two api
>> calls from the same machine in the same thread to have the same
>> timestamp.  It is perfectly allowed behavior for the read value to not
>> agree with the write value.
>
> In the *particular* case of a single instantiation of a client I would
> tend to expect it to actually guarantee strictly increasing time just
> as a matter of thread-local consistency so that a single flow of
> control can assume that writes will happen in the order in which they
> are executed. (Is this actually the case for current high-level
> clients?)
>
> But of course, there is no such guarantee in the distributed sense
> either way.

The point is in reply to this message:

Jérôme Verstrynge  writes:
> You are making my point (lol). No matter what an application writes,
> it should re-read its own writes for determinism for a given timestamp
> when other application instances are writing in the same 'table'.

There is no such situation in Cassandra.  An application may read things
differently than it writes.  You may not hold the timestamp constant and
use that as a sort of locking mechanism.  The timestamp is an
ever-increasing clock.
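
As a rough illustration of the "strictly increasing within one client"
idea Peter describes, here is a minimal Python sketch.  The class name
MonotonicTimestamps and its clock parameter are made up for this example
and are not part of any Cassandra client; it only shows one way a single
process could hand out ever-increasing microsecond timestamps.  It gives
no guarantee across machines.

```python
import threading
import time


class MonotonicTimestamps:
    """Hypothetical helper: strictly increasing microsecond timestamps
    within a single process."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so the demo can freeze time
        self._last = 0
        self._lock = threading.Lock()

    def next(self):
        now = int(self._clock() * 1_000_000)
        with self._lock:
            # If the wall clock stalls or steps backwards, move one
            # microsecond past the last value handed out.
            self._last = max(now, self._last + 1)
            return self._last


# Demo with a frozen clock: three calls still yield three distinct values.
ts = MonotonicTimestamps(clock=lambda: 1287964800.0)
stamps = [ts.next() for _ in range(3)]
```

Even with the clock frozen, each call returns a value one microsecond
larger than the previous one, so writes from one flow of control keep
their order.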

Cheers,
Chris Dean


Re: What happens if there is a collision?

2010-10-21 Thread Chris Dean
Jérôme Verstrynge  writes:
> You are making my point (lol). No matter what an application writes,
> it should re-read its own writes for determinism for a given timestamp
> when other application instances are writing in the same 'table'.

The timestamp is an ever increasing clock so I wouldn't expect two api
calls from the same machine in the same thread to have the same
timestamp.  It is perfectly allowed behavior for the read value to not
agree with the write value.
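
To make that concrete, here is a small Python sketch of timestamp-based
"highest timestamp wins" resolution using a plain in-memory dict.  The
write and read functions are hypothetical stand-ins, not a client API,
and the tie-break on equal timestamps (compare the values) is an
assumption made only so the sketch is deterministic.

```python
# In-memory stand-in for one column: key -> (timestamp, value).
store = {}


def write(key, timestamp, value):
    """Keep the candidate only if it beats what is already stored.

    Tuples compare by timestamp first; on equal timestamps the value
    comparison decides (an assumption made for this sketch).
    """
    candidate = (timestamp, value)
    current = store.get(key)
    if current is None or candidate > current:
        store[key] = candidate


def read(key):
    """Return the currently winning value for the key."""
    return store[key][1]


write("row", 100, "from-A")      # application A writes
write("row", 101, "from-B")      # application B writes slightly later
seen_by_a = read("row")          # A re-reads and sees B's value

write("row", 99, "late-from-A")  # an older timestamp loses
after_late_write = read("row")
```

Application A wrote "from-A" but reads back "from-B", and its later
write with an older timestamp is ignored.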

Cheers,
Chris Dean


Re: ec2 tests

2010-06-18 Thread Chris Dean
> @Chris, Did you get any bench you could share with us?

We're still working on it.  It's a lower priority task so it will take a
while to finish.  So far we've run on all the AWS data centers in the US
and used several different setups.  We also did a test on Rackspace with
one setup and some whitebox servers we had in the office.  (The whitebox
servers are still running I believe.)

I don't have the numbers here, but the fastest by far are the
non-virtualized whitebox servers.  No real surprise.  Rackspace was
faster than AWS US-West; US-West was faster than US-East.

We always use 3 Cassandra servers and one or two machines to run
stress.py.  I don't think we're seeing the 7500 writes/sec so maybe our
config is wrong.  You'll have to be patient until my colleague writes
this all up.

Cheers,
Chris Dean


Re: ec2 tests

2010-05-28 Thread Chris Dean
Mark Greene  writes:
> If you give us an objective of the test that will help. Trying to get max
> write throughput? Read throughput? Weak consistency?

I would like reading to be as fast as I can get.  My real-world problem
is write heavy, but the latency requirements are minimal on that side.
If there are any particular config settings that would help with the slow
ec2 IO that would be great to know.

Cheers,
Chris Dean


ec2 tests

2010-05-27 Thread Chris Dean
I'm interested in performing some simple performance tests on EC2.  I
was thinking of using py_stress and Cassandra deployed on 3 servers with
one separate machine to run py_stress.

Are there any particular configuration settings I should use?  I was
planning on changing the JVM heap size to reflect the Large Instances
we're using.

Thanks!

Cheers,
Chris Dean


Getting all the keys from a ColumnFamily ?

2010-05-04 Thread Chris Dean
I have a ColumnFamily with a small number of keys, but each key has a
large number of columns.

What's the best way to get just the keys back?  I don't want to load all
the columns if I don't have to.  There also aren't necessarily any column
names in common between the different rows.

Cheers,
Chris Dean


Using get_range_slices

2010-04-20 Thread Chris Dean
I'd like to use get_range_slices to pull all the keys from a small CF
with 10,000 keys.  I'd also like to get them in chunks of 100 at a time.
Is there a way to do that?

I thought I could set start_token and end_token in KeyRange, but I can't
figure out what the initial start_token should be.
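
For reference, one common paging pattern uses keys rather than tokens:
start from the empty key, then pass the last key of each chunk as the
start of the next request and drop the duplicate first row.  The Python
sketch below only simulates that against an in-memory sorted list;
fake_get_range_slices is a hypothetical stand-in, not the Thrift call,
and a real cluster may return rows in partitioner order rather than
sorted key order.

```python
import bisect


def fake_get_range_slices(sorted_keys, start_key, count):
    """Hypothetical stand-in: up to `count` keys >= start_key (inclusive)."""
    i = bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[i:i + count]


def iter_key_pages(sorted_keys, page_size=100):
    """Yield the keys in chunks of `page_size`."""
    start = ""       # the empty key sorts before every real key
    first = True
    while True:
        # The start key is inclusive, so later requests ask for one extra
        # row and discard the key already seen on the previous page.
        want = page_size if first else page_size + 1
        batch = fake_get_range_slices(sorted_keys, start, want)
        if not first:
            batch = batch[1:]
        if not batch:
            return
        yield batch
        start = batch[-1]
        first = False


keys = ["key%05d" % i for i in range(10000)]
pages = list(iter_key_pages(keys, page_size=100))
```

With 10,000 keys and a page size of 100 this yields 100 chunks, and no
key is repeated or skipped.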

Cheers,
Chris Dean


Re: get_range_slices in hector

2010-04-19 Thread Chris Dean
Ok, thanks.

Cheers,
Chris Dean

Nathan McCall  writes:
> Not yet. If you wanted to provide a patch that would be much
> appreciated. A fork and pull request would be best logistically, but
> whatever works.
>
> -Nate
>
> On Mon, Apr 19, 2010 at 5:10 PM, Chris Dean  wrote:
>> Is there a version of hector that has an interface to get_range_slices ?
>> or should I provide a patch?
>>
>> Cheers,
>> Chris Dean
>>


get_range_slices in hector

2010-04-19 Thread Chris Dean
Is there a version of hector that has an interface to get_range_slices ?
or should I provide a patch?

Cheers,
Chris Dean