Re: Read Latency

2010-10-19 Thread Aaron Morton
Not sure how pycassa does it, but it is a simple case of:
- get_slice with start="", finish="" and count = 100,001
- pop the last column and store its name
- get_slice with start as the last column name, finish="" and count = 100,001
- repeat.

A

On 20 Oct, 2010, at 03:08 PM, Wayne wrote:

Thanks for all of the feedback. I may not very well be doing a deep copy, so my numbers might not be accurate. I will test with writing to/from the disk to verify how long native python takes. I will also check how large the data coming from cassandra is, for comparison.

Our high expectations are based on actual MySQL time, which is in the range of 3-4 seconds for the exact same data. I will also try to work with getting the data in batches. Not as easy of course in Cassandra, which is probably why we have not tried that yet.

Thanks for all of the feedback!

On Tue, Oct 19, 2010 at 8:51 PM, Aaron Morton wrote:

Hard to say why your code performs that way; it may not be creating as many objects, for example strings may not be re-created, just referenced. Are you creating new objects for every column returned?

Bringing 600,000 to 10M columns back at once is always going to take time. I think any python database client would take a while to create objects for 600,000 rows. Do you have an example of pulling 600,000 rows through MySQL into python to compare against?

Is it possible to break up the get_slice into chunks of 10,000 or 100,000? IMHO you will get more consistent performance if you bound the requests, so you have an idea of the upper level of latency for each request and create a more consistent memory footprint.

For example in the rough test below, 100,000 objects takes 0.75 secs but 600,000 takes 13. As an example of reprocessing the results, I called go2 with the output of go below.

def go2(buffer):
    start = time.time()
    buffer2 = [
        {"name": csc.column.name, "value": csc.column.value}
        for csc in buffer
    ]
    print "Done2 in %s" % (time.time() - start)

{977} > python decode_test.py 10
Done in 0.75460100174
Done2 in 0.314303874969

{978} > python decode_test.py 60
Done in 13.2945489883
Done2 in 7.32861185074

My general advice is to pull back less data in a single request.

Aaron

On 20 Oct, 2010, at 11:30 AM, Wayne wrote:

I am not sure how many bytes, but we do convert the cassandra object that is returned in 3s into a dictionary in ~1s and then again into a custom python object in about ~1.5s. Expectations are based on this timing. If we can convert what thrift returns into a completely new python object in 1s, why does thrift need 3s to give it to us?

To us it is like the MySQL client we use in python. It is really C wrapped in python and adds almost zero overhead to the time it takes mysql to return the data. That is the expectation we have and the performance we are looking to get to. Disk I/O + 20%.

We are returning one big row and this is not our normal use case but a requirement for us to use Cassandra. We need to get all data for a specific value, as this is a secondary index. It is like getting all users in the state of CA. CA is the key and there is a column for every user id. We are testing with 600,000 but this will grow to 10+ million in the future.

We can not test .7 as we are only using .6.6. We are trying to evaluate Cassandra and stability is one concern, so .7 is definitely not for us at this point.

Thanks.

On Tue, Oct 19, 2010 at 4:27 PM, Aaron Morton wrote:

Just wondering how many bytes you are returning to the client to get an idea of how slow it is.

The call to fastbinary is decoding the wireformat and creating the Python objects. When you ask for 600,000 columns you are creating a lot of python objects. Each column will be a ColumnOrSuperColumn, wrapping a Column, which has probably 2 Strings. So 2.4 million python objects.

Here's my rough test script.

def go(count):
    start = time.time()
    buffer = [
        ttypes.ColumnOrSuperColumn(column=ttypes.Column(
            "column_name_%s" % i, "row_size of something something", 0, 0))
        for i in range(count)
    ]
    print "Done in %s" % (time.time() - start)

On my machine that takes 13 seconds for 600,000 and 0.04 for 10,000. The fastbinary module is running a lot faster because it's all in C. It's not a great test but I think it gives an idea of what you are asking for.

I think there is an element of python being slower than other languages. But IMHO you are asking for a lot of data. Can you ask for less data? Out of interest are you able to try the avro client? It's still experimental (0.7 only) but may give you something to compare it against.

Aaron

On 20 Oct, 2010, at 07:23 AM, Wayne wrote:

It is an entire row which is 600,000 cols. We pass a limit of 10 million to make sure we get it all. Our issue is that it seems Thrift itself adds more overhead/latency to a read than Cassandra itself takes to do the read. If cfstats fo

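For anyone following the pagination advice above, a minimal sketch of the loop in Python 2 is below. The fetch_slice(start, count) helper is hypothetical -- it stands in for whatever get_slice wrapper is in use (pycassa or the raw Thrift client) and is assumed to return columns in comparator order, starting at `start` inclusive, each with a .name attribute.

# Sketch of the paging loop described above (Python 2). fetch_slice(start, count)
# is a hypothetical wrapper around get_slice, assumed to return a list of column
# objects in comparator order, beginning at `start` inclusive.
def iter_columns(fetch_slice, page_size=100000):
    start = ""                                    # "" = begin at the first column
    while True:
        batch = fetch_slice(start, page_size + 1) # ask for one extra column
        if len(batch) <= page_size:
            for col in batch:                     # short batch: this is the last page
                yield col
            return
        last = batch.pop()                        # the extra column marks where to resume
        for col in batch:
            yield col
        start = last.name                         # inclusive start of the next slice

Because the start column is inclusive, the popped column is fetched again as the first element of the next batch, so nothing is skipped or yielded twice.
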
Re: Read Latency

2010-10-19 Thread Wayne
Thanks for all of the feedback. I may not very well be doing a deep copy, so
my numbers might not be accurate. I will test with writing to/from the disk
to verify how long native python takes. I will also check how large the data
coming from cassandra is, for comparison.

Our high expectations are based on actual MySQL time which is in the range
of 3-4 seconds for the exact same data.

I will also try to work with getting the data in batches. Not as easy of
course in Cassandra, which is probably why we have not tried that yet.

Thanks for all of the feedback!


On Tue, Oct 19, 2010 at 8:51 PM, Aaron Morton wrote:

> Hard to say why your code performs that way; it may not be creating as many
> objects, for example strings may not be re-created, just referenced. Are you
> creating new objects for every column returned?
>
> Bringing 600,000 to 10M columns back at once is always going to take time. I
> think any python database client would take a while to create objects for
> 600,000 rows. Do you have an example of pulling 600,000 rows through MySQL
> into python to compare against?
>
> Is it possible to break up the get_slice into chunks of 10,000 or 100,000?
> IMHO you will get more consistent performance if you bound the requests, so
> you have an idea of the upper level of latency for each request and create a
> more consistent memory footprint.
>
> For example in the rough test below, 100,000 objects takes 0.75 secs but
> 600,000 takes 13.
>
> As an example of reprocessing the results, I called go2 with the output of
> go below.
>
> def go2(buffer):
> start = time.time()
> buffer2 = [
> {"name" : csc.column.name, "value" : csc.column.value}
> for csc in buffer
> ]
> print "Done2 in %s" % (time.time() -start)
>
> {977} > python decode_test.py 10
> Done in 0.75460100174
> Done2 in 0.314303874969
>
>  {978} > python decode_test.py 60
> Done in 13.2945489883
> Done2 in 7.32861185074
>
> My general advice is to pull back less data in a single request.
>
> Aaron
>
> On 20 Oct, 2010,at 11:30 AM, Wayne  wrote:
>
> I am not sure how many bytes, but we do convert the cassandra object that
> is returned in 3s into a dictionary in ~1s and then again into a custom
> python object in about ~1.5s. Expectations are based on this timing. If we
> can convert what thrift returns into a completely new python object in 1s
> why does thrift need 3s to give it to us?
>
> To us it is like the MySQL client we use in python. It is really C wrapped
> in python and adds almost zero overhead to the time it takes mysql to return
> the data. That is the expectation we have and the performance we are looking
> to get to. Disk I/O + 20%.
>
> We are returning one big row and this is not our normal use case but a
> requirement for us to use Cassandra. We need to get all data for a specific
> value, as this is a secondary index. It is like getting all users in the
> state of CA. CA is the key and there is a column for every user id. We are
> testing with 600,000 but this will grow to 10+ million in the future.
>
> We can not test .7 as we are only using .6.6. We are trying to evaluate
> Cassandra and stability is one concern so .7 is definitely not for us at
> this point.
>
> Thanks.
>
>
> On Tue, Oct 19, 2010 at 4:27 PM, Aaron Morton wrote:
>
>>
>>  Just wondering how many bytes you are returning to the client to get an
>> idea of how slow it is.
>>
>> The call to fastbinary is decoding the wireformat and creating the Python
>> objects. When you ask for 600,000 columns you are creating a lot of python
>> objects. Each column will be a ColumnOrSuperColumn, wrapping a Column, which
>> has probably 2 Strings. So 2.4 million python objects.
>>
>> Here's  my rough test script.
>>
>> def go(count):
>> start = time.time()
>> buffer = [
>> ttypes.ColumnOrSuperColumn(column=ttypes.Column(
>> "column_name_%s" % i, "row_size of something something", 0,
>> 0))
>> for i in range(count)
>> ]
>> print "Done in %s" % (time.time() - start)
>>
>> On my machine that takes 13 seconds for 600,000 and 0.04 for 10,000. The
>> fastbinary module is running a lot faster because it's all in C.  It's not a
>> great test but I think it gives an idea of what you are asking for.
>>
>> I think there is an element of python being slower than other languages.
>> But IMHO you are asking for a lot of data. Can you ask for less data?
>>
>> Out of interest are you able to try the avro client? It's still
>> experimental (0.7 only) but may give you something to compare it against.
>>
>> Aaron
>>
>> On 20 Oct, 2010,at 07:23 AM, Wayne  wrote:
>>
>>
>> It is an entire row which is 600,000 cols. We pass a limit of 10million to
>> make sure we get it all. Our issue is that it seems Thrift itself adds more
>> overhead/latency to a read than Cassandra itself takes to do the read.
>> If cfstats for the slowest node reports 2.25s, to us it is not acceptable
>> that the data comes back to the cli
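
For reference, a self-contained version of the rough decode_test.py sketch quoted above is below; the Column and ColumnOrSuperColumn classes are simple stand-ins for the Thrift-generated ttypes (not the real module), used only so the script runs without a Cassandra checkout.

# Stand-alone rework of the decode_test.py sketch quoted above (Python 2).
# Column and ColumnOrSuperColumn are stand-ins for the Thrift-generated ttypes.
import sys
import time

class Column(object):
    def __init__(self, name, value, timestamp, ttl=0):
        self.name = name
        self.value = value
        self.timestamp = timestamp
        self.ttl = ttl

class ColumnOrSuperColumn(object):
    def __init__(self, column=None, super_column=None):
        self.column = column
        self.super_column = super_column

def go(count):
    # build `count` wrapped column objects and report the elapsed time
    start = time.time()
    buf = [ColumnOrSuperColumn(column=Column(
               "column_name_%s" % i, "row_size of something something", 0, 0))
           for i in range(count)]
    print "Done in %s" % (time.time() - start)
    return buf

def go2(buf):
    # re-process the objects into plain dicts, as in the quoted go2()
    start = time.time()
    buf2 = [{"name": csc.column.name, "value": csc.column.value} for csc in buf]
    print "Done2 in %s" % (time.time() - start)
    return buf2

if __name__ == "__main__":
    # e.g. `python decode_test.py 60` builds 60 * 10,000 = 600,000 objects
    go2(go(int(sys.argv[1]) * 10000))

The absolute numbers will differ from machine to machine, but the point of the test stands: simply constructing hundreds of thousands of small Python objects already costs seconds, before any wire decoding happens.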

Re: bootstrap question

2010-10-19 Thread Jonathan Ellis
I think this code has had some changes since beta2.  Here is what it
looks like in trunk:

if (DatabaseDescriptor.getNonSystemTables().size() > 0)
{
bootstrap(token);
assert !isBootstrapMode; // bootstrap will block until finished
}
else
{
isBootstrapMode = false;
SystemTable.setBootstrapped(true);
tokenMetadata_.updateNormalToken(token,
FBUtilities.getLocalAddress());

Gossiper.instance.addLocalApplicationState(ApplicationState.STATUS,
valueFactory.normal(token));
setMode("Normal", false);
}

bootstrap means "stream over data from the other nodes that is
scheduled to become my responsibility."  so what the code you're
referring to is saying is, "if there are no keyspaces defined [in the
cluster we're joining, not yaml], just join the ring immediately."

On Tue, Oct 19, 2010 at 7:47 PM, Yang  wrote:
> from line 396 of StorageService.java from the 0.7.0-beta2 source, it
> looks that when I boot up a completely new node,
> if there is not any keyspace defined in its storage.yaml,  it would
> not even participate in the ring?
>
> in other words, let's say the cassandra instance currently has 10
> nodes, and hosts data for 4 keyspaces,  now the 11th node is added to
> the system,
> in order to shed load from the existing 10 nodes, I have to define a
> dummy keyspace for the new node to trigger the startBootstrap() code
> below?
>
>
> sorry for the newbie question, just starting out to dig into the code 
>
> Thanks
> Yang
>
>
>            // don't bootstrap if there are no tables defined.
>            if (DatabaseDescriptor.getNonSystemTables().size() > 0)
>                startBootstrap(token);
>            else
>            {
>                isBootstrapMode = false;
>                SystemTable.setBootstrapped(true);
>                tokenMetadata_.updateNormalToken(token,
> FBUtilities.getLocalAddress());
>
> Gossiper.instance.addLocalApplicationState(ApplicationState.STATUS,
> valueFactory.normal(token));
>                setMode("Normal", false);
>            }
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Read Latency

2010-10-19 Thread Aaron Morton
Hard to say why your code performs that way; it may not be creating as many objects, for example strings may not be re-created, just referenced. Are you creating new objects for every column returned?

Bringing 600,000 to 10M columns back at once is always going to take time. I think any python database client would take a while to create objects for 600,000 rows. Do you have an example of pulling 600,000 rows through MySQL into python to compare against?

Is it possible to break up the get_slice into chunks of 10,000 or 100,000? IMHO you will get more consistent performance if you bound the requests, so you have an idea of the upper level of latency for each request and create a more consistent memory footprint.

For example in the rough test below, 100,000 objects takes 0.75 secs but 600,000 takes 13. As an example of reprocessing the results, I called go2 with the output of go below.

def go2(buffer):
    start = time.time()
    buffer2 = [
        {"name": csc.column.name, "value": csc.column.value}
        for csc in buffer
    ]
    print "Done2 in %s" % (time.time() - start)

{977} > python decode_test.py 10
Done in 0.75460100174
Done2 in 0.314303874969

{978} > python decode_test.py 60
Done in 13.2945489883
Done2 in 7.32861185074

My general advice is to pull back less data in a single request.

Aaron

On 20 Oct, 2010, at 11:30 AM, Wayne wrote:

I am not sure how many bytes, but we do convert the cassandra object that is returned in 3s into a dictionary in ~1s and then again into a custom python object in about ~1.5s. Expectations are based on this timing. If we can convert what thrift returns into a completely new python object in 1s, why does thrift need 3s to give it to us?

To us it is like the MySQL client we use in python. It is really C wrapped in python and adds almost zero overhead to the time it takes mysql to return the data. That is the expectation we have and the performance we are looking to get to. Disk I/O + 20%.

We are returning one big row and this is not our normal use case but a requirement for us to use Cassandra. We need to get all data for a specific value, as this is a secondary index. It is like getting all users in the state of CA. CA is the key and there is a column for every user id. We are testing with 600,000 but this will grow to 10+ million in the future.

We can not test .7 as we are only using .6.6. We are trying to evaluate Cassandra and stability is one concern, so .7 is definitely not for us at this point.

Thanks.

On Tue, Oct 19, 2010 at 4:27 PM, Aaron Morton wrote:

Just wondering how many bytes you are returning to the client to get an idea of how slow it is.

The call to fastbinary is decoding the wireformat and creating the Python objects. When you ask for 600,000 columns you are creating a lot of python objects. Each column will be a ColumnOrSuperColumn, wrapping a Column, which has probably 2 Strings. So 2.4 million python objects.

Here's my rough test script.

def go(count):
    start = time.time()
    buffer = [
        ttypes.ColumnOrSuperColumn(column=ttypes.Column(
            "column_name_%s" % i, "row_size of something something", 0, 0))
        for i in range(count)
    ]
    print "Done in %s" % (time.time() - start)

On my machine that takes 13 seconds for 600,000 and 0.04 for 10,000. The fastbinary module is running a lot faster because it's all in C. It's not a great test but I think it gives an idea of what you are asking for.

I think there is an element of python being slower than other languages. But IMHO you are asking for a lot of data. Can you ask for less data? Out of interest are you able to try the avro client? It's still experimental (0.7 only) but may give you something to compare it against.

Aaron

On 20 Oct, 2010, at 07:23 AM, Wayne wrote:

It is an entire row which is 600,000 cols. We pass a limit of 10 million to make sure we get it all. Our issue is that it seems Thrift itself adds more overhead/latency to a read than Cassandra itself takes to do the read. If cfstats for the slowest node reports 2.25s, it is not acceptable to us that the data comes back to the client in 5.5s. After working with Jonathon we have optimized Cassandra itself to return the quorum read in 2.7s but we still have 3s getting lost in the thrift call (fastbinary.decode_binary).

We have seen this pattern totally hold for ms reads as well for a few cols, but it is easier to look at things in seconds. If Cassandra can get the data off of the disks in 2.25s we expect to have the data in a Python object in under 3s. That is a totally realistic expectation from our experience. All latency needs to be pushed down to disk random read latency as that should always be what takes the longest. Everything else is passing through memory.

On Tue, Oct 19, 2010 at 2:06 PM, aaron morton  wrote:

Wayne,
I'm calling cassandra from Python and have not seen too many 3 second reads.

Your last email with log messages in it looks like you are

bootstrap question

2010-10-19 Thread Yang
from line 396 of StorageService.java from the 0.7.0-beta2 source, it
looks like when I boot up a completely new node,
if there is not any keyspace defined in its storage.yaml,  it would
not even participate in the ring?

in other words, let's say the cassandra instance currently has 10
nodes, and hosts data for 4 keyspaces,  now the 11th node is added to
the system,
in order to shed load from the existing 10 nodes, I have to define a
dummy keyspace for the new node to trigger the startBootstrap() code
below?


sorry for the newbie question, just starting out to dig into the code 

Thanks
Yang


// don't bootstrap if there are no tables defined.
if (DatabaseDescriptor.getNonSystemTables().size() > 0)
startBootstrap(token);
else
{
isBootstrapMode = false;
SystemTable.setBootstrapped(true);
tokenMetadata_.updateNormalToken(token,
FBUtilities.getLocalAddress());

Gossiper.instance.addLocalApplicationState(ApplicationState.STATUS,
valueFactory.normal(token));
setMode("Normal", false);
}


Re: Read Latency

2010-10-19 Thread Nicholas Knight
On Oct 20, 2010, at 6:30 AM, Wayne wrote:

> I am not sure how many bytes,

Then I don't think your performance numbers really mean anything substantial. 
Deserialization time is inevitably going to go up with the amount of data 
present, so unless you know how much data you actually have, there's no way to 
know if the time being taken is reasonable or not.

> but we do convert the cassandra object that is returned in 3s into a 
> dictionary in ~1s and then again into a custom python object in about ~1.5s. 
> Expectations are based on this timing. If we can convert what thrift returns 
> into a completely new python object in 1s why does thrift need 3s to give it 
> to us?


(de)serialization, even of binary protocols, is a good deal more complex than 
simply copying around data you already have on the heap (or maybe even in CPU 
cache, if it's small enough) in native format.

Worse, unless you're explicitly doing deep copies (which is usually the wrong 
thing to do), you're probably _not_ constructing a "completely new python 
object". You're constructing an object that has references to bits of data 
already deserialized onto the heap by Thrift. This makes the comparison even 
less meaningful.

-NK
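
To make the distinction concrete, here is a small, made-up timing sketch (not from the thread): building objects that only hold references to strings already on the heap is far cheaper than deep-copying them, and neither is comparable to decoding those objects from the Thrift wire format in the first place.

# Illustrative only: reference-based construction vs. an explicit deep copy (Python 2).
import copy
import time

rows = [{"name": "column_name_%s" % i, "value": "x" * 32} for i in range(100000)]

start = time.time()
by_reference = [{"name": r["name"], "value": r["value"]} for r in rows]
print "reference copy: %.3fs" % (time.time() - start)

start = time.time()
deep = copy.deepcopy(rows)
print "deep copy:      %.3fs" % (time.time() - start)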

Re: How to get all rows inserted

2010-10-19 Thread Aaron Morton
The general pattern is to use get_range_slices as you describe, also see http://wiki.apache.org/cassandra/FAQ#iter_world

Note you should be using the key fields on the KeyRange, not the tokens. There have been a few issues around with using the RandomPartitioner, so it may be best to get on 0.6.6 if you can. The data will be out of order, but you should still be able to iterate it. Please send examples if it's not working for you.

hope that helps
Aaron

On 20 Oct, 2010, at 12:36 PM, Robert wrote:

For this case, the order doesn't matter, I just need to page over all of the data X rows at a time. When I use column_family.get_range from pycassa and pass in the last key as the new start key, I do not get all of the results. I have found a few posts about this, but I did not find a recommended implementation.

http://www.mail-archive.com/user@cassandra.apache.org/msg04827.html
https://issues.apache.org/jira/browse/CASSANDRA-1042

I tried this solution, but this does not return all of the results.
http://www.mail-archive.com/user@cassandra.apache.org/msg05042.html

cheers,
--r2

On Tue, Oct 19, 2010 at 3:33 PM, Tyler Hobbs wrote:

I don't think I understand what you're trying to do. Do you want to page over the whole column family X rows at a time? Does it matter if the rows are in order?

- Tyler

On Tue, Oct 19, 2010 at 5:22 PM, Robert wrote:

I have a similar question. Is there a way to divide this into multiple requests? I am using Cassandra v0.6.4, RandomPartitioner, and the pycassa library.

Can I use get_range_slices with a start_token=0, and then recalculate the token from the last key returned until it loops around the entire ring?

cheers,
--Robert

On Tue, Oct 19, 2010 at 2:02 PM, Aaron Morton wrote:

KeyRange has a count on it, the default is 100. For the ordering, double check you are using the OrderPreserving partitioner. If it's still out of order, send an example.

Cheers
Aaron

On 20 Oct, 2010, at 09:39 AM, Wicked J wrote:

Hi,
I inserted 500 rows (records) in Cassandra and I'm using the following code to retrieve all the inserted rows. However, I'm able to get only 100 rows (in a random order). I'm using Cassandra v0.6.4 with OrderPreserving Partition on a single node/instance.
How can I get all the rows inserted? i.e. the other 400 rows.

Thanks

== Code ==

KeyRange keyRange = new KeyRange();
keyRange.start_key = start; //1
keyRange.end_key = end; //500

SliceRange sliceRange = new SliceRange();
sliceRange.setStart(new byte[]{});
sliceRange.setFinish(new byte[]{});

SlicePredicate slicePredicate = new SlicePredicate();
slicePredicate.setSlice_range(sliceRange);

ColumnParent columnParent = new ColumnParent(COLUMN_FAMILY);
keySlices = client.get_range_slices(KEYSPACE, columnParent, slicePredicate, keyRange, ConsistencyLevel.ONE);
System.out.println("Key Slice Size="+keySlices.size());

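A sketch of the key-based paging pattern from the FAQ entry above, in Python 2. The fetch_rows(start_key, count) helper is hypothetical -- it stands in for get_range_slices or pycassa's get_range with an explicit row count -- and is assumed to return (key, columns) pairs in partitioner order with an inclusive start key.

# Sketch of FAQ#iter_world-style paging. fetch_rows(start_key, count) is a
# hypothetical wrapper around get_range_slices / pycassa get_range, assumed to
# return up to `count` (key, columns) pairs starting at start_key inclusive.
def iter_all_rows(fetch_rows, page_size=100):
    start_key = ""                  # "" = begin at the first row in the ring
    last_key = None
    while True:
        batch = fetch_rows(start_key, page_size)
        for key, columns in batch:
            if key == last_key:     # inclusive start: skip the row we already saw
                continue
            yield key, columns
            last_key = key
        if len(batch) < page_size:  # a short batch means we reached the end
            return
        start_key = last_key        # resume from the last key of this batch

With the RandomPartitioner the rows come back in token order rather than key order, but the same loop still visits every row once, which matches the note above that the data will be out of order yet iterable.
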
Re: How to get all rows inserted

2010-10-19 Thread Robert
For this case, the order doesn't matter, I just need to page over all of the
data X rows at a time.  When I use column_family.get_range from pycassa and
pass in the last key as the new start key, I do not get all of the results.
 I have found a few posts about this, but I did not find a recommended
implementation.

http://www.mail-archive.com/user@cassandra.apache.org/msg04827.html

https://issues.apache.org/jira/browse/CASSANDRA-1042

I tried this solution,
but this does not return all of the results.
http://www.mail-archive.com/user@cassandra.apache.org/msg05042.html

cheers,
--r2


On Tue, Oct 19, 2010 at 3:33 PM, Tyler Hobbs  wrote:

> I don't think I understand what you're trying to do. Do you want to page
> over the whole column
> family X rows at a time?  Does it matter if the rows are in order?
>
> - Tyler
>
>
> On Tue, Oct 19, 2010 at 5:22 PM, Robert wrote:
>
>> I have a similar question.  Is there a way to divide this into multiple
>> requests?  I am using Cassandra v0.6.4, RandomPartitioner, and the pycassa
>> library.
>>
>> Can I use get_range_slices with a start_token=0, and then recalculate the
>> token from the last key returned until it loops around the
>> entire ring?
>>
>> cheers,
>> --Robert
>>
>> On Tue, Oct 19, 2010 at 2:02 PM, Aaron Morton wrote:
>>
>>> KeyRange has a count on it, the default is 100.
>>>
>>> For the ordering, double check you are using the OrderPreserving
>>> partitioner. If it's still out of order, send an example.
>>>
>>> Cheers
>>> Aaron
>>>
>>> On 20 Oct, 2010,at 09:39 AM, Wicked J  wrote:
>>>
>>> Hi,
>>> I inserted 500 rows (records) in Cassandra and I'm using the following
>>> code to retrieve all the inserted rows. However, I'm able to get only 100
>>> rows (in a random order). I'm using Cassandra v0.6.4 with OrderPreserving
>>> Partition on a single node/instance.
>>> How can I get all the rows inserted? i.e. the other 400 rows.
>>>
>>> Thanks
>>>
>>> == Code ==
>>>
>>> KeyRange keyRange = new KeyRange();
>>> keyRange.start_key = start; //1
>>> keyRange.end_key = end; //500
>>>
>>> SliceRange sliceRange = new SliceRange();
>>> sliceRange.setStart(new byte[]{});
>>> sliceRange.setFinish(new byte[]{});
>>>
>>> SlicePredicate slicePredicate = new SlicePredicate();
>>> slicePredicate.setSlice_range(sliceRange);
>>>
>>> ColumnParent columnParent = new ColumnParent(COLUMN_FAMILY);
>>> keySlices = client.get_range_slices(KEYSPACE, columnParent,
>>> slicePredicate, keyRange, ConsistencyLevel.ONE);
>>> System.out.println("Key Slice Size="+keySlices.size());
>>>
>>>
>>>
>>>
>>
>


Re: How to get all rows inserted

2010-10-19 Thread Tyler Hobbs
I don't think I understand what you're trying to do. Do you want to page
over the whole column
family X rows at a time?  Does it matter if the rows are in order?

- Tyler

On Tue, Oct 19, 2010 at 5:22 PM, Robert  wrote:

> I have a similar question.  Is there a way to divide this into multiple
> requests?  I am using Cassandra v0.6.4, RandomPartitioner, and the pycassa
> library.
>
> Can I use get_range_slices with a start_token=0, and then recalculate the
> token from the last key returned until it loops around the
> entire ring?
>
> cheers,
> --Robert
>
> On Tue, Oct 19, 2010 at 2:02 PM, Aaron Morton wrote:
>
>> KeyRange has a count on it, the default is 100.
>>
>> For the ordering, double check you are using the OrderPreserving
>> partitioner. If it's still out of order, send an example.
>>
>> Cheers
>> Aaron
>>
>> On 20 Oct, 2010,at 09:39 AM, Wicked J  wrote:
>>
>> Hi,
>> I inserted 500 rows (records) in Cassandra and I'm using the following
>> code to retrieve all the inserted rows. However, I'm able to get only 100
>> rows (in a random order). I'm using Cassandra v0.6.4 with OrderPreserving
>> Partition on a single node/instance.
>> How can I get all the rows inserted? i.e. the other 400 rows.
>>
>> Thanks
>>
>> == Code ==
>>
>> KeyRange keyRange = new KeyRange();
>> keyRange.start_key = start; //1
>> keyRange.end_key = end; //500
>>
>> SliceRange sliceRange = new SliceRange();
>> sliceRange.setStart(new byte[]{});
>> sliceRange.setFinish(new byte[]{});
>>
>> SlicePredicate slicePredicate = new SlicePredicate();
>> slicePredicate.setSlice_range(sliceRange);
>>
>> ColumnParent columnParent = new ColumnParent(COLUMN_FAMILY);
>> keySlices = client.get_range_slices(KEYSPACE, columnParent,
>> slicePredicate, keyRange, ConsistencyLevel.ONE);
>> System.out.println("Key Slice Size="+keySlices.size());
>>
>>
>>
>>
>


Re: Read Latency

2010-10-19 Thread Wayne
I am not sure how many bytes, but we do convert the cassandra object that is
returned in 3s into a dictionary in ~1s and then again into a custom python
object in about ~1.5s. Expectations are based on this timing. If we can
convert what thrift returns into a completely new python object in 1s why
does thrift need 3s to give it to us?

To us it is like the MySQL client we use in python. It is really C wrapped
in python and adds almost zero overhead to the time it takes mysql to return
the data. That is the expectation we have and the performance we are looking
to get to. Disk I/O + 20%.

We are returning one big row and this is not our normal use case but a
requirement for us to use Cassandra. We need to get all data for a specific
value, as this is a secondary index. It is like getting all users in the
state of CA. CA is the key and there is a column for every user id. We are
testing with 600,000 but this will grow to 10+ million in the future.

We can not test .7 as we are only using .6.6. We are trying to evaluate
Cassandra and stability is one concern so .7 is definitely not for us at
this point.

Thanks.


On Tue, Oct 19, 2010 at 4:27 PM, Aaron Morton wrote:

>
>  Just wondering how many bytes you are returning to the client to get an
> idea of how slow it is.
>
> The call to fastbinary is decoding the wireformat and creating the Python
> objects. When you ask for 600,000 columns you are creating a lot of python
> objects. Each column will be a ColumnOrSuperColumn, wrapping a Column, which
> has probably 2 Strings. So 2.4 million python objects.
>
> Here's  my rough test script.
>
> def go(count):
> start = time.time()
> buffer = [
> ttypes.ColumnOrSuperColumn(column=ttypes.Column(
> "column_name_%s" % i, "row_size of something something", 0, 0))
> for i in range(count)
> ]
> print "Done in %s" % (time.time() - start)
>
> On my machine that takes 13 seconds for 600,000 and 0.04 for 10,000. The
> fastbinary module is running a lot faster because it's all in C.  It's not a
> great test but I think it gives an idea of what you are asking for.
>
> I think there is an element of python being slower than other languages. But
> IMHO you are asking for a lot of data. Can you ask for less data?
>
> Out of interest are you able to try the avro client? It's still
> experimental (0.7 only) but may give you something to compare it against.
>
> Aaron
> On 20 Oct, 2010,at 07:23 AM, Wayne  wrote:
>
> It is an entire row which is 600,000 cols. We pass a limit of 10million to
> make sure we get it all. Our issue is that it seems Thrift itself adds more
> overhead/latency to a read than Cassandra itself takes to do the read.
> If cfstats for the slowest node reports 2.25s, to us it is not acceptable
> that the data comes back to the client in 5.5s. After working with Jonathon
> we have optimized Cassandra itself to return the quorum read in 2.7s but we
> still have 3s getting lost in the thrift call (fastbinary.decode_binary).
>
> We have seen this pattern totally hold for ms reads as well for a few cols,
> but it is easier to look at things in seconds. If Cassandra can get the data
> off of the disks in 2.25s we expect to have the data in a Python object in
> under 3s. That is a totally realistic expectation from our experience. All
> latency needs to be pushed down to disk random read latency as that should
> always be what takes the longest. Everything else is passing through memory.
>
>
> On Tue, Oct 19, 2010 at 2:06 PM, aaron morton wrote:
>
>> Wayne,
>> I'm calling cassandra from Python and have not seen too many 3 second
>> reads.
>>
>> Your last email with log messages in it looks like you are asking for
>> 10,000,000 columns. How much data is this request actually transferring to
>> the client? The column names suggest only a few.
>>
>> DEBUG [pool-1-thread-64] 2010-10-18 19:25:28,867 StorageProxy.java (line
>> 471) strongread reading data for SliceFromReadCommand(table='table',
>> key='key1', column_parent='QueryPath(columnFamilyName='fact',
>> superColumnName='null', columnName='null')', start='503a', finish='503a7c',
>> reversed=false, count=1000) from 698@/x.x.x.6
>>
>> Aaron
>>
>>
>> On 20 Oct 2010, at 06:18, Jonathan Ellis wrote:
>>
>> > I would expect C++ or Java to be substantially faster than Python.
>> > However, I note that Hector (and I believe Pelops) don't yet use the
>> > newest, fastest Thrift library.
>> >
>> > On Tue, Oct 19, 2010 at 8:21 AM, Wayne  wrote:
>> >> The changes seems to do the trick. We are down to about 1/2 of the
>> original
>> >> quorum read performance. I did not see any more errors.
>> >>
>> >> More than 3 seconds on the client side is still not acceptable to us.
>> We
>> >> need the data in Python, but would we be better off going through Java
>> or
>> >> something else to increase performance? All three seconds are taken up
>> in
>> >> Thrift itself (fastbinary.decode_binary(self, iprot.trans,
>> (self.__class__,
>> >> se

Re: How to get all rows inserted

2010-10-19 Thread Robert
I have a similar question.  Is there a way to divide this into multiple
requests?  I am using Cassandra v0.6.4, RandomPartitioner, and the pycassa
library.

Can I use get_range_slices with a start_token=0, and then recalculate the
token from the last key returned until it loops around the
entire ring?

cheers,
--Robert

On Tue, Oct 19, 2010 at 2:02 PM, Aaron Morton wrote:

> KeyRange has a count on it, the default is 100.
>
> For the ordering, double check you are using the OrderPreserving
> partitioner. If it's still out of order, send an example.
>
> Cheers
> Aaron
>
> On 20 Oct, 2010,at 09:39 AM, Wicked J  wrote:
>
> Hi,
> I inserted 500 rows (records) in Cassandra and I'm using the following code
> to retrieve all the inserted rows. However, I'm able to get only 100 rows
> (in a random order). I'm using Cassandra v0.6.4 with OrderPreserving
> Partition on a single node/instance.
> How can I get all the rows inserted? i.e. the other 400 rows.
>
> Thanks
>
> == Code ==
>
> KeyRange keyRange = new KeyRange();
> keyRange.start_key = start; //1
> keyRange.end_key = end; //500
>
> SliceRange sliceRange = new SliceRange();
> sliceRange.setStart(new byte[]{});
> sliceRange.setFinish(new byte[]{});
>
> SlicePredicate slicePredicate = new SlicePredicate();
> slicePredicate.setSlice_range(sliceRange);
>
> ColumnParent columnParent = new ColumnParent(COLUMN_FAMILY);
> keySlices = client.get_range_slices(KEYSPACE, columnParent, slicePredicate,
> keyRange, ConsistencyLevel.ONE);
> System.out.println("Key Slice Size="+keySlices.size());
>
>
>
>


Re: How to get all rows inserted

2010-10-19 Thread Aaron Morton
KeyRange has a count on it, the default is 100. For the ordering, double check you are using the OrderPreserving partitioner. If it's still out of order, send an example.

Cheers
Aaron

On 20 Oct, 2010, at 09:39 AM, Wicked J wrote:

Hi,
I inserted 500 rows (records) in Cassandra and I'm using the following code to retrieve all the inserted rows. However, I'm able to get only 100 rows (in a random order). I'm using Cassandra v0.6.4 with OrderPreserving Partition on a single node/instance.
How can I get all the rows inserted? i.e. the other 400 rows.

Thanks

== Code ==

KeyRange keyRange = new KeyRange();
keyRange.start_key = start; //1
keyRange.end_key = end; //500

SliceRange sliceRange = new SliceRange();
sliceRange.setStart(new byte[]{});
sliceRange.setFinish(new byte[]{});

SlicePredicate slicePredicate = new SlicePredicate();
slicePredicate.setSlice_range(sliceRange);

ColumnParent columnParent = new ColumnParent(COLUMN_FAMILY);
keySlices = client.get_range_slices(KEYSPACE, columnParent, slicePredicate, keyRange, ConsistencyLevel.ONE);
System.out.println("Key Slice Size="+keySlices.size());


How to get all rows inserted

2010-10-19 Thread Wicked J
Hi,
I inserted 500 rows (records) in Cassandra and I'm using the following code
to retrieve all the inserted rows. However, I'm able to get only 100 rows
(in a random order). I'm using Cassandra v0.6.4 with OrderPreserving
Partition on a single node/instance.
How can I get all the rows inserted? i.e. the other 400 rows.

Thanks

== Code ==

KeyRange keyRange = new KeyRange();
keyRange.start_key = start; //1
keyRange.end_key = end; //500

SliceRange sliceRange = new SliceRange();
sliceRange.setStart(new byte[]{});
sliceRange.setFinish(new byte[]{});

SlicePredicate slicePredicate = new SlicePredicate();
slicePredicate.setSlice_range(sliceRange);

ColumnParent columnParent = new ColumnParent(COLUMN_FAMILY);
keySlices = client.get_range_slices(KEYSPACE, columnParent, slicePredicate,
keyRange, ConsistencyLevel.ONE);
System.out.println("Key Slice Size="+keySlices.size());

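The missing piece in the code above is KeyRange's count field, which defaults to 100 as Aaron notes elsewhere in the thread. Below is a sketch of one way to page through all the rows, reusing the client, columnParent and slicePredicate built above; it assumes the 0.6.x Thrift API shown here (String keys, keyspace passed to get_range_slices) and that the start key is inclusive, so the first row of each later batch is a repeat.

// Sketch only: page through the rows by setting KeyRange.count and restarting
// each request from the last key seen. Reuses client, columnParent and
// slicePredicate from the snippet above; assumes String keys (0.6.x) and an
// inclusive start key.
int pageSize = 100;
KeyRange keyRange = new KeyRange();
keyRange.start_key = start; // e.g. "1"
keyRange.end_key = end;     // e.g. "500"
keyRange.count = pageSize;

String lastKey = null;
List<KeySlice> page;
do {
    page = client.get_range_slices(KEYSPACE, columnParent, slicePredicate,
                                   keyRange, ConsistencyLevel.ONE);
    for (KeySlice slice : page) {
        if (slice.key.equals(lastKey)) {
            continue; // already handled at the end of the previous page
        }
        lastKey = slice.key;
        System.out.println("Key=" + lastKey);
    }
    keyRange.start_key = lastKey; // resume from the last key returned
} while (page.size() == pageSize);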

Re: Throttling ColumnFamilyRecordReader

2010-10-19 Thread Jonathan Ellis
(Moving to u...@.)

Isn't reducing the number of map tasks the easiest way to tune this?

Also: in 0.7 you can use NetworkTopologyStrategy to designate a group
of nodes as your hadoop "datacenter" so the workloads won't overlap.

On Tue, Oct 19, 2010 at 3:22 PM, Michael Moores  wrote:
> Does it make sense to add some kind of throttle capability on the 
> ColumnFamilyRecordReader for Hadoop?
>
> If I have 60 or so Map tasks running at the same time when the cluster is 
> already heavily loaded with OLTP operations, I can get some decreased on-line 
> performance
> that may not be acceptable.  (I'm loading an 8 node cluster with 2000 TPS.)  
> By default my cluster of 8 nodes (which are also the Hadoop JobTracker nodes) 
> has 8 Map tasks per node making the get_range_slices call, based on what the 
> ColumnFamilyInputFormat has calculated from my token ranges.
> I can increase the inputSplitSize (ConfigHelper.setInputSplitSize()) so that
> there
> is only one Map task per node, and this helps quite a bit.
>
> But is it reasonable to provide a configurable sleep to cause a wait in 
> between smaller size range queries?  That would stretch out the Map time
> and let the OLTP processing be less affected.
>
>
> --Michael
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

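For reference, here is a sketch of the job-setup fragment this thread is talking about, loosely based on the contrib/word_count example; the ConfigHelper method names and signatures below are assumptions that should be checked against the Cassandra version in use, and `predicate` is a SlicePredicate built as usual.

// Fragment of Hadoop job setup (sketch only, modelled on contrib/word_count;
// ConfigHelper signatures are assumed and should be verified per version).
Job job = new Job(getConf(), "cassandra-scan");
job.setInputFormatClass(ColumnFamilyInputFormat.class);
ConfigHelper.setColumnFamily(job.getConfiguration(), "MyKeyspace", "MyColumnFamily");
ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

// A larger split size means fewer concurrent map tasks issuing get_range_slices,
// which is the simplest way to reduce pressure on the OLTP workload.
ConfigHelper.setInputSplitSize(job.getConfiguration(), 1024 * 1024);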

Re: Read Latency

2010-10-19 Thread Aaron Morton
Just wondering how many bytes you are returning to the client to get an idea of how slow it is.

The call to fastbinary is decoding the wireformat and creating the Python objects. When you ask for 600,000 columns you are creating a lot of python objects. Each column will be a ColumnOrSuperColumn, wrapping a Column, which has probably 2 Strings. So 2.4 million python objects.

Here's my rough test script.

def go(count):
    start = time.time()
    buffer = [
        ttypes.ColumnOrSuperColumn(column=ttypes.Column(
            "column_name_%s" % i, "row_size of something something", 0, 0))
        for i in range(count)
    ]
    print "Done in %s" % (time.time() - start)

On my machine that takes 13 seconds for 600,000 and 0.04 for 10,000. The fastbinary module is running a lot faster because it's all in C. It's not a great test but I think it gives an idea of what you are asking for.

I think there is an element of python being slower than other languages. But IMHO you are asking for a lot of data. Can you ask for less data? Out of interest are you able to try the avro client? It's still experimental (0.7 only) but may give you something to compare it against.

Aaron

On 20 Oct, 2010, at 07:23 AM, Wayne wrote:

It is an entire row which is 600,000 cols. We pass a limit of 10 million to make sure we get it all. Our issue is that it seems Thrift itself adds more overhead/latency to a read than Cassandra itself takes to do the read. If cfstats for the slowest node reports 2.25s, it is not acceptable to us that the data comes back to the client in 5.5s. After working with Jonathon we have optimized Cassandra itself to return the quorum read in 2.7s but we still have 3s getting lost in the thrift call (fastbinary.decode_binary).

We have seen this pattern totally hold for ms reads as well for a few cols, but it is easier to look at things in seconds. If Cassandra can get the data off of the disks in 2.25s we expect to have the data in a Python object in under 3s. That is a totally realistic expectation from our experience. All latency needs to be pushed down to disk random read latency, as that should always be what takes the longest. Everything else is passing through memory.
On Tue, Oct 19, 2010 at 2:06 PM, aaron morton  wrote:
Wayne,
I'm calling cassandra from Python and have not seen too many 3 second reads.

Your last email with log messages in it looks like you are asking for 10,000,000 columns. How much data is this request actually transferring to the client? The column names suggest only a few.

DEBUG [pool-1-thread-64] 2010-10-18 19:25:28,867 StorageProxy.java (line 471) strongread reading data for SliceFromReadCommand(table='table', key='key1', column_parent='QueryPath(columnFamilyName='fact', superColumnName='null', columnName='null')', start='503a', finish='503a7c', reversed=false, count=1000) from 698@/x.x.x.6


Aaron

On 20 Oct 2010, at 06:18, Jonathan Ellis wrote:

> I would expect C++ or Java to be substantially faster than Python.
> However, I note that Hector (and I believe Pelops) don't yet use the
> newest, fastest Thrift library.
>
> On Tue, Oct 19, 2010 at 8:21 AM, Wayne  wrote:
>> The changes seems to do the trick. We are down to about 1/2 of the original
>> quorum read performance. I did not see any more errors.
>>
>> More than 3 seconds on the client side is still not acceptable to us. We
>> need the data in Python, but would we be better off going through Java or
>> something else to increase performance? All three seconds are taken up in
>> Thrift itself (fastbinary.decode_binary(self, iprot.trans, (self.__class__,
>> self.thrift_spec))) so I am not sure what other options we have.
>>
>> Thanks for your help.
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com




Re: Read Latency

2010-10-19 Thread Wayne
It is an entire row which is 600,000 cols. We pass a limit of 10million to
make sure we get it all. Our issue is that it seems Thrift itself adds more
overhead/latency to a read than Cassandra itself takes to do the read.
If cfstats for the slowest node reports 2.25s, to us it is not acceptable
that the data comes back to the client in 5.5s. After working with Jonathon
we have optimized Cassandra itself to return the quorum read in 2.7s but we
still have 3s getting lost in the thrift call (fastbinary.decode_binary).

We have seen this pattern totally hold for ms reads as well for a few cols,
but it is easier to look at things in seconds. If Cassandra can get the data
off of the disks in 2.25s we expect to have the data in a Python object in
under 3s. That is a totally realistic expectation from our experience. All
latency needs to be pushed down to disk random read latency as that should
always be what takes the longest. Everything else is passing through memory.


On Tue, Oct 19, 2010 at 2:06 PM, aaron morton wrote:

> Wayne,
> I'm calling cassandra from Python and have not seen too many 3 second
> reads.
>
> Your last email with log messages in it looks like you are asking for
> 10,000,000 columns. How much data is this request actually transferring to
> the client? The column names suggest only a few.
>
> DEBUG [pool-1-thread-64] 2010-10-18 19:25:28,867 StorageProxy.java (line
> 471) strongread reading data for SliceFromReadCommand(table='table',
> key='key1', column_parent='QueryPath(columnFamilyName='fact',
> superColumnName='null', columnName='null')', start='503a', finish='503a7c',
> reversed=false, count=1000) from 698@/x.x.x.6
>
> Aaron
>
> On 20 Oct 2010, at 06:18, Jonathan Ellis wrote:
>
> > I would expect C++ or Java to be substantially faster than Python.
> > However, I note that Hector (and I believe Pelops) don't yet use the
> > newest, fastest Thrift library.
> >
> > On Tue, Oct 19, 2010 at 8:21 AM, Wayne  wrote:
> >> The changes seems to do the trick. We are down to about 1/2 of the
> original
> >> quorum read performance. I did not see any more errors.
> >>
> >> More than 3 seconds on the client side is still not acceptable to us. We
> >> need the data in Python, but would we be better off going through Java
> or
> >> something else to increase performance? All three seconds are taken up
> in
> >> Thrift itself (fastbinary.decode_binary(self, iprot.trans,
> (self.__class__,
> >> self.thrift_spec))) so I am not sure what other options we have.
> >>
> >> Thanks for your help.
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com
>
>


Re: Read Latency

2010-10-19 Thread Wayne
Our problem is not that Python is slow, our problem is that getting data
from the Cassandra server is slow (while Cassandra itself is fast). Python
can handle the result data a lot faster than whatever it is passing through
now...

I guess to ask a specific question: what, right now, is the fastest mechanism
in terms of latency to get data from Cassandra to a client application? I
assume it is Java? We would not use any higher level library and prefer to
go directly against thrift (whatever is the fastest method).  We can easily
write our own C++ layer but if C++ still has to go through Thrift and thrift
is our problem we have solved nothing. To us this appears much more as a
maturity/optimization problem in thrift than anything to do with language
benefits.

Given our entire wait is on a call to Thrift below I tend to think nothing
we do (in any language) will help except making optimizations to Thrift or
Avro?

Thanks for the help!

On Tue, Oct 19, 2010 at 1:18 PM, Jonathan Ellis  wrote:

> I would expect C++ or Java to be substantially faster than Python.
> However, I note that Hector (and I believe Pelops) don't yet use the
> newest, fastest Thrift library.
>
> On Tue, Oct 19, 2010 at 8:21 AM, Wayne  wrote:
> > The changes seems to do the trick. We are down to about 1/2 of the
> original
> > quorum read performance. I did not see any more errors.
> >
> > More than 3 seconds on the client side is still not acceptable to us. We
> > need the data in Python, but would we be better off going through Java or
> > something else to increase performance? All three seconds are taken up in
> > Thrift itself (fastbinary.decode_binary(self, iprot.trans,
> (self.__class__,
> > self.thrift_spec))) so I am not sure what other options we have.
> >
> > Thanks for your help.
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Read Latency

2010-10-19 Thread aaron morton
Wayne, 
I'm calling cassandra from Python and have not seen too many 3 second reads. 

Your last email with log messages in it looks like you are asking for 
10,000,000 columns. How much data is this request actually transferring to the 
client? The column names suggest only a few. 

DEBUG [pool-1-thread-64] 2010-10-18 19:25:28,867 StorageProxy.java (line 471) 
strongread reading data for SliceFromReadCommand(table='table', key='key1', 
column_parent='QueryPath(columnFamilyName='fact', superColumnName='null', 
columnName='null')', start='503a', finish='503a7c', reversed=false, 
count=1000) from 698@/x.x.x.6

Aaron

On 20 Oct 2010, at 06:18, Jonathan Ellis wrote:

> I would expect C++ or Java to be substantially faster than Python.
> However, I note that Hector (and I believe Pelops) don't yet use the
> newest, fastest Thrift library.
> 
> On Tue, Oct 19, 2010 at 8:21 AM, Wayne  wrote:
>> The changes seems to do the trick. We are down to about 1/2 of the original
>> quorum read performance. I did not see any more errors.
>> 
>> More than 3 seconds on the client side is still not acceptable to us. We
>> need the data in Python, but would we be better off going through Java or
>> something else to increase performance? All three seconds are taken up in
>> Thrift itself (fastbinary.decode_binary(self, iprot.trans, (self.__class__,
>> self.thrift_spec))) so I am not sure what other options we have.
>> 
>> Thanks for your help.
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com



Re: Dumping Cassandra into Hadoop

2010-10-19 Thread aaron morton
Depends on what you mean by dumping into Hadoop. 

If you want to read them from a Hadoop Job then you can use either native 
Hadoop or Pig. See the contrib/word_count and contrib/pig examples. 

If you want to copy the data into a Hadoop File System install then I guess 
almost anything that can read from Cassandra and create a file should be OK. 
You can then copy it onto HDFS and read from there. 

Hope that helps.
Aaron


On 20 Oct 2010, at 04:01, Mark wrote:

> As the subject implies I am trying to dump Cassandra rows into Hadoop. What 
> is the easiest way for me to accomplish this? Thanks.
> 
> Should I be looking into pig for something like this?

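If the goal is just to land the data as files on HDFS, a rough pycassa sketch like the one below can dump a column family to a flat file that is then pushed with hadoop fs -put. The connection boilerplate varies between pycassa versions, and the keyspace/column family names here are made up for the example.

# Rough sketch (Python 2): dump a column family to a TSV file for `hadoop fs -put`.
# Connection calls vary by pycassa version; names here are illustrative only.
import pycassa

client = pycassa.connect(['localhost:9160'])
cf = pycassa.ColumnFamily(client, 'Keyspace1', 'Standard1')

out = open('standard1.tsv', 'w')
for key, columns in cf.get_range():      # iterates every row (see the paging thread above)
    for name, value in columns.items():
        out.write('%s\t%s\t%s\n' % (key, name, value))
out.close()

# then: hadoop fs -put standard1.tsv /data/standard1.tsv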


Re: Read Latency

2010-10-19 Thread Jonathan Ellis
I would expect C++ or Java to be substantially faster than Python.
However, I note that Hector (and I believe Pelops) don't yet use the
newest, fastest Thrift library.

On Tue, Oct 19, 2010 at 8:21 AM, Wayne  wrote:
> The changes seems to do the trick. We are down to about 1/2 of the original
> quorum read performance. I did not see any more errors.
>
> More than 3 seconds on the client side is still not acceptable to us. We
> need the data in Python, but would we be better off going through Java or
> something else to increase performance? All three seconds are taken up in
> Thrift itself (fastbinary.decode_binary(self, iprot.trans, (self.__class__,
> self.thrift_spec))) so I am not sure what other options we have.
>
> Thanks for your help.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Jeremy Hanna
That code has never existed in the public.  It was taken out before it was 
open-sourced.

On Oct 19, 2010, at 11:45 AM, Yang wrote:

> Thanks guys.
> 
> but I feel it would probably be better to refactor out the hooks and
> make components like zookeeper pluggable , so users could use either
> zookeeper or the current  config-file based seeds discovery
> 
> Yang
> 
> On Tue, Oct 19, 2010 at 9:02 AM, Jeremy Hanna
>  wrote:
>> Further, when they open-sourced cassandra, they removed certain things that 
>> are specific to their environment at facebook, including zookeeper 
>> integration.  In the paper it has some specific purposes, like finding seeds 
>> iirc.  Apache Cassandra has no dependency on zookeeper.  I think that's a 
>> good thing as managing cassandra is easier without those kinds of 
>> dependencies, e.g. for shops that don't currently use any hadoop components.
>> 
>> On Oct 19, 2010, at 3:11 AM, Norman Maurer wrote:
>> 
>>> No Zookeeper is not used in cassandra. You can use Zookeeper as some
>>> kind of add-on to do locking etc.
>>> 
>>> Bye,
>>> Norman
>>> 
>>> 
>>> 2010/10/19 Yang :
 I read from the Facebook cassandra paper that zookeeper "is used
 ." for certain things ( membership and Rack-aware placement)
 
 but I pulled 0.7.0-beta2 source and couldn't grep out anything with
 "Zk" or "Zoo",  nor any files with "Zk/Zoo" in the names
 
 
 is Zookeeper really used? docs/blog posts from online search kind of
 give conflicting answers
 
 
 Thanks
 Yang
 
>> 
>> 



Re: Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Yang
Thanks guys.

but I feel it would probably be better to refactor out the hooks and
make components like zookeeper pluggable , so users could use either
zookeeper or the current  config-file based seeds discovery

Yang

On Tue, Oct 19, 2010 at 9:02 AM, Jeremy Hanna
 wrote:
> Further, when they open-sourced cassandra, they removed certain things that 
> are specific to their environment at facebook, including zookeeper 
> integration.  In the paper it has some specific purposes, like finding seeds 
> iirc.  Apache Cassandra has no dependency on zookeeper.  I think that's a 
> good thing as managing cassandra is easier without those kinds of 
> dependencies, e.g. for shops that don't currently use any hadoop components.
>
> On Oct 19, 2010, at 3:11 AM, Norman Maurer wrote:
>
>> No Zookeeper is not used in cassandra. You can use Zookeeper as some
>> kind of add-on to do locking etc.
>>
>> Bye,
>> Norman
>>
>>
>> 2010/10/19 Yang :
>>> I read from the Facebook cassandra paper that zookeeper "is used
>>> ." for certain things ( membership and Rack-aware placement)
>>>
>>> but I pulled 0.7.0-beta2 source and couldn't grep out anything with
>>> "Zk" or "Zoo",  nor any files with "Zk/Zoo" in the names
>>>
>>>
>>> is Zookeeper really used? docs/blog posts from online search kind of
>>> give conflicting answers
>>>
>>>
>>> Thanks
>>> Yang
>>>
>
>


Re: Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Jeremy Hanna
Further, when they open-sourced cassandra, they removed certain things that are 
specific to their environment at facebook, including zookeeper integration.  In 
the paper it has some specific purposes, like finding seeds iirc.  Apache 
Cassandra has no dependency on zookeeper.  I think that's a good thing as 
managing cassandra is easier without those kinds of dependencies, e.g. for 
shops that don't currently use any hadoop components.

On Oct 19, 2010, at 3:11 AM, Norman Maurer wrote:

> No Zookeeper is not used in cassandra. You can use Zookeeper as some
> kind of add-on to do locking etc.
> 
> Bye,
> Norman
> 
> 
> 2010/10/19 Yang :
>> I read from the Facebook cassandra paper that zookeeper "is used
>> ." for certain things ( membership and Rack-aware placement)
>> 
>> but I pulled 0.7.0-beta2 source and couldn't grep out anything with
>> "Zk" or "Zoo",  nor any files with "Zk/Zoo" in the names
>> 
>> 
>> is Zookeeper really used? docs/blog posts from online search kind of
>> give conflicting answers
>> 
>> 
>> Thanks
>> Yang
>> 



Re: Cassandra security model? ( or, authentication docs ?)

2010-10-19 Thread Jeremy Hanna
just as an fyi, I created something in the wiki yesterday - it's just a start 
though - http://wiki.apache.org/cassandra/ExtensibleAuth
there's also a FAQ entry on it now - http://wiki.apache.org/cassandra/FAQ#auth
just for going forward - on the wiki itself, just trying to help there.

On Oct 19, 2010, at 3:06 AM, Yang wrote:

> Thanks a lot
> 
> On Mon, Oct 18, 2010 at 11:44 AM, Eric Evans  wrote:
>> On Sun, 2010-10-17 at 21:26 -0700, Yang wrote:
>>> I searched around, it seems that this is not clearly documented yet;
>>> the closest I found is:
>>> http://www.riptano.com/docs/0.6.5/install/auth-config
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Authentication-td5285013.html#a5285013
>>> 
>>> I did start cassandra with the args mentioned above:
>>> 
>>> bin/cassandra -Dpasswd.properties=mypasswd.properties
>>> -Daccess.properties=myaccess.properties -f
>> 
>> Try
>> http://www.riptano.com/docs/0.6.5/install/storage-config#Authenticator
>> 
>> 
>> --
>> Eric Evans
>> eev...@rackspace.com
>> 
>> 

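For anyone else hitting this, the 0.6 docs linked above boil down to switching the Authenticator in storage-conf.xml and pointing the two -D properties at flat files. The entries below are illustrative examples of the format, not the shipped defaults; check the conf/ directory of your distribution.

<!-- storage-conf.xml: enable the simple authenticator (0.6.x) -->
<Authenticator>org.apache.cassandra.auth.SimpleAuthenticator</Authenticator>

# passwd.properties -- one "user=password" entry per line (illustrative)
jsmith=havebadpass

# access.properties -- keyspace=comma,separated,users (illustrative)
Keyspace1=jsmith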


Re: Hadoop Word Count Super Column Example?

2010-10-19 Thread Jeremy Hanna
It's relatively straightforward, the current mapper gets a map of column names 
to IColumns.  The SuperColumn implements the IColumn interface.  So you would 
probably need both the super column name and the subcolumn name to get at it, 
but you just need to cast the IColumn to a super column and handle it from 
there.

On Oct 19, 2010, at 10:31 AM, Frank LoVecchio wrote:

> I have a Hadoop installation working with a cluster of 0.7 Beta 2 Nodes, and 
> got the WordCount example to work using the standard configuration.  I have 
> been inserting data into a Super Column (Sensor) with TimeUUID as the compare 
> type, it looks like this:
> 
> get Sensor['DeviceID:Sensor']
> => (super_column=795a4da0-d8ac-11df-9a2c-12313d06187c,
>  (column=sub_sensor1, value=39.742538, timestamp=1287182112633000) 
>  (column=sub_sensor2, value=-104.912474, timestamp=1287182112633000) 
>  (column=mac_address, value=DEADBEEFFEED, timestamp=1287182112633000)) 
> 
> Is there a Word Count example for super columns?  I am trying to count the 
> number of occurrences of "DEADBEEFFEED", much like "word1" in the column 
> example.  
> 
> Thanks,
> 
> Frank LoVecchio
> Software Engineer, Isidorey LLC
> isidorey.com
> 
> franklovecchio.com
> rodsandricers.com


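Here is a sketch of what Jeremy describes, loosely modelled on the contrib/word_count mapper. The generic types (byte[] here, possibly ByteBuffer depending on the 0.7 beta in use) and the byte[] name()/value() accessors are assumptions to check against your version; imports (org.apache.cassandra.db.IColumn, org.apache.hadoop.io.*, java.util.SortedMap, java.io.IOException) are omitted.

// Sketch only, modelled on the contrib/word_count mapper. Generic types and
// the byte[] name()/value() accessors are assumptions for the 0.7-beta API.
public static class MacAddressCounterMapper
        extends Mapper<byte[], SortedMap<byte[], IColumn>, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private static final String TARGET = "DEADBEEFFEED";

    public void map(byte[] key, SortedMap<byte[], IColumn> columns, Context context)
            throws IOException, InterruptedException {
        for (IColumn column : columns.values()) {
            // In a Super column family each top-level IColumn is a SuperColumn;
            // cast it or just walk its children via getSubColumns().
            for (IColumn sub : column.getSubColumns()) {
                if ("mac_address".equals(new String(sub.name()))
                        && TARGET.equals(new String(sub.value()))) {
                    context.write(new Text(TARGET), ONE);
                }
            }
        }
    }
}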

Hadoop Word Count Super Column Example?

2010-10-19 Thread Frank LoVecchio
I have a Hadoop installation working with a cluster of 0.7 Beta 2 Nodes, and
got the WordCount example to work using the standard configuration.  I have
been inserting data into a Super Column (Sensor) with TimeUUID as the
compare type, it looks like this:

get Sensor['DeviceID:Sensor']
=> (super_column=795a4da0-d8ac-11df-9a2c-12313d06187c,
 (column=sub_sensor1, value=39.742538, timestamp=1287182112633000)
 (column=sub_sensor2, value=-104.912474, timestamp=1287182112633000)
 (column=mac_address, value=DEADBEEFFEED, timestamp=1287182112633000))

Is there a Word Count example for super columns?  I am trying to count the
number of occurrences of "DEADBEEFFEED", much like "word1" in the column
example.

Thanks,

Frank LoVecchio
Software Engineer, Isidorey LLC
isidorey.com

franklovecchio.com
rodsandricers.com


Re: Cassandra/Pelops error processing get_slice

2010-10-19 Thread Frank LoVecchio
Aaron,

It seems that we had a beta-1 node in our cluster of beta-2s.  Haven't had
the problem since.

Thanks for the help,

Frank

On Sat, Oct 16, 2010 at 1:50 PM, aaron morton wrote:

> Frank,
>
> Things are a bit clearer now. Think I had the wrong idea to start with.
>
> The server-side error means this Cassandra node does not know about the
> column family it was asked to read. I guess either the schemas are out of
> sync on the nodes or there is a bug. How did you add the Keyspace?
>
> Check the keyspace definition on each node using JConsole, nodetool
> or cassandra-cli to see if they match. There is a function
> called describe_schema_versions() in the 0.7 API; if your client supports it,
> it will tell you which schema versions are active in your cluster. Am guessing
> you have more than one active schema.
>
> You should probably get a better error message. Can you raise a bug for
> that, please?
>
> Cheers
> Aaron
> On 16 Oct 2010, at 06:17, Frank LoVecchio wrote:
>
> Aaron,
>
> I updated the cassandra files and but still receive the same error (on
> client side) with a different line number 551:
>
> org.apache.thrift.TApplicationException: Internal error processing
> get_slice
> at
> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>  at
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:551)
> at
> org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:531)
>  at org.scale7.cassandra.pelops.Selector$6.execute(Selector.java:538)
> at org.scale7.cassandra.pelops.Selector$6.execute(Selector.java:535)
>  at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:45)
> at
> org.scale7.cassandra.pelops.Selector.getSuperColumnsFromRow(Selector.java:545)
>  at
> org.scale7.cassandra.pelops.Selector.getSuperColumnsFromRow(Selector.java:522)
> at
> com.isidorey.cassandra.dao.CassandraDAO.getSuperColumnsByKey(CassandraDAO.java:36)
>  at
> com.isidorey.cassandra.dao.CassandraDAO.getSuperColumnMap(CassandraDAO.java:82)
>
> On the server side, this is what we're seeing in Cassandra's log file:
>
> ERROR [pool-1-thread-2486] 2010-10-15 17:15:39,740 Cassandra.java (line
> 2876) Internal error processing get_slice
> java.lang.RuntimeException:
> org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find
> cfId=1052
>  at
> org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:133)
> at
> org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:222)
>  at
> org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:300)
> at
> org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:261)
>  at
> org.apache.cassandra.thrift.Cassandra$Processor$get_slice.process(Cassandra.java:2868)
> at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2724)
>  at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.cassandra.db.UnserializableColumnFamilyException:
> Couldn't find cfId=1052
> at
> org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:113)
>  at org.apache.cassandra.db.RowSerializer.deserialize(Row.java:76)
> at
> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:114)
>  at
> org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:90)
> at
> org.apache.cassandra.service.StorageProxy.weakRead(StorageProxy.java:289)
>  at
> org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:220)
> at
> org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:120)
>
>
> On Thu, Oct 14, 2010 at 6:29 PM, Aaron Morton wrote:
>
>> Am guessing but it looks like cassandra returned an error and the client
>> then had trouble reading the error.
>>
>> However if I look at the Beta 2 java thrift interface in Cassandra.java,
>> line 544 is not in recv_get_slice. May be nothing.
>>
>> Perhaps check the server for an error and double check your client is
>> coded for beta 2.
>>
>> Hope that helps.
>>
>> Aaron
>>
>>
>> On 15 Oct, 2010,at 12:32 PM, Frank LoVecchio  wrote:
>>
>>  10:10:21,787 ERROR ~ Error getting Sensor
>> org.apache.thrift.TApplicationException: Internal error processing
>> get_slice
>> at org.apache.thrift.TApplicationException.read(
>> TApplicationException.java:108)
>> at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(
>> Cassandra.java:544)
>> at org.apache.cassandra.thrift.Cassandra$Client.get_slice(
>> Cassandra.java:524)
>> at org.scale7.cassandra.pelops.Selector$6.execute(Selector.java:538)
>> at org.scale7.cassandra.pelops.Selector$6.execute(Selector.java:535)
>> at org.scale7.cassandra.pelops

Dumping Cassandra into Hadoop

2010-10-19 Thread Mark
 As the subject implies, I am trying to dump Cassandra rows into Hadoop. 
What is the easiest way for me to accomplish this? Thanks.


Should I be looking into Pig for something like this?


Re: Read Latency

2010-10-19 Thread Wayne
The changes seem to do the trick. We are down to about 1/2 of the original
quorum read time. I did not see any more errors.

More than 3 seconds on the client side is still not acceptable to us. We
need the data in Python, but would we be better off going through Java or
something else to increase performance? All three seconds are taken up in
Thrift itself (fastbinary.decode_binary(self, iprot.trans, (self.__class__,
self.thrift_spec))) so I am not sure what other options we have.

Thanks for your help.


Re: Thrift version

2010-10-19 Thread Brayton Thompson
Go into the lib dir in Cassandra and look at the Thrift jar; its name contains 
the specific revision you need to use. Use svn to pull that revision down. 

Sent from my iPhone

On Oct 18, 2010, at 10:50 PM, JKnight JKnight  wrote:

> Dear all, 
> 
> Which Thrift version does Cassandra 0.6.6 use?
> Thanks a lot for your support.
> 
> -- 
> Best regards,
> JKnight


RE: Preventing an update of a CF row

2010-10-19 Thread Viktor Jevdokimov
Reverse timestamp.
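
In other words, have every writer derive the column timestamp from Long.MAX_VALUE minus the current time, so the first insert carries the largest timestamp and later inserts lose the reconciliation. A minimal sketch of the idea (not a true lock - clock skew between writers still applies):

public final class ReverseTimestamp {
    private ReverseTimestamp() {}

    // Earlier wall-clock time => larger reversed timestamp, so the first
    // insert wins and later inserts are discarded as "older".
    public static long next() {
        long nowMicros = System.currentTimeMillis() * 1000L;
        return Long.MAX_VALUE - nowMicros;
    }
}

Pass the result wherever your client lets you supply an explicit column timestamp.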

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@yakaz.com] 
Sent: Tuesday, October 19, 2010 10:44 AM
To: user@cassandra.apache.org
Subject: Re: Preventing an update of a CF row

> Always specify some constant value for the timestamp. Only the 1st insertion with that
> timestamp will succeed. Others will be ignored, because they will be considered
> duplicates by Cassandra.

Well, that's not entirely true. When Cassandra 'resolves' two columns having
the same timestamp, it compares the values to decide which one to keep (and
it'll keep the column whose value is greater by bytes comparison).
Concretely, if you insert Column('foo', 'b', 0) and then Column('foo', 'bar', 0),
you'll end up with the second column, even though the timestamps are the same,
because 'bar' > 'b'.
So constant timestamps don't work for preventing updates of a given column.

--
Sylvain




Re: Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Norman Maurer
No, Zookeeper is not used in Cassandra. You can use Zookeeper as a kind
of add-on to do locking, etc.

Bye,
Norman


2010/10/19 Yang :
> I read in the Facebook Cassandra paper that Zookeeper "is used ..." for
> certain things (membership and rack-aware placement),
>
> but I pulled 0.7.0-beta2 source and couldn't grep out anything with
> "Zk" or "Zoo",  nor any files with "Zk/Zoo" in the names
>
>
> Is Zookeeper really used? Docs/blog posts from online searches give somewhat
> conflicting answers.
>
>
> Thanks
> Yang
>


Cassandra + Zookeeper, what is the current state?

2010-10-19 Thread Yang
I read in the Facebook Cassandra paper that Zookeeper "is used ..." for
certain things (membership and rack-aware placement),

but I pulled 0.7.0-beta2 source and couldn't grep out anything with
"Zk" or "Zoo",  nor any files with "Zk/Zoo" in the names


Is Zookeeper really used? Docs/blog posts from online searches give somewhat
conflicting answers.


Thanks
Yang


Re: Cassandra security model? ( or, authentication docs ?)

2010-10-19 Thread Yang
Thanks a lot

On Mon, Oct 18, 2010 at 11:44 AM, Eric Evans  wrote:
> On Sun, 2010-10-17 at 21:26 -0700, Yang wrote:
>> I searched around, it seems that this is not clearly documented yet;
>> the closest I found is:
>> http://www.riptano.com/docs/0.6.5/install/auth-config
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Authentication-td5285013.html#a5285013
>>
>> I did start cassandra with the args mentioned above:
>>
>> bin/cassandra -Dpasswd.properties=mypasswd.properties
>> -Daccess.properties=myaccess.properties -f
>
> Try
> http://www.riptano.com/docs/0.6.5/install/storage-config#Authenticator
>
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: TimeUUID makes me crazy

2010-10-19 Thread Sylvain Lebresne
In your first column family, you are using a UUID as a row key (your column
names are apparently strings: phone, address). The CompareWith directive
specifies the comparator for *column names*. So you are providing strings where
you told Cassandra you would provide UUIDs, hence the exceptions.

The sorting of rows is determined by the partitioner (and there is no support
for TimeUUID sorting of rows).

--
Sylvain
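
To illustrate the mismatch, here is a toy version of the check a UUID comparator performs on every column *name* (this is not Cassandra's TimeUUIDType code, just the shape of it); row keys are never validated this way:

import java.nio.ByteBuffer;
import java.util.UUID;

public class TimeUuidNameCheck {
    static UUID toUuid(byte[] name) {
        if (name.length != 16)
            throw new IllegalArgumentException("UUIDs must be exactly 16 bytes");
        ByteBuffer bb = ByteBuffer.wrap(name);
        return new UUID(bb.getLong(), bb.getLong());
    }

    public static void main(String[] args) {
        // A UUID serialized to its 16 raw bytes passes the check.
        UUID ok = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(ok.getMostSignificantBits()).putLong(ok.getLeastSignificantBits());
        toUuid(buf.array());

        // A string column name like "opecstatus" is 10 bytes -> throws,
        // just like the InvalidRequestException reported in this thread.
        toUuid("opecstatus".getBytes());
    }
}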

On Mon, Oct 18, 2010 at 6:25 PM, cbert...@libero.it  wrote:
> Working with TimeUUIDs in Cassandra via Java is driving me crazy. I've read the FAQ but
> it didn't help.
> Can I use a TimeUUID as a ROW identifier (if converted to a string)?
>
> I have a CF like this and SCF like these:
>
> <ColumnFamily Name="..." CompareWith="TimeUUIDType" />
> TIMEUUID OPECID (ROW) {
>             phone: 123
>             address: street xyz
> }
>
> <ColumnFamily Name="..." ColumnType="Super" CompareWith="TimeUUIDType" CompareSubcolumnsWith="BytesType" />
> String USERID (ROW) {
>            TIMEUUID OPECID (SuperColumnName)  {
>                                collection of columns;
>             }
> }
>
> In one situation the TimeUUID is a ROW identifier while in another it is the
> SuperColumn name. I get many "UUIDs must be exactly 16 bytes" exceptions when I try to read 
> data
> that did not give any exception when it was saved.
>
> At time T0 this one works: mutator.writeColumns(UuidHelper.timeUuidFromBytes
> (OpecID).toString(), opecfamily, notNull); // (notNull contains a list of
> columns, including opecstatus)
>
> Immediately after, this one raises an exception: selector.getColumnFromRow
> (UuidHelper.timeUuidFromBytes(OpecID).toString(), opecfamily, "opecstatus",
> ConsistencyLevel.ONE)
>
> I hope that someone can help me understand this...
>
>


Re: Preventing an update of a CF row

2010-10-19 Thread Sylvain Lebresne
> Always specify some constant value for the timestamp. Only the 1st insertion with that
> timestamp will succeed. Others will be ignored, because they will be considered
> duplicates by Cassandra.

Well, that's not entirely true. When Cassandra 'resolves' two columns having
the same timestamp, it compares the values to decide which one to keep (and
it'll keep the column whose value is greater by bytes comparison).
Concretely, if you insert Column('foo', 'b', 0) and then Column('foo', 'bar', 0),
you'll end up with the second column, even though the timestamps are the same,
because 'bar' > 'b'.
So constant timestamps don't work for preventing updates of a given column.

--
Sylvain
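
To make the rule concrete, here is a toy model of that reconciliation (a sketch of the behaviour described above, not Cassandra's actual code):

public class ReconcileDemo {
    static class Column {
        final byte[] value;
        final long timestamp;
        Column(String value, long timestamp) {
            this.value = value.getBytes();
            this.timestamp = timestamp;
        }
    }

    // Higher timestamp wins; on a tie, the byte-wise greater value wins.
    static Column reconcile(Column a, Column b) {
        if (a.timestamp != b.timestamp)
            return a.timestamp > b.timestamp ? a : b;
        return compareBytes(a.value, b.value) >= 0 ? a : b;
    }

    static int compareBytes(byte[] x, byte[] y) {
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int cmp = (x[i] & 0xff) - (y[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        Column first = new Column("b", 0);
        Column second = new Column("bar", 0);
        // Prints "bar": the later insert survives despite the equal timestamp.
        System.out.println(new String(reconcile(first, second).value));
    }
}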


R: Re: TimeUUID makes me crazy

2010-10-19 Thread cbert...@libero.it
I am using Pelops for Cassandra 0.6.x.
The error that is raised is InvalidRequestException(why: UUIDs must be exactly 16 
bytes).
For the UUID I am using the UuidHelper class provided.