Re: Cassandra gem

2010-08-17 Thread Mark

 On 8/17/10 5:44 PM, Benjamin Black wrote:

Updated code is now in my master branch, with the reversion to 10.0.0.
  Please let me know of further trouble.


b

On Tue, Aug 17, 2010 at 8:31 AM, Mark  wrote:

  On 8/16/10 11:37 PM, Benjamin Black wrote:

I'm testing with the default cassandra.yaml.

I cannot reproduce the output in that gist, however:


thrift_client = client.instance_variable_get(:@client)

=>nil
Also, the Thrift version for 0.7 is 11.0.0, according to the code I
have.  Can someone comment on whether 0.7 beta1 is at Thrift interface
version 10.0.0 or 11.0.0?


b

On Mon, Aug 16, 2010 at 9:03 PM, Mark wrote:

  On 8/16/10 8:51 PM, Mark wrote:

  On 8/16/10 6:19 PM, Benjamin Black wrote:

client = Cassandra.new('system', '127.0.0.1:9160')

Brand new download of beta-0.7.0-beta1

http://gist.github.com/528357

Which thrift/thrift_client versions are you using?

FYI also tested similar setup on another machine and same results. Is
there
any configuration change I need in cassandra.yaml or something?


thrift_client = client.instance_variable_get(:@client)

The above client will only be instantiated after making (or attempting in my
case) a request.



Works like a charm. Thanks


Re: cassandra for an inbox search with high reading qps

2010-08-17 Thread Chen Xinli
Thanks for your reply.

Cassandra, in our case, is used for searching purposes, not for data
storage.
We can build the cassandra keyspace data daily/weekly when system load is
lower.

We have modified the Cassandra code to add a value filter, which keeps read
repair from working.
The value filter, as I said, filters the columns of a key so that only the
desired columns are returned.
The filtering is done locally inside Cassandra, not in the Thrift client, so
we have to disable read repair.

Cassandra has met most of our needs, except for this case:
if a node fails, recovers after a while, rejoins the cluster and is receiving
hinted handoff, and a read is forwarded to this node, the data returned is
out of date.

Node failures are not frequent, but if one does happen we still want to keep
reads consistent.


2010/8/18 Benjamin Black 

> On Tue, Aug 17, 2010 at 7:55 PM, Chen Xinli  wrote:
> > Hi,
> >
> > We are going to use cassandra for searching purpose like inbox search.
> > The reading qps is very high, we'd like to use ConsitencyLevel.One for
> > reading and disable read-repair at the same time.
> >
>
> In 0.7 you can set a probability for read repair, but disabling it is
> a spectacularly bad idea.  Any write problems on a node will result in
> persistent inconsistency.
>
> > For reading consistency in this condition, the writing should use
> > ConsistencyLevel.ALL. But the writing will fail if one node fails.
>
> You are free to read and write with consistency levels where R+W < N,
> it just means you have weaker consistency guarantees.
>
> > We want such a ConsistencyLevel for writing/reading that :
> > 1. writing will success if there is node alive for this key
> > 2. reading will not forward to a node that's just recovered and doing
> hinted
> > handoff
> >
> > So that, if some node fails, others nodes for replica will receive the
> data
> > and surve reading successfully;
> > when the failure node recovers,  it will receive hinted handoff from
> other
> > nodes and it'll not surve reading until hinted handoff is done.
> >
> > Does cassandra support the cases already? or should I modify the code to
> > meet our requirements?
> >
>
> You are phrasing these requirements in terms of a specific
> implementation.  What are your actual consistency goals?  If node
> failure is such a common occurrence in your system, you are going to
> have _numerous_ problems.
>
>
> b
>



-- 
Best Regards,
Chen Xinli


Re: data deleted came back after 9 days.

2010-08-17 Thread Benjamin Black
On Tue, Aug 17, 2010 at 7:49 PM, Zhong Li  wrote:
> Those data were inserted one node, then deleted on a remote node in less
> than 2 seconds. So it is very possible some node lost tombstone when
> connection lost.
> My question, is a ConstencyLevel.ALL read can retrieve lost tombstone back
> instead of repair?
>

No.  Read repair does not replay operations.  You must run nodetool repair.


b


Re: cassandra for an inbox search with high reading qps

2010-08-17 Thread Benjamin Black
On Tue, Aug 17, 2010 at 7:55 PM, Chen Xinli  wrote:
> Hi,
>
> We are going to use cassandra for searching purpose like inbox search.
> The reading qps is very high, we'd like to use ConsitencyLevel.One for
> reading and disable read-repair at the same time.
>

In 0.7 you can set a probability for read repair, but disabling it is
a spectacularly bad idea.  Any write problems on a node will result in
persistent inconsistency.

> For reading consistency in this condition, the writing should use
> ConsistencyLevel.ALL. But the writing will fail if one node fails.

You are free to read and write with consistency levels where R+W < N,
it just means you have weaker consistency guarantees.
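
For intuition, the rule of thumb behind this can be written as a tiny
predicate; this is only a sketch of the arithmetic, not Cassandra code:

// Reads and writes are guaranteed to overlap on at least one replica
// only when R + W > N; anything less trades consistency for latency and
// availability.
static boolean overlapGuaranteed(int r, int w, int n) {
    return r + w > n;
}

With N=3, writing at ALL (W=3) and reading at ONE (R=1) gives 3+1 > 3, so a
read always sees the latest write; QUORUM writes with ONE reads give 2+1 = 3,
which does not guarantee it.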

> We want such a ConsistencyLevel for writing/reading that :
> 1. writing will success if there is node alive for this key
> 2. reading will not forward to a node that's just recovered and doing hinted
> handoff
>
> So that, if some node fails, others nodes for replica will receive the data
> and surve reading successfully;
> when the failure node recovers,  it will receive hinted handoff from other
> nodes and it'll not surve reading until hinted handoff is done.
>
> Does cassandra support the cases already? or should I modify the code to
> meet our requirements?
>

You are phrasing these requirements in terms of a specific
implementation.  What are your actual consistency goals?  If node
failure is such a common occurrence in your system, you are going to
have _numerous_ problems.


b


Re: Videos of the cassandra summit starting to be posted

2010-08-17 Thread samal gorai
Thanks to the Riptano group for your support in community education.

On Tue, Aug 17, 2010 at 11:15 PM, Jeremy Hanna
wrote:

> The videos of the cassandra summit are starting to be posted, just fyi for
> those who were unable to make it out to SF.
>
> http://www.riptano.com/blog/slides-and-videos-cassandra-summit-2010


Re: cassandra for an inbox search with high reading qps

2010-08-17 Thread Chen Xinli
I'm using Cassandra 0.6.4; there's a configuration option
DoConsistencyChecksBoolean in storage-conf.xml.
Isn't that for read repair?

I will test WRITE QUORUM / READ ONE to see if it can meet our requirements.

2010/8/18 Edward Capriolo 

> On Tue, Aug 17, 2010 at 10:55 PM, Chen Xinli  wrote:
> > Hi,
> >
> > We are going to use cassandra for searching purpose like inbox search.
> > The reading qps is very high, we'd like to use ConsitencyLevel.One for
> > reading and disable read-repair at the same time.
> >
> > For reading consistency in this condition, the writing should use
> > ConsistencyLevel.ALL. But the writing will fail if one node fails.
> > We want such a ConsistencyLevel for writing/reading that :
> > 1. writing will success if there is node alive for this key
> > 2. reading will not forward to a node that's just recovered and doing
> hinted
> > handoff
> >
> > So that, if some node fails, others nodes for replica will receive the
> data
> > and surve reading successfully;
> > when the failure node recovers,  it will receive hinted handoff from
> other
> > nodes and it'll not surve reading until hinted handoff is done.
> >
> > Does cassandra support the cases already? or should I modify the code to
> > meet our requirements?
> >
> > Thanks for any advices!
> >
> > --
> > Best Regards,
> > Chen Xinli
> >
>
> >>The reading qps is very high, we'd like to use ConsitencyLevel.One for
> reading and disable read-repair at the same time.
> You can not disable read repair, all reads no matter the
> ConsistencyLevel always repair. The CL only controls how many nodes to
> read from before returning data to the client.
>
> These two statements contradict.
> > For reading consistency in this condition, the writing should use
> ConsistencyLevel.ALL. But the writing will fail if one node fails.
> > 1. writing will success if there is node alive for this key
>
> Also regardless of the write ConsistencyLevel all writes are written
> to all nodes. If a target node is down HintedHandoff will queue the
> write up for when the node restarts * in (6.3 you can turn off Hinted
> Handoff)
>
> You may want to WRITE QUORUM, READ.ONE, with RF=3 you would need two
> failed nodes before seeing an UnavailableException. You still have
> pretty strong consistency (depending how you look at it) and fast
> reads.
>
> Check the IRC logs Ben "schooled" me over this a couple of days ago.
>



-- 
Best Regards,
Chen Xinli


Re: cassandra for an inbox search with high reading qps

2010-08-17 Thread Edward Capriolo
On Tue, Aug 17, 2010 at 10:55 PM, Chen Xinli  wrote:
> Hi,
>
> We are going to use cassandra for searching purpose like inbox search.
> The reading qps is very high, we'd like to use ConsitencyLevel.One for
> reading and disable read-repair at the same time.
>
> For reading consistency in this condition, the writing should use
> ConsistencyLevel.ALL. But the writing will fail if one node fails.
> We want such a ConsistencyLevel for writing/reading that :
> 1. writing will success if there is node alive for this key
> 2. reading will not forward to a node that's just recovered and doing hinted
> handoff
>
> So that, if some node fails, others nodes for replica will receive the data
> and surve reading successfully;
> when the failure node recovers,  it will receive hinted handoff from other
> nodes and it'll not surve reading until hinted handoff is done.
>
> Does cassandra support the cases already? or should I modify the code to
> meet our requirements?
>
> Thanks for any advices!
>
> --
> Best Regards,
> Chen Xinli
>

>>The reading qps is very high, we'd like to use ConsitencyLevel.One for 
>>reading and disable read-repair at the same time.
You cannot disable read repair; all reads, no matter the
ConsistencyLevel, always repair. The CL only controls how many nodes to
read from before returning data to the client.

These two statements contradict each other.
> For reading consistency in this condition, the writing should use 
> ConsistencyLevel.ALL. But the writing will fail if one node fails.
> 1. writing will success if there is node alive for this key

Also, regardless of the write ConsistencyLevel, all writes are written
to all replica nodes. If a target node is down, HintedHandoff will queue the
write up for when the node restarts (in 0.6.3 you can turn off Hinted
Handoff).

You may want to write at QUORUM and read at ONE; with RF=3 you would need
two failed nodes before seeing an UnavailableException. You still have
pretty strong consistency (depending on how you look at it) and fast
reads.

Check the IRC logs; Ben "schooled" me on this a couple of days ago.
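
To make that concrete, a minimal, untested sketch of the write-at-QUORUM /
read-at-ONE pattern against the 0.6 Thrift API could look like this (the
host, keyspace, column family, and key names are made-up placeholders):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class QuorumWriteOneRead {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("127.0.0.1", 9160);
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));

        ColumnPath path = new ColumnPath("Inbox");          // hypothetical CF
        path.setColumn("subject".getBytes("UTF-8"));

        // Write at QUORUM: with RF=3 this succeeds as long as 2 of 3 replicas are up.
        client.insert("Keyspace1", "user42", path, "hello".getBytes("UTF-8"),
                System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);

        // Read at ONE: fast, answered by a single replica.
        ColumnOrSuperColumn cosc = client.get("Keyspace1", "user42", path,
                ConsistencyLevel.ONE);
        System.out.println(new String(cosc.getColumn().getValue(), "UTF-8"));

        transport.close();
    }
}

As noted above, the consistency level only controls how many replica
acknowledgements are waited for; the write itself is still sent to every
replica (or hinted if one is down).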


cassandra for an inbox search with high reading qps

2010-08-17 Thread Chen Xinli
Hi,

We are going to use Cassandra for searching purposes, like inbox search.
The read qps is very high; we'd like to use ConsistencyLevel.ONE for
reads and disable read repair at the same time.

For read consistency in this situation, writes should use
ConsistencyLevel.ALL. But a write will fail if one node fails.
We want consistency levels for writing/reading such that:
1. a write will succeed if any node for the key is alive
2. a read will not be forwarded to a node that has just recovered and is
receiving hinted handoff

That way, if some node fails, the other replica nodes will receive the data
and serve reads successfully;
when the failed node recovers, it will receive hinted handoff from the other
nodes and will not serve reads until hinted handoff is done.

Does Cassandra support these cases already, or should I modify the code to
meet our requirements?

Thanks for any advice!

-- 
Best Regards,
Chen Xinli


Re: data deleted came back after 9 days.

2010-08-17 Thread Zhong Li
That data was inserted on one node, then deleted on a remote node less than
2 seconds later. So it is very possible some node lost the tombstone when a
connection was lost.
My question: can a ConsistencyLevel.ALL read bring the lost tombstone back
instead of running repair?




On Aug 17, 2010, at 4:11 PM, Ned Wolpert wrote:

(gurus, please check my logic here... I'm trying to validate my  
understanding of this situation.)


Isn't the issue that while a server was disconnected, a delete could  
have occurred, and thus the disconnected server never got the  
'tombstone'?
(http://wiki.apache.org/cassandra/DistributedDeletes)  When it comes  
back, only after it receives the delete request will the data be  
deleted from the reconnected server.  I do not think this happens  
automatically when the server rejoins the cluster, but requires the  
manual repair command.


From my understanding, if the consistency level is greater then the  
number of servers missing that tombstone, you'll get the correct  
data. If its less, then you 'could' get the right or wrong answer.  
So the issue is how often do you need to run repair? If you have a  
ReplicationFactor=3, and you use ConstencyLevel.QUORUM, (2  
responses) then you need to run it after one server fails just to be  
sure. If you can handle some tolerance for this, you can wait a bit  
more before running the repair.


On Tue, Aug 17, 2010 at 12:58 PM, Jeremy Dunck   
wrote:
On Tue, Aug 17, 2010 at 2:49 PM, Jonathan Ellis   
wrote:
> It doesn't have to be disconnected more than GC grace seconds to  
cause

> what you are seeing, it just has to be disconnected at all (thus
> missing delete commands).
>
> Thus you need to be running repair more often than gcgrace, or
> confident that read repair will handle it for you (which clearly is
> not the case for you :).  see
> http://wiki.apache.org/cassandra/Operations

FWIW, the docs there say:
"Remember though that if a node is down longer than your configured
GCGraceSeconds (default: 10 days), it could have missed remove
operations permanently"

So that's probably a source of misunderstanding.



--
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe




Re: Cassandra gem

2010-08-17 Thread Benjamin Black
Updated code is now in my master branch, with the reversion to 10.0.0.
 Please let me know of further trouble.


b

On Tue, Aug 17, 2010 at 8:31 AM, Mark  wrote:
>  On 8/16/10 11:37 PM, Benjamin Black wrote:
>>
>> I'm testing with the default cassandra.yaml.
>>
>> I cannot reproduce the output in that gist, however:
>>
 thrift_client = client.instance_variable_get(:@client)
>>
>> =>  nil
>> Also, the Thrift version for 0.7 is 11.0.0, according to the code I
>> have.  Can someone comment on whether 0.7 beta1 is at Thrift interface
>> version 10.0.0 or 11.0.0?
>>
>>
>> b
>>
>> On Mon, Aug 16, 2010 at 9:03 PM, Mark  wrote:
>>>
>>>  On 8/16/10 8:51 PM, Mark wrote:

  On 8/16/10 6:19 PM, Benjamin Black wrote:
>
> client = Cassandra.new('system', '127.0.0.1:9160')

 Brand new download of beta-0.7.0-beta1

 http://gist.github.com/528357

 Which thrift/thrift_client versions are you using?
>>>
>>> FYI also tested similar setup on another machine and same results. Is
>>> there
>>> any configuration change I need in cassandra.yaml or something?
>>>
>
> thrift_client = client.instance_variable_get(:@client)
>
> The above client will only be instantiated after making (or attempting in my
> case) a request.
>
>


Map/Reduce over Cassandra

2010-08-17 Thread Bill Hastings
Hi All

How performant is M/R on Cassandra compared to running it on HDFS?
Does anyone have numbers they can share? Specifically, how much data was the
M/R job run against, and what was the throughput, etc.? Any information
would be very helpful.

-- 
Cheers
Bill


Re: Errors on CF with index

2010-08-17 Thread Ed Anuff
Yup, that's it; r986486 on Table.java made the problem go away. Talk about
great timing :)

On Tue, Aug 17, 2010 at 2:38 PM, Eric Evans  wrote:

> On Tue, 2010-08-17 at 14:04 -0700, Ed Anuff wrote:
> >
> > I'm finding that once I add an index to a column family that I start
> > getting
> > exceptions as I try to add rows to it.  It works fine if I don't
> > define the
> > column metadata.  Any ideas what would cause this?
> >
> > ERROR 12:44:21,477 Error in ThreadPoolExecutor
> > java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException:
> > -70
>
> Looks like https://issues.apache.org/jira/browse/CASSANDRA-1402
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: Errors on CF with index

2010-08-17 Thread Eric Evans
On Tue, 2010-08-17 at 14:04 -0700, Ed Anuff wrote:
> 
> I'm finding that once I add an index to a column family that I start
> getting
> exceptions as I try to add rows to it.  It works fine if I don't
> define the
> column metadata.  Any ideas what would cause this?
> 
> ERROR 12:44:21,477 Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException:
> -70 

Looks like https://issues.apache.org/jira/browse/CASSANDRA-1402

-- 
Eric Evans
eev...@rackspace.com



Errors on CF with index

2010-08-17 Thread Ed Anuff
I'm finding that once I add an index to a column family, I start getting
exceptions as I try to add rows to it.  It works fine if I don't define the
column metadata.  Any ideas what would cause this?

ERROR 12:44:21,477 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: -70
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:637)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -70
at org.apache.cassandra.db.Table.apply(Table.java:389)
at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:196)
at
org.apache.cassandra.service.StorageProxy$1.runMayThrow(StorageProxy.java:194)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more


This is what the code looks like that's creating the keyspace, cf, and
index:

ColumnDef cd = new ColumnDef("type".getBytes(), "UTF8Type");
cd.setIndex_name("type");
cd.setIndex_type(IndexType.KEYS);

List<ColumnDef> columns = new ArrayList<ColumnDef>();
columns.add(cd);

CfDef cf_def = new CfDef();
cf_def.setKeyspace("Domain_____0001");
cf_def.setName("Entity_Fields");
cf_def.setComparator_type("BytesType");
cf_def.setColumn_metadata(columns);

List<CfDef> cf_defs = new ArrayList<CfDef>();
cf_defs.add(cf_def);

KsDef ks_def = new KsDef();
ks_def.setName("Domain_____0001");
ks_def.setStrategy_class("org.apache.cassandra.locator.SimpleStrategy");
ks_def.setReplication_factor(1);
ks_def.setCf_defs(cf_defs);

thrift_cassandra.system_add_keyspace(ks_def);


Re: indexing rows ordered by int

2010-08-17 Thread Benjamin Black
http://code.google.com/p/redis/wiki/SortedSets

On Tue, Aug 17, 2010 at 12:33 PM, S Ahmed  wrote:
> So when using Redis, how do you go about updating the index?
> Do you serialize changes to the index i.e. when someone votes, you then
> update the index?
> Little confused as to how to go about updating a huge index.
> Say you have 1 million stores, and you want to order by the top votes, how
> would you maintain such an index since they are being constantly voted on.
> On Sun, Aug 15, 2010 at 10:48 PM, Chris Goffinet 
> wrote:
>>
>> Digg is using redis for such a feature as well.  We use it on the MyNews -
>> Top in 24 hours. Since we need timestamp ordering + sorting by how many
>> friends touch a story.
>>
>> -Chris
>>
>> On Aug 15, 2010, at 7:34 PM, Benjamin Black wrote:
>>
>> > http://code.google.com/p/redis/
>> >
>> > On Sat, Aug 14, 2010 at 11:51 PM, S Ahmed  wrote:
>> >> For CF that I need to perform range scans on, I create separate CF that
>> >> have
>> >> custom ordering.
>> >> Say a CF holds comments on a story (like comments on a reddit or digg
>> >> story
>> >> post)
>> >> So if I need to order comments by votes, it seems I have to re-index
>> >> every
>> >> time someone votes on a comment (or batch it every x minutes).
>> >>
>> >>
>> >> Right now I think I have to pull all the comments into memory, then
>> >> sort by
>> >> votes, then re-write the index.
>> >> Are there any best-practises for this type of index?
>>
>
>


RE: TTransportException intermittently in 0.7

2010-08-17 Thread March, Andres
No errors in server logs.  Let me know if you have any debug recommendations.  
I'm just starting to set it up.

- Andres

From: Jonathan Ellis [jbel...@gmail.com]
Sent: Tuesday, August 17, 2010 12:44 PM
To: user@cassandra.apache.org
Subject: Re: TTransportException intermittently in 0.7

are there any errors on your server logs?

On Tue, Aug 17, 2010 at 11:46 AM, Andres March  wrote:
> We are testing bulk data loads using thrift.  About 5% of operations are
> failing on the following exception.  It appears that it is not getting any
> response (end of file) on the batch mutate response.  I'll try to create a
> test case to demonstrate the behavior.
>
> Caused by: org.apache.thrift.transport.TTransportException
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:905)
> at
> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:889)
>
> This is using a 0.7 SNAPSHOT built last friday with a framed transport.
> --
> Andres March
> ama...@qualcomm.com
> Qualcomm Internet Services



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: data deleted came back after 9 days.

2010-08-17 Thread Ned Wolpert
(gurus, please check my logic here... I'm trying to validate my
understanding of this situation.)

Isn't the issue that while a server was disconnected, a delete could have
occurred, and thus the disconnected server never got the 'tombstone'?
(http://wiki.apache.org/cassandra/DistributedDeletes)  When it comes back,
only after it receives the delete request will the data be deleted from the
reconnected server.  I do not think this happens automatically when the
server rejoins the cluster, but requires the manual repair command.

From my understanding, if the consistency level is greater than the number
of servers missing that tombstone, you'll get the correct data. If it's less,
then you 'could' get the right or wrong answer. So the issue is how often do
you need to run repair? If you have ReplicationFactor=3 and you use
ConsistencyLevel.QUORUM (2 responses), then you need to run it after one
server fails just to be sure. If you can handle some tolerance for this, you
can wait a bit longer before running the repair.

On Tue, Aug 17, 2010 at 12:58 PM, Jeremy Dunck  wrote:

> On Tue, Aug 17, 2010 at 2:49 PM, Jonathan Ellis  wrote:
> > It doesn't have to be disconnected more than GC grace seconds to cause
> > what you are seeing, it just has to be disconnected at all (thus
> > missing delete commands).
> >
> > Thus you need to be running repair more often than gcgrace, or
> > confident that read repair will handle it for you (which clearly is
> > not the case for you :).  see
> > http://wiki.apache.org/cassandra/Operations
>
> FWIW, the docs there say:
> "Remember though that if a node is down longer than your configured
> GCGraceSeconds (default: 10 days), it could have missed remove
> operations permanently"
>
> So that's probably a source of misunderstanding.
>



-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe


Re: data deleted came back after 9 days.

2010-08-17 Thread Jeremy Dunck
On Tue, Aug 17, 2010 at 2:49 PM, Jonathan Ellis  wrote:
> It doesn't have to be disconnected more than GC grace seconds to cause
> what you are seeing, it just has to be disconnected at all (thus
> missing delete commands).
>
> Thus you need to be running repair more often than gcgrace, or
> confident that read repair will handle it for you (which clearly is
> not the case for you :).  see
> http://wiki.apache.org/cassandra/Operations

FWIW, the docs there say:
"Remember though that if a node is down longer than your configured
GCGraceSeconds (default: 10 days), it could have missed remove
operations permanently"

So that's probably a source of misunderstanding.


Re: data deleted came back after 9 days.

2010-08-17 Thread Jonathan Ellis
It doesn't have to be disconnected more than GC grace seconds to cause
what you are seeing, it just has to be disconnected at all (thus
missing delete commands).

Thus you need to be running repair more often than gcgrace, or
confident that read repair will handle it for you (which clearly is
not the case for you :).  see
http://wiki.apache.org/cassandra/Operations

On Tue, Aug 17, 2010 at 11:35 AM, Zhong Li  wrote:
> 864000
> It is default  10 days.
>
> I checked all system.log, all nodes are connected, although not all the
> time, but they reconnected after a few minutes. None of node disconnected
> more than GC grace seconds.
>
> Best,
>
> On Aug 17, 2010, at 11:53 AM, Peter Schuller wrote:
>
 We have 10 nodes cross  5 datacenters. Today I found a strange thing. On
 one node, few data deleted came back after 8-9 days.

 The data saved on a node and retrieved/deleted on another node in a
 remote
 datacenter. The CF is a super column.

 What is possible causing this?
>>
>> What is your GC grace seconds set to? Is it lower than 8-9 days, and
>> is it possible one or more nodes were disconnected from the remainder
>> of the cluster for a period longer than the GC grace seconds?
>>
>> See: http://wiki.apache.org/cassandra/DistributedDeletes
>>
>> --
>> / Peter Schuller
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: move data between clusters

2010-08-17 Thread Jonathan Ellis
you can either use get_range_slices to scan through all your rows and
batch_mutate them into the 2nd cluster, or you can start a test
cluster with the same number of nodes as the live one and just scp
everything over, 1 to 1.

it's possible but highly error-prone to manually slice and dice data
files (raw or as json).
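
For the first approach, a rough, untested sketch of the copy loop against the
0.6 Thrift API might look like this (host, keyspace, and column family names
are placeholders; very large rows and super columns would need extra care):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class CopyColumnFamily {
    static final String KEYSPACE = "Keyspace1";   // placeholder
    static final String CF = "Standard1";         // placeholder
    static final int PAGE = 1000;

    public static void main(String[] args) throws Exception {
        Cassandra.Client src = connect("prod-node", 9160);
        Cassandra.Client dst = connect("dev-node", 9160);

        // Ask for every column of every row, a page of keys at a time.
        SlicePredicate all = new SlicePredicate();
        all.setSlice_range(new SliceRange(new byte[0], new byte[0], false, Integer.MAX_VALUE));

        String start = "";
        while (true) {
            KeyRange range = new KeyRange(PAGE);
            range.setStart_key(start);
            range.setEnd_key("");
            List<KeySlice> slices = src.get_range_slices(KEYSPACE,
                    new ColumnParent(CF), all, range, ConsistencyLevel.ONE);

            Map<String, Map<String, List<Mutation>>> mutations =
                    new HashMap<String, Map<String, List<Mutation>>>();
            for (KeySlice slice : slices) {
                List<Mutation> row = new ArrayList<Mutation>();
                for (ColumnOrSuperColumn cosc : slice.getColumns())
                    row.add(new Mutation().setColumn_or_supercolumn(cosc));
                Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
                byCf.put(CF, row);
                mutations.put(slice.getKey(), byCf);
            }
            if (!mutations.isEmpty())
                dst.batch_mutate(KEYSPACE, mutations, ConsistencyLevel.ONE);

            if (slices.size() < PAGE)
                break;                                       // last page
            start = slices.get(slices.size() - 1).getKey();  // resume from last key
        }
    }

    static Cassandra.Client connect(String host, int port) throws Exception {
        TTransport t = new TSocket(host, port);
        t.open();
        return new Cassandra.Client(new TBinaryProtocol(t));
    }
}

Each new page starts at the last key already seen, so that row gets written
twice; since the mutations carry the original timestamps, the duplicate write
is harmless.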

On Tue, Aug 17, 2010 at 12:48 PM, Artie Copeland  wrote:
> what is the best way to move data between clusters.  we currently have a 4
> node prod cluster with 80G of data and want to move it to a dev env with 3
> nodes.  we have plenty of disk were looking into nodetool snapshot, but it
> look like that wont work because of the system tables.  sstabletojson does
> look like it would work as it would miss the index files.  am i missing
> something?  have others tried to do the same and been successful.
> thanx
> artie
>
> --
> http://yeslinux.org
> http://yestech.org
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: TTransportException intermittently in 0.7

2010-08-17 Thread Jonathan Ellis
are there any errors on your server logs?

On Tue, Aug 17, 2010 at 11:46 AM, Andres March  wrote:
> We are testing bulk data loads using thrift.  About 5% of operations are
> failing on the following exception.  It appears that it is not getting any
> response (end of file) on the batch mutate response.  I'll try to create a
> test case to demonstrate the behavior.
>
> Caused by: org.apache.thrift.transport.TTransportException
>     at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>     at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
>     at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
>     at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
>     at
> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:905)
>     at
> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:889)
>
> This is using a 0.7 SNAPSHOT built last friday with a framed transport.
> --
> Andres March
> ama...@qualcomm.com
> Qualcomm Internet Services



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: indexing rows ordered by int

2010-08-17 Thread S Ahmed
So when using Redis, how do you go about updating the index?

Do you serialize changes to the index, i.e., when someone votes, you then
update the index?

I'm a little confused as to how to go about updating a huge index.

Say you have 1 million stores, and you want to order them by top votes; how
would you maintain such an index since they are constantly being voted on?

On Sun, Aug 15, 2010 at 10:48 PM, Chris Goffinet wrote:

> Digg is using redis for such a feature as well.  We use it on the MyNews -
> Top in 24 hours. Since we need timestamp ordering + sorting by how many
> friends touch a story.
>
> -Chris
>
> On Aug 15, 2010, at 7:34 PM, Benjamin Black wrote:
>
> > http://code.google.com/p/redis/
> >
> > On Sat, Aug 14, 2010 at 11:51 PM, S Ahmed  wrote:
> >> For CF that I need to perform range scans on, I create separate CF that
> have
> >> custom ordering.
> >> Say a CF holds comments on a story (like comments on a reddit or digg
> story
> >> post)
> >> So if I need to order comments by votes, it seems I have to re-index
> every
> >> time someone votes on a comment (or batch it every x minutes).
> >>
> >>
> >> Right now I think I have to pull all the comments into memory, then sort
> by
> >> votes, then re-write the index.
> >> Are there any best-practises for this type of index?
>
>


Re: cache sizes using percentages

2010-08-17 Thread Ryan King
On Tue, Aug 17, 2010 at 10:55 AM, Artie Copeland  wrote:
> if i set a key cache size of 100% the way i understand how that works is:
> - the cache is not write through, but read through
> - a key gets added to the cache on the first read if not already available
> - the size of the cache will always increase for ever item read.  so if you
> have 100mil items your key cache will grow to 100mil
> Here are my questions:
> if that is the case then what happens if you only have enough mem to store
> 10mil items in your key cache?

Then don't use a percentage.

> do you lose the other 90% how is it determined what is removed?

second-chance fifo.

> will the server keep adding til it gets OOM?

that or a gc storm

> if you add a row cache as well how does that affect your percentage?
> if there a priority between the cache? or are they independant so both will
> try to be satisfied which would result in an OOM?

they are independent

-ryan
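
For anyone unfamiliar with the policy named above, here is a toy,
self-contained illustration of second-chance FIFO eviction; it only shows the
idea and is not Cassandra's actual cache code:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy second-chance FIFO cache: entries are evicted in arrival order, but an
// entry that has been read since it was queued is re-queued once ("second
// chance") instead of being evicted immediately.
class SecondChanceCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<K, V>();
    private final Map<K, Boolean> referenced = new HashMap<K, Boolean>();
    private final Deque<K> fifo = new ArrayDeque<K>();

    SecondChanceCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        if (values.containsKey(key))
            referenced.put(key, true);       // mark as recently used
        return values.get(key);
    }

    void put(K key, V value) {
        if (!values.containsKey(key)) {
            while (values.size() >= capacity)
                evictOne();
            fifo.addLast(key);
        }
        values.put(key, value);
        referenced.put(key, false);
    }

    private void evictOne() {
        K oldest = fifo.removeFirst();
        if (referenced.get(oldest)) {        // had a hit: spare it once
            referenced.put(oldest, false);
            fifo.addLast(oldest);
            evictOne();                      // try the next-oldest entry
        } else {
            values.remove(oldest);
            referenced.remove(oldest);
        }
    }
}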


Re: cache sizes using percentages

2010-08-17 Thread Edward Capriolo
On Tue, Aug 17, 2010 at 1:55 PM, Artie Copeland  wrote:
> if i set a key cache size of 100% the way i understand how that works is:
> - the cache is not write through, but read through
> - a key gets added to the cache on the first read if not already available
> - the size of the cache will always increase for ever item read.  so if you
> have 100mil items your key cache will grow to 100mil
> Here are my questions:
> if that is the case then what happens if you only have enough mem to store
> 10mil items in your key cache?
> do you lose the other 90% how is it determined what is removed?
> will the server keep adding til it gets OOM?
> if you add a row cache as well how does that affect your percentage?
> if there a priority between the cache? or are they independant so both will
> try to be satisfied which would result in an OOM?
> thanx,
> artie
> --
> http://yeslinux.org
> http://yestech.org
>

Artie,

In my experience, here is what ends up happening: you start your server and
all is well, your cache builds up, and the cache hit rate keeps climbing! Of
course, so does memory usage. At some point you start reaching your -Xmx
limit, and Java tries to garbage collect more and more often. A couple of
things can happen, all of them bad. One is simply hitting an OOM. Another is
that the JVM spends so much time garbage collecting, and so little time
processing, that it throws another exception (might be a subtype of OOM).

> do you lose the other 90% how is it determined what is removed?
Items are removed when the cache is full; actual memory usage is NOT
taken into account.

> if you add a row cache as well how does that affect your percentage?
Mutually exclusive.

> if there a priority between the cache?
No.


Re: move data between clusters

2010-08-17 Thread Benjamin Black
without answering your whole question, just fyi: there is a matching
json2sstable command for going the other direction.

On Tue, Aug 17, 2010 at 10:48 AM, Artie Copeland  wrote:
> what is the best way to move data between clusters.  we currently have a 4
> node prod cluster with 80G of data and want to move it to a dev env with 3
> nodes.  we have plenty of disk were looking into nodetool snapshot, but it
> look like that wont work because of the system tables.  sstabletojson does
> look like it would work as it would miss the index files.  am i missing
> something?  have others tried to do the same and been successful.
> thanx
> artie
>
> --
> http://yeslinux.org
> http://yestech.org
>


cache sizes using percentages

2010-08-17 Thread Artie Copeland
if i set a key cache size of 100% the way i understand how that works is:

- the cache is not write through, but read through
- a key gets added to the cache on the first read if not already available
- the size of the cache will always increase for every item read.  so if you
have 100mil items your key cache will grow to 100mil

Here are my questions:

if that is the case then what happens if you only have enough mem to store
10mil items in your key cache?
do you lose the other 90%? how is it determined what is removed?
will the server keep adding until it gets OOM?
if you add a row cache as well how does that affect your percentage?
is there a priority between the caches? or are they independent so both will
try to be satisfied, which would result in an OOM?

thanx,
artie

-- 
http://yeslinux.org
http://yestech.org


move data between clusters

2010-08-17 Thread Artie Copeland
what is the best way to move data between clusters?  we currently have a 4
node prod cluster with 80G of data and want to move it to a dev env with 3
nodes.  we have plenty of disk.  we're looking into nodetool snapshot, but it
looks like that won't work because of the system tables.  sstabletojson doesn't
look like it would work either, as it would miss the index files.  am i missing
something?  have others tried to do the same and been successful?

thanx
artie

-- 
http://yeslinux.org
http://yestech.org


Videos of the cassandra summit starting to be posted

2010-08-17 Thread Jeremy Hanna
The videos of the cassandra summit are starting to be posted, just fyi for 
those who were unable to make it out to SF.

http://www.riptano.com/blog/slides-and-videos-cassandra-summit-2010

TTransportException intermittently in 0.7

2010-08-17 Thread Andres March
 We are testing bulk data loads using thrift.  About 5% of operations 
are failing on the following exception.  It appears that it is not 
getting any response (end of file) on the batch mutate response.  I'll 
try to create a test case to demonstrate the behavior.


Caused by: org.apache.thrift.transport.TTransportException
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)

at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at 
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)

at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:905)
at 
org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:889)


This is using a 0.7 SNAPSHOT built last friday with a framed transport.
--
Andres March
ama...@qualcomm.com
Qualcomm Internet Services


Re: data deleted came back after 9 days.

2010-08-17 Thread Zhong Li

864000
It is the default, 10 days.

I checked all the system.log files; all nodes were connected, although not
all the time, but they reconnected after a few minutes. No node was
disconnected for more than GC grace seconds.


Best,

On Aug 17, 2010, at 11:53 AM, Peter Schuller wrote:

We have 10 nodes cross  5 datacenters. Today I found a strange  
thing. On

one node, few data deleted came back after 8-9 days.

The data saved on a node and retrieved/deleted on another node in  
a remote

datacenter. The CF is a super column.

What is possible causing this?


What is your GC grace seconds set to? Is it lower than 8-9 days, and
is it possible one or more nodes were disconnected from the remainder
of the cluster for a period longer than the GC grace seconds?

See: http://wiki.apache.org/cassandra/DistributedDeletes

--
/ Peter Schuller




Re: data deleted came back after 9 days.

2010-08-17 Thread Peter Schuller
>> We have 10 nodes cross  5 datacenters. Today I found a strange thing. On
>> one node, few data deleted came back after 8-9 days.
>>
>> The data saved on a node and retrieved/deleted on another node in a remote
>> datacenter. The CF is a super column.
>>
>> What is possible causing this?

What is your GC grace seconds set to? Is it lower than 8-9 days, and
is it possible one or more nodes were disconnected from the remainder
of the cluster for a period longer than the GC grace seconds?

See: http://wiki.apache.org/cassandra/DistributedDeletes

-- 
/ Peter Schuller


Re: data deleted came back after 9 days.

2010-08-17 Thread Zhong Li

Cassandra version is 0.6.3

On Aug 17, 2010, at 11:39 AM, Zhong Li wrote:


Hi All,

We have strange issue here.

We have 10 nodes cross  5 datacenters. Today I found a strange  
thing. On one node, few data deleted came back after 8-9 days.


The data saved on a node and retrieved/deleted on another node in a  
remote datacenter. The CF is a super column.


What is possible causing this?

Thanks,

Zhong Li






data deleted came back after 9 days.

2010-08-17 Thread Zhong Li

Hi All,

We have a strange issue here.

We have 10 nodes across 5 datacenters. Today I found a strange thing.
On one node, a few pieces of deleted data came back after 8-9 days.

The data was saved on one node and retrieved/deleted on another node in a
remote datacenter. The CF is a super column family.

What could possibly be causing this?

Thanks,

Zhong Li




Re: Cassandra gem

2010-08-17 Thread Mark

 On 8/16/10 11:37 PM, Benjamin Black wrote:

I'm testing with the default cassandra.yaml.

I cannot reproduce the output in that gist, however:


thrift_client = client.instance_variable_get(:@client)

=>  nil
Also, the Thrift version for 0.7 is 11.0.0, according to the code I
have.  Can someone comment on whether 0.7 beta1 is at Thrift interface
version 10.0.0 or 11.0.0?


b

On Mon, Aug 16, 2010 at 9:03 PM, Mark  wrote:

  On 8/16/10 8:51 PM, Mark wrote:

  On 8/16/10 6:19 PM, Benjamin Black wrote:

client = Cassandra.new('system', '127.0.0.1:9160')

Brand new download of beta-0.7.0-beta1

http://gist.github.com/528357

Which thrift/thrift_client versions are you using?

FYI also tested similar setup on another machine and same results. Is there
any configuration change I need in cassandra.yaml or something?



thrift_client = client.instance_variable_get(:@client)

The above client will only be instantiated after making (or attempting in my 
case) a request.



Re: Cassandra and Pig

2010-08-17 Thread Christian Decker
Ok, by now it's getting very strange. I deleted the entire installation and
restarted from scratch and now I'm getting a similar error even though I'm
going through the pig_cassandra script.

2010-08-17 15:54:10,049 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2010-08-17 15:55:10,032 [Thread-10] INFO
 org.apache.cassandra.config.DatabaseDescriptor - Auto DiskAccessMode
determined to be standard
2010-08-17 15:55:24,652 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201008111350_0020
2010-08-17 15:55:24,652 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://hmaster:50030/jobdetails.jsp?jobid=job_201008111350_0020
2010-08-17 15:56:05,690 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 33% complete
2010-08-17 15:56:09,874 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2010-08-17 15:56:09,874 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map reduce job(s) failed!
2010-08-17 15:56:10,261 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2010-08-17 15:56:10,351 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2997: Unable to recreate exception from backed error: Error:
java.lang.ClassNotFoundException: org.apache.thrift.TBase


which is a bit different from my original error, but on the backend I get a
classic ClassNotFoundException.

Any ideas?
--
Christian Decker
Software Architect
http://blog.snyke.net


How to use Secondary Indices 0.7.0beta1

2010-08-17 Thread Thorvaldsson Justus
I have figured out some of this but I am stuck, and would appreciate help
understanding how to use secondary indices.

Create a Column family and define the secondary indices
"
CfDef cdef = new CfDef();
cdef.setColumn_type(columntype);
cdef.setComment(comment);
cdef.setComparator_type(comparatortype);
cdef.setKey_cache_size(key_cache_size);
cdef.setRow_cache_size(row_cache_size);
cdef.setSubcomparator_type(subcomparatortype);
cdef.setKeyspace(keyspacename);
cdef.setName(columnname);

ColumnDef cd= new ColumnDef();
/*
What pieces of information do I need to set here?
cd.setIndex_name("AnyName?");
cd.setName(?);
there is some more information I can set
validation_class ?
IndexType ?
setFieldValue ?
I am having problems understanding them and don't know if I even need to.
*/

cdef.setColumn_metadata(new ArrayList<ColumnDef>());
cdef.addToColumn_metadata(cd);

system_add_column_family(cdef);
"
Is this all I need to do to make it work? I am not able to test it because I
have one more problem: I don't understand how to set SlicePredicate, KeyRange,
and SliceRange to search by another index. These are basically API questions,
and I thought that once I figure this out I will take a shot at documenting
how to use it.
Any help is appreciated.
/Justus
AB SVENSKA SPEL
106 10 Stockholm
Sturegatan 11, Sundbyberg
Växel +46 8 757 77 00
http://svenskaspel.se
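
For comparison, Ed Anuff's "Errors on CF with index" message earlier in this
digest shows the ColumnDef fields being filled in, and in 0.7 the indexed
read side goes through get_indexed_slices. Below is a rough, untested sketch
along those lines; the index name, validation class, page size, and
consistency level are illustrative guesses, and the exact signatures (e.g.
byte[] vs ByteBuffer arguments) may differ between beta1 and later 0.7
builds:

import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.thrift.*;

public class SecondaryIndexSketch {
    // Index definition, modeled on Ed Anuff's example: index the "type"
    // column with a KEYS index and a UTF8 validation class.
    static void addTypeIndex(CfDef cdef) {
        ColumnDef cd = new ColumnDef("type".getBytes(), "UTF8Type");
        cd.setIndex_name("type_idx");
        cd.setIndex_type(IndexType.KEYS);

        List<ColumnDef> meta = new ArrayList<ColumnDef>();
        meta.add(cd);
        cdef.setColumn_metadata(meta);
    }

    // Query side: an indexed read uses an IndexClause of IndexExpressions plus
    // a SlicePredicate saying which columns to return; no KeyRange is needed.
    // Assumes set_keyspace() has already been called on the connection.
    static List<KeySlice> findByType(Cassandra.Client client, String cf, byte[] value)
            throws Exception {
        IndexClause clause = new IndexClause();
        clause.addToExpressions(new IndexExpression("type".getBytes(),
                IndexOperator.EQ, value));
        clause.setStart_key(new byte[0]);
        clause.setCount(100);

        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));

        return client.get_indexed_slices(new ColumnParent(cf), clause, predicate,
                ConsistencyLevel.ONE);
    }
}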