Re: the process of reading and writing

2010-09-02 Thread Benjamin Black
On Thu, Sep 2, 2010 at 8:19 PM, Ying Tang  wrote:
> Recently, I read the paper about Cassandra again, and now I have some
> questions about reading and writing.
> We all know Cassandra uses NWR.
> When reading:
> the request ---> a random node in Cassandra. This node acts as a proxy, and
> it routes the request.
> Here,
> 1. Does the proxy node route this request to this key's coordinator, and the
> coordinator then routes the request to the other N-1 nodes, OR does the proxy
> route the read request to all N nodes?

The coordinator node is the proxy node.

> 2. If it is the former, does the read repair occur on the key's
> coordinator?
>    If it is the latter, does the read repair occur on the proxy node?

Depends on the CL requested.  QUORUM and ALL cause the RR to be
performed by the coordinator.  ANY and ONE cause RR to be delegated to
one of the replicas for the key.
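As a rough sketch of that delegation rule (the function and node names are illustrative, not Cassandra internals; only the CL names mirror Cassandra's consistency levels):

```python
def read_repair_node(consistency_level, coordinator, replicas):
    """Return the node that performs read repair for a read at the given CL."""
    if consistency_level in ("QUORUM", "ALL"):
        return coordinator          # strong CLs: coordinator reconciles itself
    if consistency_level in ("ANY", "ONE"):
        return replicas[0]          # weak CLs: repair delegated to a replica
    raise ValueError("unknown consistency level: %s" % consistency_level)
```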

> When writing:
> the request ---> a random node in Cassandra. This node acts as a proxy, and
> it routes the request.
> Here,
> 3. Does the proxy node route this request to this key's coordinator, and the
> coordinator then routes the request to the other N-1 nodes, OR does the proxy
> route the write request to all N nodes?
>

For writes, the coordinator sends the writes directly to the replicas
regardless of CL (rather than delegating for weakly consistent CLs).

> 4. Is N not the number of copies of the data but just a range? In this N
> range there must be W copies, so W is the number of copies.
> So within this N range, R+W>N can guarantee the data's validity. Right?
>

Sorry, I can't even parse this.
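For what it's worth, the R+W>N condition the question seems to be circling is the standard quorum-overlap rule, which is easy to state in code:

```python
def overlapping_quorums(r, w, n):
    """True if every read quorum of size r must intersect every write quorum
    of size w among n replicas, so a read is guaranteed to touch at least
    one replica holding the latest write."""
    return r + w > n

# With N=3: QUORUM reads + QUORUM writes (2+2>3) overlap; ONE+ONE (1+1) do not.
```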


b


Re: question about Cassandra error

2010-09-02 Thread Benjamin Black
You seem to be typing 0.7 commands on a 0.6 cli.  Please follow the
README in the version you are using, e.g.:

set Keyspace1.Standard2['jsmith']['first'] = 'John'

On Thu, Sep 2, 2010 at 5:35 PM, Simon Chu  wrote:
> I downloaded Cassandra 0.6.5, ran it, and got this error:
>
> bin/cassandra -f
>  INFO 16:46:06,198 JNA not found. Native methods will be disabled.
>  INFO 16:46:06,875 DiskAccessMode 'auto' determined to be mmap,
> indexAccessMode is mmap
>
> is this an issue?
>
> When I tried to run cassandra cli from the example, I got the following
> errors:
>
> cassandra> use Keyspace1 sc 'blah$'
> line 1:0 no viable alternative at input 'use'
> Invalid Statement (Type: 0)
> cassandra> set Standard2['jsmith']['first'] = 'John';
> line 1:13 mismatched input '[' expecting DOT
>
> is this a setup issue?
>
> Simon


Re: the process of reading and writing

2010-09-02 Thread Ying Tang
Hi Aaron
Thanks for your reply.

In your text, does "the coordinator" mean the random node that the user sends
the request to?
Do you mean that no matter what W is set to, the data will be copied to N
nodes, and the client will just consider the write successful once W nodes
have been written?

P.S. The key's coordinator doesn't mean a single node that is responsible for
all nodes' key ranges. The key's coordinator is the primary node responsible
for a key range; if a key is in its range, that node is the key's coordinator.
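The "primary node for a key range" idea can be sketched with a sorted token ring (a simplification of SimpleStrategy; the token and node handling here is illustrative, assuming a node owns the range from the previous token up to and including its own):

```python
from bisect import bisect_left

def primary_for(key_token, ring_tokens):
    """First node token at or after the key's token on a sorted ring,
    wrapping past the end: the 'key coordinator' in the sense above."""
    i = bisect_left(ring_tokens, key_token) % len(ring_tokens)
    return ring_tokens[i]

def replicas_for(key_token, ring_tokens, n):
    """Primary plus the next n-1 nodes clockwise, SimpleStrategy-style."""
    start = bisect_left(ring_tokens, key_token) % len(ring_tokens)
    return [ring_tokens[(start + j) % len(ring_tokens)] for j in range(n)]
```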


On Fri, Sep 3, 2010 at 2:36 PM, Aaron Morton wrote:

> AFAIK,
> For a read, the coordinator sends the request to the number of nodes specified
> by the RF. RR is kicked off on the coordinator node after the read has
> completed. There is no key coordinator; what would you do if it was down?
> The first node in the list of replica nodes is considered special, but
> not that special. (In a normal read only the first node is asked for the
> data; the other nodes are asked for a digest.)
>
> A write is the same as a read: one hop from the coordinator node to the nodes
> that will do the write. The one-hop part is discussed in the paper.
>
> N is the number of copies of the data that will be stored. W is the
> consistency level the client is happy to accept to say that the write has
> succeeded; after W nodes have ack'd to the coordinator it will ack to the
> client. But it's more complicated than that; search the archives for a big
> discussion on Hinted Handoff.
>
> If your client always operates such that R+W>N, you have consistency. If you
> drop R down to 1 you may read data that is not consistent with the other
> nodes in the ring, because the coordinator returns as soon as the first
> node does. It will then look at the results from the other nodes and kick off
> Read Repair if needed. But this happens after your read request has
> completed.
>
> Aaron
>
>
>
> On 03 Sep 2010, at 03:19 PM, Ying Tang wrote:
>
> Recently, I read the paper about Cassandra again, and now I have some
> questions about reading and writing.
>
> We all know Cassandra uses NWR.
> When reading:
> the request ---> a random node in Cassandra. This node acts as a proxy, and
> it routes the request.
> Here,
> 1. Does the proxy node route this request to this key's coordinator, and the
> coordinator then routes the request to the other N-1 nodes, OR does the proxy
> route the read request to all N nodes?
> 2. If it is the former, does the read repair occur on the key's coordinator?
>    If it is the latter, does the read repair occur on the proxy node?
>
> When writing:
> the request ---> a random node in Cassandra. This node acts as a proxy, and
> it routes the request.
> Here,
> 3. Does the proxy node route this request to this key's coordinator, and the
> coordinator then routes the request to the other N-1 nodes, OR does the proxy
> route the write request to all N nodes?
>
>
> 4. Is N not the number of copies of the data but just a range? In this N
> range there must be W copies, so W is the number of copies.
> So within this N range, R+W>N can guarantee the data's validity. Right?
>
>
>
>
> --
> Best regards,
>
> Ivy Tang
>
>
>
>


-- 
Best regards,

Ivy Tang


Re: the process of reading and writing

2010-09-02 Thread Aaron Morton
AFAIK,
For a read, the coordinator sends the request to the number of nodes specified by the RF. RR is kicked off on the coordinator node after the read has completed. There is no key coordinator; what would you do if it was down? The first node in the list of replica nodes is considered special, but not that special. (In a normal read only the first node is asked for the data; the other nodes are asked for a digest.)

A write is the same as a read: one hop from the coordinator node to the nodes that will do the write. The one-hop part is discussed in the paper.

N is the number of copies of the data that will be stored. W is the consistency level the client is happy to accept to say that the write has succeeded; after W nodes have ack'd to the coordinator it will ack to the client. But it's more complicated than that; search the archives for a big discussion on Hinted Handoff.

If your client always operates such that R+W>N, you have consistency. If you drop R down to 1 you may read data that is not consistent with the other nodes in the ring, because the coordinator returns as soon as the first node does. It will then look at the results from the other nodes and kick off Read Repair if needed. But this happens after your read request has completed.

Aaron

On 03 Sep 2010, at 03:19 PM, Ying Tang wrote:

Recently, I read the paper about Cassandra again, and now I have some questions about reading and writing.
We all know Cassandra uses NWR.
When reading:
the request ---> a random node in Cassandra. This node acts as a proxy, and it routes the request.
Here,
1. Does the proxy node route this request to this key's coordinator, and the coordinator then routes the request to the other N-1 nodes, OR does the proxy route the read request to all N nodes?
2. If it is the former, does the read repair occur on the key's coordinator?
   If it is the latter, does the read repair occur on the proxy node?
When writing:
the request ---> a random node in Cassandra. This node acts as a proxy, and it routes the request.
Here,
3. Does the proxy node route this request to this key's coordinator, and the coordinator then routes the request to the other N-1 nodes, OR does the proxy route the write request to all N nodes?
4. Is N not the number of copies of the data but just a range? In this N range there must be W copies, so W is the number of copies. So within this N range, R+W>N can guarantee the data's validity. Right?

--
Best regards,
Ivy Tang



Re: Is the secondary index maintained synchronously in 0.7

2010-09-02 Thread Alvin UW
Thanks.
But why does this situation happen?
I mean "but not in isolation".
Can we avoid it?

2010/9/2 Jonathan Ellis 

> yes, it is updated atomically (but not in isolation, it's possible for
> a client to see changes to one but not the other temporarily)
>
> On Thu, Sep 2, 2010 at 1:47 PM, Alvin Jin  wrote:
> >
> > Hello,
> >
> > I was thinking about the details of the secondary index in 0.7.
> > Will it be updated atomically with its base table?
> >
> > Any explanation on the secondary index is appreciated.
> > Thanks.
> >
> > --
> > View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-the-secondary-index-maintained-synchronously-in-0-7-tp5492798p5492798.html
> > Sent from the cassandra-u...@incubator.apache.org mailing list archive
> at Nabble.com.
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: 4k keyspaces... Maybe we're doing it wrong?

2010-09-02 Thread Aaron Morton
Create one big happy love-in keyspace. Use the key structure to identify the different clients' data. There is more support for multi-tenant systems coming, but a lot of the memory configuration is per keyspace/column family, so you cannot run that many keyspaces. This page has some more information: http://wiki.apache.org/cassandra/MultiTenant

Aaron

On 03 Sep 2010, at 01:25 PM, Mike Peters wrote:

Hi,

We're in the process of migrating 4,000 MySQL client databases to 
Cassandra.  All database schemas are identical.

With MySQL, we used to provision a separate 'database' per each client, 
to make it easier to shard and move things around.

Does it make sense to migrate the 4,000 MySQL databases to 4,000 
keyspaces in Cassandra?  Or should we stick with a single keyspace?

My concerns are -
#1. Will every single node end up with 4k folders under /cassandra/data/?

#2. Performance: Will Cassandra work better with a single keyspace + 
lots of keys, or thousands of keyspaces?

-

Granted it's 'cleaner' to have a separate keyspace per each client, but 
maybe that's not the best approach with Cassandra.

Thoughts?
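Aaron's suggestion of using the key structure to separate clients in one keyspace could look like this minimal sketch (the `:` separator and the helper name are illustrative, not an established convention):

```python
def tenant_key(client_id, key):
    """Compose a row key that encodes the tenant so all clients share one
    keyspace. The ':' separator is an arbitrary choice; use a character
    that cannot appear in client ids."""
    return "%s:%s" % (client_id, key)

# Client 42's row "jsmith" becomes "42:jsmith"; per-client range scans
# then work by key prefix (with an order-preserving partitioner).
```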


the process of reading and writing

2010-09-02 Thread Ying Tang
Recently, I read the paper about Cassandra again, and now I have some
questions about reading and writing.

We all know Cassandra uses NWR.
When reading:
the request ---> a random node in Cassandra. This node acts as a proxy, and
it routes the request.
Here,
1. Does the proxy node route this request to this key's coordinator, and the
coordinator then routes the request to the other N-1 nodes, OR does the proxy
route the read request to all N nodes?
2. If it is the former, does the read repair occur on the key's coordinator?
   If it is the latter, does the read repair occur on the proxy node?

When writing:
the request ---> a random node in Cassandra. This node acts as a proxy, and
it routes the request.
Here,
3. Does the proxy node route this request to this key's coordinator, and the
coordinator then routes the request to the other N-1 nodes, OR does the proxy
route the write request to all N nodes?


4. Is N not the number of copies of the data but just a range? In this N
range there must be W copies, so W is the number of copies.
So within this N range, R+W>N can guarantee the data's validity. Right?




-- 
Best regards,

Ivy Tang


Re: question about Cassandra error

2010-09-02 Thread Stu Hood
JNA is _not_ necessary to use Cassandra, but the server can perform some 
operations more efficiently if JNA is in place.

Not sure what is causing the error you are seeing in the CLI though: those 
statements appear to be valid.

-Original Message-
From: "Mike Peters" 
Sent: Thursday, September 2, 2010 8:27pm
To: user@cassandra.apache.org
Subject: Re: question about Cassandra error

  Simon,

See this page: http://www.riptano.com/blog/whats-new-cassandra-065

"Because of licensing issues, we can't distribute JNA with Cassandra, so
you must manually add it to the Cassandra lib/ directory or otherwise
place it on the classpath."

On 9/2/2010 8:35 PM, Simon Chu wrote:
> I downloaded Cassandra 0.6.5, ran it, and got this error:
>
> bin/cassandra -f
>  INFO 16:46:06,198 JNA not found. Native methods will be disabled.
>  INFO 16:46:06,875 DiskAccessMode 'auto' determined to be mmap, 
> indexAccessMode is mmap
>
> is this an issue?
>
> When I tried to run cassandra cli from the example, I got the 
> following errors:
>
> cassandra> use Keyspace1 sc 'blah$'
> line 1:0 no viable alternative at input 'use'
> Invalid Statement (Type: 0)
> cassandra> set Standard2['jsmith']['first'] = 'John';
> line 1:13 mismatched input '[' expecting DOT
>
> is this a setup issue?
>
> Simon 





Re: question about Cassandra error

2010-09-02 Thread Mike Peters

 Simon,

See this page: http://www.riptano.com/blog/whats-new-cassandra-065

"Because of licensing issues, we can't distribute JNA with Cassandra, so
you must manually add it to the Cassandra lib/ directory or otherwise
place it on the classpath."


On 9/2/2010 8:35 PM, Simon Chu wrote:

I downloaded Cassandra 0.6.5, ran it, and got this error:

bin/cassandra -f
 INFO 16:46:06,198 JNA not found. Native methods will be disabled.
 INFO 16:46:06,875 DiskAccessMode 'auto' determined to be mmap, 
indexAccessMode is mmap


is this an issue?

When I tried to run cassandra cli from the example, I got the 
following errors:


cassandra> use Keyspace1 sc 'blah$'
line 1:0 no viable alternative at input 'use'
Invalid Statement (Type: 0)
cassandra> set Standard2['jsmith']['first'] = 'John';
line 1:13 mismatched input '[' expecting DOT

is this a setup issue?

Simon 




4k keyspaces... Maybe we're doing it wrong?

2010-09-02 Thread Mike Peters

 Hi,

We're in the process of migrating 4,000 MySQL client databases to 
Cassandra.  All database schemas are identical.


With MySQL, we used to provision a separate 'database' per client,
to make it easier to shard and move things around.


Does it make sense to migrate the 4,000 MySQL databases to 4,000 
keyspaces in Cassandra?  Or should we stick with a single keyspace?


My concerns are -
#1. Will every single node end up with 4k folders under /cassandra/data/?

#2. Performance: Will Cassandra work better with a single keyspace + 
lots of keys, or thousands of keyspaces?


-

Granted it's 'cleaner' to have a separate keyspace per client, but
maybe that's not the best approach with Cassandra.


Thoughts?


question about Cassandra error

2010-09-02 Thread Simon Chu
I downloaded Cassandra 0.6.5, ran it, and got this error:

bin/cassandra -f
 INFO 16:46:06,198 JNA not found. Native methods will be disabled.
 INFO 16:46:06,875 DiskAccessMode 'auto' determined to be mmap,
indexAccessMode is mmap

is this an issue?

When I tried to run cassandra cli from the example, I got the following
errors:

cassandra> use Keyspace1 sc 'blah$'
line 1:0 no viable alternative at input 'use'
Invalid Statement (Type: 0)
cassandra> set Standard2['jsmith']['first'] = 'John';
line 1:13 mismatched input '[' expecting DOT

is this a setup issue?

Simon


Re: Is the secondary index maintained synchronously in 0.7

2010-09-02 Thread Jonathan Ellis
yes, it is updated atomically (but not in isolation, it's possible for
a client to see changes to one but not the other temporarily)
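A toy model (plain Python dicts, not Cassandra code) of what "atomic but not isolated" means here:

```python
# Both updates are applied (atomic: neither is lost), but a reader scheduled
# between the two writes can observe the base table updated while the index
# still lags behind.
base, index = {}, {}

def write(key, value):
    base[key] = value          # step 1: base row updated
    snapshot = dict(index)     # a concurrent reader could run at this point
    index[value] = key         # step 2: index updated
    return snapshot

stale = write("jsmith", "John")
assert "jsmith" in base and "John" not in stale  # reader saw base, not index
assert index["John"] == "jsmith"                 # the index does catch up
```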

On Thu, Sep 2, 2010 at 1:47 PM, Alvin Jin  wrote:
>
> Hello,
>
> I was thinking about the details of the secondary index in 0.7.
> Will it be updated atomically with its base table?
>
> Any explanation on the secondary index is appreciated.
> Thanks.
>
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-the-secondary-index-maintained-synchronously-in-0-7-tp5492798p5492798.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Impact on running cassandra cluster from changing hostnames...

2010-09-02 Thread Ned Wolpert
Folks-

  What is the correct process for changing the hostnames and IPs of each
server in a Cassandra cluster? In my use case we're shutting it down and
then changing the names and IPs. No changes to hardware during the
process. Beyond config changes, what should I be concerned about?

-- 
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..."   --Marlowe


Re: Data Center Move

2010-09-02 Thread Anthony Molinaro
Hi,

  Yes, we saw that, but felt that moving files might be faster than
rerolling a patched version of the server, so we were wondering if we
could move files as described. If that fails to work for us, we may
try out this patch.

-Anthony

On Thu, Sep 02, 2010 at 01:28:15PM -0500, Peter Fales wrote:
> Anthony,
> 
> I'm just getting my feet wet with Cassandra, so I'm far from an
> expert, but I'm curious whether you saw my posting a few days ago
> about using the EC2 "public" IP addresses with Cassandra:
> http://www.mail-archive.com/user@cassandra.apache.org/msg05692.html
> 
> *If* I understand the problem correctly, it seems like you could create
> some new EC2 nodes using this patched version of the code, then 
> migrate your existing nodes to new EC2 nodes, but giving each new node a 
> public IP.   Once your entire EC2 cluster was up and running on the
> public addresses, you should be able to use those public addresses 
> to migrate to some other site outside of EC2.  
> 
> Am I missing something obvious?   (Quite possible, since I haven't actually
> tested this)
> 
> On Thu, Sep 02, 2010 at 01:09:46PM -0500, Anthony Molinaro wrote:
> > Hi,
> > 
> >   We're running Cassandra 0.6.4 and need to do a data center move of
> > a cluster (from EC2 to our own data center). Because of the way the
> > networks are set up we can't actually connect these boxes directly, so
> > the original plan of adding some nodes in the new colo, letting them
> > bootstrap, then decommissioning nodes in the old colo until the data is
> > all transferred will not work.
> > 
> > So I'm wondering if the following will work
> > 
> > 1. take a snapshot on the source cluster
> > 2. rsync all the files from the old machines to the new machines (we'd most
> >likely be reducing the total number of machines, so would do things like
> >take 4-5 machines worth of data and put it onto 1 machine)
> > 3. bring up the new machines in the new colo
> > 4. run cleanup on all new nodes?
> > 5. run repair on all new nodes?
> > 
> > So will this work?  If so, are steps 4 and 5 correct?
> > 
> > I realize we will miss any new data that happens between the snapshot
> > and turning on writes on the new cluster, but I think we might be able
> > to just tune compaction such that it doesn't happen, then just sync
> > the files that change while the data transfers happen?
> > 
> > Thanks,
> > 
> > -Anthony
> > 
> > -- 
> > 
> > Anthony Molinaro   
> 
> -- 
> Peter Fales
> Alcatel-Lucent
> Member of Technical Staff
> 1960 Lucent Lane
> Room: 9H-505
> Naperville, IL 60566-7033
> Email: peter.fa...@alcatel-lucent.com
> Phone: 630 979 8031

-- 

Anthony Molinaro   


Is the secondary index maintained synchronously in 0.7

2010-09-02 Thread Alvin Jin

Hello,

I was thinking about the details of the secondary index in 0.7.
Will it be updated atomically with its base table?

Any explanation on the secondary index is appreciated.
Thanks.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-the-secondary-index-maintained-synchronously-in-0-7-tp5492798p5492798.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Data Center Move

2010-09-02 Thread Benjamin Black
You will likely need to rename some of the files to avoid collisions
(they are only unique per node).  Otherwise, yes, this can work.
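A sketch of the renaming step, assuming 0.6-style sstable names such as `Standard1-1-Data.db` (check your data directory for the exact components your version emits; the pattern below only covers Data/Index/Filter):

```python
import re

# Assumed 0.6-style sstable name: <ColumnFamily>-<generation>-<component>.db
SSTABLE = re.compile(r"^(?P<cf>.+)-(?P<gen>\d+)-(?P<part>Data|Index|Filter)\.db$")

def renumber(filenames, start):
    """Map sstable files onto fresh generation numbers starting at `start`,
    keeping the components of each table together, so files rsynced from
    several source nodes no longer collide on the target node."""
    mapping, gens = {}, {}
    for name in sorted(filenames):
        m = SSTABLE.match(name)
        if not m:
            continue  # leave non-sstable files alone
        key = (m.group("cf"), m.group("gen"))
        if key not in gens:
            gens[key] = start + len(gens)
        mapping[name] = "%s-%d-%s.db" % (m.group("cf"), gens[key], m.group("part"))
    return mapping
```

Applying the returned mapping with `os.rename` per source node, with a non-overlapping `start` for each node, keeps every merged file unique.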

On Thu, Sep 2, 2010 at 11:09 AM, Anthony Molinaro
 wrote:
> Hi,
>
>  We're running Cassandra 0.6.4 and need to do a data center move of
> a cluster (from EC2 to our own data center). Because of the way the
> networks are set up we can't actually connect these boxes directly, so
> the original plan of adding some nodes in the new colo, letting them
> bootstrap, then decommissioning nodes in the old colo until the data is
> all transferred will not work.
>
> So I'm wondering if the following will work
>
> 1. take a snapshot on the source cluster
> 2. rsync all the files from the old machines to the new machines (we'd most
>   likely be reducing the total number of machines, so would do things like
>   take 4-5 machines worth of data and put it onto 1 machine)
> 3. bring up the new machines in the new colo
> 4. run cleanup on all new nodes?
> 5. run repair on all new nodes?
>
> So will this work?  If so, are steps 4 and 5 correct?
>
> I realize we will miss any new data that happens between the snapshot
> and turning on writes on the new cluster, but I think we might be able
> to just tune compaction such that it doesn't happen, then just sync
> the files that change while the data transfers happen?
>
> Thanks,
>
> -Anthony
>
> --
> 
> Anthony Molinaro                           
>


Re: Data Center Move

2010-09-02 Thread Peter Fales
Anthony,

I'm just getting my feet wet with Cassandra, so I'm far from an
expert, but I'm curious whether you saw my posting a few days ago
about using the EC2 "public" IP addresses with Cassandra:
http://www.mail-archive.com/user@cassandra.apache.org/msg05692.html

*If* I understand the problem correctly, it seems like you could create
some new EC2 nodes using this patched version of the code, then 
migrate your existing nodes to new EC2 nodes, but giving each new node a 
public IP.   Once your entire EC2 cluster was up and running on the
public addresses, you should be able to use those public addresses 
to migrate to some other site outside of EC2.  

Am I missing something obvious?   (Quite possible, since I haven't actually
tested this)

On Thu, Sep 02, 2010 at 01:09:46PM -0500, Anthony Molinaro wrote:
> Hi,
> 
>   We're running Cassandra 0.6.4 and need to do a data center move of
> a cluster (from EC2 to our own data center). Because of the way the
> networks are set up we can't actually connect these boxes directly, so
> the original plan of adding some nodes in the new colo, letting them
> bootstrap, then decommissioning nodes in the old colo until the data is
> all transferred will not work.
> 
> So I'm wondering if the following will work
> 
> 1. take a snapshot on the source cluster
> 2. rsync all the files from the old machines to the new machines (we'd most
>likely be reducing the total number of machines, so would do things like
>take 4-5 machines worth of data and put it onto 1 machine)
> 3. bring up the new machines in the new colo
> 4. run cleanup on all new nodes?
> 5. run repair on all new nodes?
> 
> So will this work?  If so, are steps 4 and 5 correct?
> 
> I realize we will miss any new data that happens between the snapshot
> and turning on writes on the new cluster, but I think we might be able
> to just tune compaction such that it doesn't happen, then just sync
> the files that change while the data transfers happen?
> 
> Thanks,
> 
> -Anthony
> 
> -- 
> 
> Anthony Molinaro   

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Data Center Move

2010-09-02 Thread Anthony Molinaro
Hi,

  We're running Cassandra 0.6.4 and need to do a data center move of
a cluster (from EC2 to our own data center). Because of the way the
networks are set up we can't actually connect these boxes directly, so
the original plan of adding some nodes in the new colo, letting them
bootstrap, then decommissioning nodes in the old colo until the data is
all transferred will not work.

So I'm wondering if the following will work

1. take a snapshot on the source cluster
2. rsync all the files from the old machines to the new machines (we'd most
   likely be reducing the total number of machines, so would do things like
   take 4-5 machines worth of data and put it onto 1 machine)
3. bring up the new machines in the new colo
4. run cleanup on all new nodes?
5. run repair on all new nodes?

So will this work?  If so, are steps 4 and 5 correct?

I realize we will miss any new data that happens between the snapshot
and turning on writes on the new cluster, but I think we might be able
to just tune compaction such that it doesn't happen, then just sync
the files that change while the data transfers happen?

Thanks,

-Anthony

-- 

Anthony Molinaro   


Re: docs about the secondary index?

2010-09-02 Thread Jonathan Ellis
You can't, yet.  There are examples in
test/system/test_thrift_server.py; look for "index"

(moving to user@)

On Thu, Sep 2, 2010 at 8:20 AM, Changjiu Jin  wrote:
> Hello,
>
>
>
> Where can we find docs about the secondary index?
>
>
>
> Thanks
>
>
>
>
>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra on AWS across Regions

2010-09-02 Thread Benjamin Black
On Thu, Sep 2, 2010 at 5:52 AM, Phil Stanhope  wrote:
> Ben, can you elaborate on some infrastructure topology issues that would
> break this approach?
>

As noted, the naive approach results in nodes behind the same NAT
having to communicate with each other through that NAT rather than
directly. You can use different property files for the property file snitch
on different nodes, as that directly encodes the topology. You could do
the same with /etc/hosts. You could do the same with DNS. The
problem is that in all these cases you have a different view of the
world depending on where you are. Does this node have the right
information for connecting to local nodes and remote nodes? Is it
failing to connect to some other node because of a hostname resolution
failure, or because it has the wrong topology information, or ...?

And this only assumes 1:1 NAT.  What is the solution for PAT (which is
quite common)?  It's a deep dark hole of edge cases.  I would rather
have a dead simple 80% solution than a 100% solution with dynamics I
can't understand.


b


Re: Looking for something like "like" of mysql.

2010-09-02 Thread vineet daniel
You can try using a different CF for different result sets, or an inverted
index. But looking at the number of inserts that you have, it will become
complicated. The first thing that you need to do is stop thinking in terms
of an RDBMS, as Cassandra is not at all like them.
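An inverted index of the kind mentioned above could be sketched like this (in-memory Python for illustration; in Cassandra each index entry would be a row keyed by the word, with mail locations as column names):

```python
from collections import defaultdict

# Inverted index: subject word -> set of mail file locations, so a
# "subject contains X" query becomes one lookup instead of a row scan.
index = defaultdict(set)

def index_mail(subject, location):
    for word in subject.lower().split():
        index[word].add(location)

def search(word):
    return index.get(word.lower(), set())

# The file paths below are made up for illustration.
index_mail("Quarterly report attached", "/mail/001")
index_mail("Re: Quarterly numbers", "/mail/002")
```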
___
Regards
Vineet Daniel
+918106217121
___

Let your email find you


On Thu, Sep 2, 2010 at 10:00 PM, Mike Peters  wrote:

>  Cassandra doesn't support ad hoc queries like the one you're describing.
>
> I recommend looking at Lucandra 
>
>
> On 9/2/2010 12:27 PM, Anuj Kabra wrote:
>
> I am working with cassandra-0.6.4 on a mail retrieval problem.
> We have the metadata of mail like sender, recipient, timestamp, subject and
> the location of mail file stored in a cassandra DB.Everyday about 25,000
> records will
>
> be entered to this DB. We have not finalised on the data model yet but
> starting with a simple one having only one column family.
> 
> which have user_id of recipient as key.and columns for sender_id, timestamp
> of mail, subject and location of mail file.
> Now our Use case is to get the locations of all mail files which are being
> sent by a user matching a given subject(can be a part of the original
> subject of mail). Well according to my knowledge till now, we can get all
> the rows of a user
>
> by using user_id as key. After that i need to iterate over all the rows i
> get and see which mail seems to fit the given condition.(matching a subject
> in this case), which is very heavy computationally as we would get thousands
> of rows.
> So we are looking for something like "like" of mysql provided by thrift. I
> also need to know if am going the right way.
> Help is much appreciated.
>
>
>


Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-09-02 Thread Mikio Braun
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Carsten,

> In this regard, what I personally miss in Mikios - however nice - analysis, 
> is what are the effects on the application stop times due to any garbage 
> collection runs for the cases tested. In most cases, I prefer having low 
> pauses due to any garbage collection runs and don't care too much about the 
> shape of the memory usage, and I guess, that's the reason why the low pause 
> collector is used by default for running cassandra.

I see your point. I haven't explicitly tested those pauses. You can get
them from the gc logs (with some amount of perl parsing).

Subjectively speaking, I saw a higher probability of timeouts if GC took
too long. One other thing to look out for would be CMS failures (when
you start a CMS cycle but the young generation GCs run out of memory to
promote objects) which then results in a full GC cycle.

Probably I can rerun the tests and save the gc logs as well and put them
somewhere.

- -M


- -- 
Dr. Mikio Braun            email: mi...@cs.tu-berlin.de
TU Berlin  web: ml.cs.tu-berlin.de/~mikio
Franklinstr. 28/29 tel: +49 30 314 78627
10587 Berlin, Germany



-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkx/0wkACgkQtnXKX8rQtgC7CwCfSHyh4+6mMxKIbcmNCUegeY8P
0cwAnAhQrFKomDJ96P1ZQ3cZowDmrim1
=Lwuj
-END PGP SIGNATURE-
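Mikio mentions pulling pause times out of the GC logs with some parsing. A rough sketch, assuming the JVM logged with -XX:+PrintGCApplicationStoppedTime (the line format below is that flag's usual output; verify it against your own logs):

```python
import re

# Lines emitted by -XX:+PrintGCApplicationStoppedTime look like:
#   Total time for which application threads were stopped: 0.0421150 seconds
PAUSE = re.compile(r"application threads were stopped: ([0-9.]+) seconds")

def pauses(log_lines):
    """Extract all stop-the-world pause durations (seconds) from GC log lines."""
    return [float(m.group(1)) for line in log_lines for m in PAUSE.finditer(line)]

log = [
    "Total time for which application threads were stopped: 0.0421150 seconds",
    "1.234: [GC 3456K->789K(10240K), 0.0031 secs]",
    "Total time for which application threads were stopped: 0.2100000 seconds",
]
```

Feeding a whole log through `pauses` gives a list you can sort or histogram to compare collector settings.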


Re: Migrate data from 0.7 pre-release to 0.7 Beta

2010-09-02 Thread Jonathan Ellis
no.

1. If you delete the system folder you'll blow away token information too,
which is not safe on more than one machine. Just delete the schema CFs.
2. yaml is ignored; you need to explicitly run import-from-yaml (see NEWS).

On Thu, Sep 2, 2010 at 9:32 AM, Mike Peters
 wrote:
>  Thanks Jonathan,
>
> Just to make sure I understand, are you suggesting -
>
> 1. Delete system folder
> 2. Add the keyspace&cf definitions to cassandra.yaml
> 3. Restart
>
> That should do it?
>
> On 9/2/2010 12:08 PM, Jonathan Ellis wrote:
>>
>> probably you will have to blow away the system schema CF and re-import
>> from yaml
>>
>> On Thu, Sep 2, 2010 at 7:53 AM, Mike Peters
>>   wrote:
>>>
>>>  Hi,
>>>
>>> Is there a way to migrate data from a 0.7 pre-release build (June 30,
>>> 2010)
>>> to the latest 0.7 beta 1?
>>>
>>> Replacing the binaries and starting-up Cassandra, throws the "are you
>>> upgrading a pre-release version" error and dies.
>>>
>>>
>>> Thanks,
>>> Mike
>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Migrate data from 0.7 pre-release to 0.7 Beta

2010-09-02 Thread Mike Peters

 Thanks Jonathan,

Just to make sure I understand, are you suggesting -

1. Delete system folder
2. Add the keyspace&cf definitions to cassandra.yaml
3. Restart

That should do it?

On 9/2/2010 12:08 PM, Jonathan Ellis wrote:


probably you will have to blow away the system schema CF and re-import from yaml

On Thu, Sep 2, 2010 at 7:53 AM, Mike Peters
  wrote:

  Hi,

Is there a way to migrate data from a 0.7 pre-release build (June 30, 2010)
to the latest 0.7 beta 1?

Replacing the binaries and starting-up Cassandra, throws the "are you
upgrading a pre-release version" error and dies.


Thanks,
Mike








Re: Looking for something like "like" of mysql.

2010-09-02 Thread Mike Peters

 Cassandra doesn't support ad hoc queries like the one you're describing.

I recommend looking at Lucandra 

On 9/2/2010 12:27 PM, Anuj Kabra wrote:
I am working with cassandra-0.6.4 on a mail retrieval problem. We have the 
metadata of each mail (sender, recipient, timestamp, subject, and the 
location of the mail file) stored in a Cassandra DB. Every day about 25,000 
records will be entered into this DB. We have not finalised the data model 
yet, but we are starting with a simple one having only one column family, 
which has the user_id of the recipient as key and columns for sender_id, 
timestamp of the mail, subject, and location of the mail file.

Now our use case is to get the locations of all mail files sent by a user 
that match a given subject (which can be a part of the original subject of 
the mail). As far as I know, we can get all the rows of a user by using 
user_id as the key. After that I need to iterate over all the rows I get 
and see which mails fit the given condition (matching a subject in this 
case), which is computationally very heavy as we would get thousands of 
rows.

So we are looking for something like the "like" of MySQL provided through 
Thrift. I also need to know if I am going the right way.

Help is much appreciated.





Looking for something like "like" of mysql.

2010-09-02 Thread Anuj Kabra
I am working with cassandra-0.6.4 on a mail retrieval problem. We have the
metadata of each mail (sender, recipient, timestamp, subject, and the
location of the mail file) stored in a Cassandra DB. Every day about 25,000
records will be entered into this DB. We have not finalised the data model
yet, but we are starting with a simple one having only one column family,
which has the user_id of the recipient as key and columns for sender_id,
timestamp of the mail, subject, and location of the mail file.

Now our use case is to get the locations of all mail files sent by a user
that match a given subject (which can be a part of the original subject of
the mail). As far as I know, we can get all the rows of a user by using
user_id as the key. After that I need to iterate over all the rows I get
and see which mails fit the given condition (matching a subject in this
case), which is computationally very heavy as we would get thousands of
rows.

So we are looking for something like the "like" of MySQL provided through
Thrift. I also need to know if I am going the right way.
Help is much appreciated.
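For what it's worth, the client-side approach described above (fetch all rows for a user_id, then scan for a subject match) can be sketched in plain Java. The map below is a hypothetical stand-in for the fetched columns (subject -> mail file location), not the actual Thrift result types:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SubjectFilter {

    // Client-side "LIKE '%part%'": scan every fetched mail and keep the
    // file locations whose subject contains the given fragment.
    public static List<String> locationsMatching(Map<String, String> mails,
                                                 String subjectPart) {
        List<String> locations = new ArrayList<String>();
        String needle = subjectPart.toLowerCase();
        for (Map.Entry<String, String> e : mails.entrySet()) {
            if (e.getKey().toLowerCase().contains(needle)) {
                locations.add(e.getValue());
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        // Hypothetical rows fetched for one user_id.
        Map<String, String> mails = new LinkedHashMap<String, String>();
        mails.put("Quarterly report Q3", "/mail/0001.eml");
        mails.put("Re: lunch", "/mail/0002.eml");
        mails.put("Report draft", "/mail/0003.eml");
        System.out.println(locationsMatching(mails, "report"));
        // prints [/mail/0001.eml, /mail/0003.eml]
    }
}
```

This is O(rows fetched) work per query, which is exactly the cost being worried about; an index built at write time (or Lucandra, as suggested in the reply) moves that work off the read path.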


Re: Migrate data from 0.7 pre-release to 0.7 Beta

2010-09-02 Thread Jonathan Ellis
probably you will have to blow away the system schema CF and re-import from yaml

On Thu, Sep 2, 2010 at 7:53 AM, Mike Peters
 wrote:
>  Hi,
>
> Is there a way to migrate data from a 0.7 pre-release build (June 30, 2010)
> to the latest 0.7 beta 1?
>
> Replacing the binaries and starting-up Cassandra, throws the "are you
> upgrading a pre-release version" error and dies.
>
>
> Thanks,
> Mike
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Migrate data from 0.7 pre-release to 0.7 Beta

2010-09-02 Thread Mike Peters

 Hi,

Is there a way to migrate data from a 0.7 pre-release build (June 30, 
2010) to the latest 0.7 beta 1?


Replacing the binaries and starting-up Cassandra, throws the "are you 
upgrading a pre-release version" error and dies.



Thanks,
Mike


Re: Cassandra on AWS across Regions

2010-09-02 Thread Phil Stanhope
Ben, can you elaborate on some infrastructure topology issues that would
break this approach?

On Wed, Sep 1, 2010 at 6:25 PM, Benjamin Black  wrote:

> On Wed, Sep 1, 2010 at 4:16 PM, Andres March  wrote:
> > I didn't have anything specific in mind. I understand all the issues
> around
> > DNS and not advocating only supporting hostnames (just thought it would
> be a
> > nice option).  I also wouldn't expect name resolution to be done all the
> > time, only when the node is first being started or during initial
> discovery.
> >
>
> All nodes would have to resolve whenever topology changed.
>
> > One use case might be when nodes are spread out over multiple networks as
> > the poster describes, nodes on the same network on a private interface
> could
> > incur less network overhead than if they go out through the public
> > interface.  I'm not sure that this is even possible given that cassandra
> > binds to only one interface.
> >
>
> This case is not actually solved more simply by gossiping hostnames.
> It requires much more in-depth understanding of infrastructure
> topology.
>
>
> b
>


Re: about insert benchmark

2010-09-02 Thread Aaron Morton
Are you running all of the inserts through one node, or distributing the 
connections around the cluster?

You are using the order-preserving partitioner, so the load around the cluster 
will be highly dependent on the keys you send. Are they evenly distributed?

The JVM will tune the hot spots the longer the process is running.

The throughput difference between the 100,000 and 500,000 runs is only about 5%.

All seems fine.
Aaron
On 2 Sep 2010, at 18:59, ChingShen  wrote:

> Hi Daniel,
> 
>I have 4 nodes in my cluster, and run a benchmark on node A in Java.
>   P.S. Replication = 3
> 
> Shen
> 
> On Thu, Sep 2, 2010 at 2:49 PM, vineet daniel  wrote:
> Hi Ching
> 
> You are inserting using PHP, Perl, Python, Java, or something else? And is 
> Cassandra installed locally or on a networked system, and is it a single 
> system or do you have a cluster of nodes? I know I've asked you many 
> questions, but the answers will help immensely in assessing the results. 
> 
> Anyways congrats on getting better results :-) .
> 
> ___
> Regards
> Vineet Daniel
> +918106217121
> ___
> 
> Let your email find you
> 
> 
> On Thu, Sep 2, 2010 at 11:39 AM, ChingShen  wrote:
> Hi all,
> 
>   I ran a benchmark with my own code and found that the 100,000-insert run 
> performs better than the others. Why?
>  Can anyone explain it?
> 
> Thanks.
> 
> Partitioner = OPP
> CL = ONE
> ==
> 1000 records
> insert one:201 ms
> insert per:0.201 ms
> insert thput:4975.1245 ops/sec
> ==
> 10,000 records
> insert one:1950 ms
> insert per:0.195 ms
> insert thput:5128.205 ops/sec
> ==
> 100,000 records
> insert one:15576 ms
> insert per:0.15576 ms
> insert thput:6420.134 ops/sec
> ==
> 500,000 records
> insert one:82177 ms
> insert per:0.164354 ms
> insert thput:6084.4272 ops/sec
> 
> Shen
> 
> 


Re: about insert benchmark

2010-09-02 Thread Terje Marthinussen
1000 and 10,000 records take too short a time to really benchmark anything. You
will use 2 seconds just for things like TCP window sizes adjusting to the
level where you get full throughput.

The difference between 100k and 500k is less than 10%. Could be anything.

Filesystem caches, sizes of memtables (the default memtable settings flush a
memtable when it reaches 300k entries)... difficult to say.

You should benchmark something larger than that. You need at least to trigger
some SSTable compactions and proper Java GC work if you really want to know
what your performance is.
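As a sanity check, the quoted figures are internally consistent; the relationship between total time, per-insert latency, and throughput is simple arithmetic (a small sketch, using the numbers reported above):

```java
public class Throughput {

    /** Operations per second for `records` inserts finishing in `elapsedMs` ms. */
    public static double opsPerSec(long records, double elapsedMs) {
        return records / (elapsedMs / 1000.0);
    }

    public static void main(String[] args) {
        // 100,000 records in 15,576 ms -> 0.15576 ms per insert, ~6420 ops/sec
        System.out.printf("per insert: %.5f ms%n", 15576.0 / 100000);
        System.out.printf("throughput: %.0f ops/sec%n", opsPerSec(100000, 15576.0));
        // 500,000 records in 82,177 ms -> ~6084 ops/sec
        System.out.printf("throughput: %.0f ops/sec%n", opsPerSec(500000, 82177.0));
    }
}
```

Note that 6420 vs. 6084 ops/sec is only about a 5% gap, which is within the noise for runs this short; hence the advice to benchmark something large enough to exercise compaction and GC.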

Terje

On Thu, Sep 2, 2010 at 4:08 PM, Thorvaldsson Justus
<justus.thorvalds...@svenskaspel.se> wrote:

>  Are you inserting with batch_mutate? The difference could be packet size, if
> not the number of threads sending data to the Cassandra nodes.
>
>
>
> *From:* ChingShen [mailto:chingshenc...@gmail.com]
> *Sent:* 2 September 2010 08:59
> *To:* user@cassandra.apache.org
> *Subject:* Re: about insert benchmark
>
>
>
> Hi Daniel,
>
>I have 4 nodes in my cluster, and run a benchmark on node A in Java.
>   P.S. Replication = 3
>
> Shen
>
> On Thu, Sep 2, 2010 at 2:49 PM, vineet daniel 
> wrote:
>
> Hi Ching
>
> You are inserting using PHP, Perl, Python, Java, or something else? And is
> Cassandra installed locally or on a networked system, and is it a single
> system or do you have a cluster of nodes? I know I've asked you many
> questions, but the answers will help immensely in assessing the results.
>
> Anyways congrats on getting better results :-) .
>
>
> ___
> Regards
> Vineet Daniel
> +918106217121
> ___
>
> Let your email find you
>
>On Thu, Sep 2, 2010 at 11:39 AM, ChingShen 
> wrote:
>
> Hi all,
>
>   I ran a benchmark with my own code and found that the 100,000-insert run
> performs better than the others. Why?
>  Can anyone explain it?
>
> Thanks.
>
> Partitioner = OPP
> CL = ONE
> ==
> 1000 records
> insert one:201 ms
> insert per:0.201 ms
> insert thput:4975.1245 ops/sec
> ==
> 1 records
> insert one:1950 ms
> insert per:0.195 ms
> insert thput:5128.205 ops/sec
> ==
> 10 records
> insert one:15576 ms
> insert per:0.15576 ms
> insert thput:6420.134 ops/sec
> ==
> 50 records
> insert one:82177 ms
> insert per:0.164354 ms
> insert thput:6084.4272 ops/sec
>
> Shen
>
>
>
>
>


Re: about insert benchmark

2010-09-02 Thread Thorvaldsson Justus
Are you inserting with batch_mutate? The difference could be packet size, if 
not the number of threads sending data to the Cassandra nodes.

From: ChingShen [mailto:chingshenc...@gmail.com]
Sent: 2 September 2010 08:59
To: user@cassandra.apache.org
Subject: Re: about insert benchmark

Hi Daniel,

   I have 4 nodes in my cluster, and run a benchmark on node A in Java.
  P.S. Replication = 3

Shen
On Thu, Sep 2, 2010 at 2:49 PM, vineet daniel <vineetdan...@gmail.com> wrote:
Hi Ching

You are inserting using PHP, Perl, Python, Java, or something else? And is 
Cassandra installed locally or on a networked system, and is it a single system 
or do you have a cluster of nodes? I know I've asked you many questions, but 
the answers will help immensely in assessing the results.

Anyways congrats on getting better results :-) .

___
Regards
Vineet Daniel
+918106217121
___

Let your email find you

On Thu, Sep 2, 2010 at 11:39 AM, ChingShen <chingshenc...@gmail.com> wrote:
Hi all,

  I ran a benchmark with my own code and found that the 100,000-insert run 
performs better than the others. Why?
 Can anyone explain it?

Thanks.

Partitioner = OPP
CL = ONE
==
1000 records
insert one:201 ms
insert per:0.201 ms
insert thput:4975.1245 ops/sec
==
10,000 records
insert one:1950 ms
insert per:0.195 ms
insert thput:5128.205 ops/sec
==
100,000 records
insert one:15576 ms
insert per:0.15576 ms
insert thput:6420.134 ops/sec
==
500,000 records
insert one:82177 ms
insert per:0.164354 ms
insert thput:6084.4272 ops/sec

Shen




Cassandra 0.7 example

2010-09-02 Thread Thorvaldsson Justus
<<<
Courtney Robinson [sa...@live.co.uk]

Hello everyone,

I'm sorry if this has been asked already; I've just joined the list.



Can anyone provide a quick Java example of connecting to Cassandra and setting 
up a keyspace and a column family using Thrift?

I know my way around 0.6 and I'm trying to get ready to migrate to 0.7 when it's 
stable.



Please and thank you.
<<

There is a way to search the email list, and I must add your question is more 
a u...@cassandra question than a dev one.
You can search the email list here:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/

I have done some experiments with 0.7 and Java, so I can provide you with 
some help. The most interesting news, I think, is the secondary indices.

I also post some info on a blog at www.justus.st where I 
format the text a little better than this =)

Now, about opening the connection: the important thing here is to use the 
framed transport; this is a change that will be the only option in later 
versions.

/**
 * Open up a new connection to the Cassandra database.
 *
 * @param address the host to connect to
 * @return the Cassandra client
 */
private static Cassandra.Client newCassandraConnection(String address)
        throws TTransportException {
    TTransport tr = new TSocket(address, 9160);
    TFramedTransport tf = new TFramedTransport(tr);
    TProtocol proto = new TBinaryProtocol(tf);
    Cassandra.Client client = new Cassandra.Client(proto);
    tr.open();
    return client;
}



In this part it is possible to set all the CF definitions when creating the 
keyspace, although it is not necessary, so I did it afterwards.

/**
 * @param keyspacename
 * @param replicafactor
 * @throws InvalidRequestException
 * @throws TException
 */
public static void createKeySpace(String keyspacename, int replicafactor)
        throws InvalidRequestException, TException {
    Cassandra.Client c = Connection.lendConnection();
    try {
        c.describe_keyspace(keyspacename);
        System.out.println("Keyspace already exists");
    } catch (NotFoundException e) {
        System.out.println("Keyspace not found, creating new " + keyspacename);
        KsDef k = new KsDef();
        k.setName(keyspacename);
        k.setReplication_factor(replicafactor);
        k.setStrategy_class("org.apache.cassandra.locator.RackUnawareStrategy");
        List<CfDef> cfDefs = new ArrayList<CfDef>();
        k.setCf_defs(cfDefs);
        c.system_add_keyspace(k);
    }
    Connection.returnConnection(c);
}

/**
 * @param keyspacename
 * @param columnname
 * @param columntype
 * @param comparatortype
 * @param subcomparatortype
 * @param row_cache_size
 * @param key_cache_size
 * @param comment
 * @throws InvalidRequestException
 * @throws TException
 * @throws NumberFormatException
 * @throws IOException
 */
public static void createColumnFamily(String keyspacename, String columnname,
        String columntype, String comparatortype, String subcomparatortype,
        double row_cache_size, double key_cache_size, String comment)
        throws InvalidRequestException, TException, NumberFormatException, IOException {
    CfDef cdef = new CfDef();
    cdef.setColumn_type(columntype);
    cdef.setComment(comment);
    cdef.setComparator_type(comparatortype);
    cdef.setKey_cache_size(key_cache_size);
    cdef.setRow_cache_size(row_cache_size);
    cdef.setSubcomparator_type(subcomparatortype);
    cdef.setKeyspace(keyspacename);
    cdef.setName(columnname);

    Cassandra.Client c = Connection.lendConnection();
    c.set_keyspace(keyspacename);

    c.system_add_column_family(cdef);

    Connection.returnConnection(c);
}


Re: about insert benchmark

2010-09-02 Thread ChingShen
Sorry, my Cassandra version is 0.6.4.