Re: Cassandra error with large connection

2010-02-02 Thread JKnight JKnight
Thank you very much, Mr Jonathan.

On Mon, Feb 1, 2010 at 11:04 AM, Jonathan Ellis jbel...@gmail.com wrote:

 On Mon, Feb 1, 2010 at 10:03 AM, Jonathan Ellis jbel...@gmail.com wrote:
  I see a lot of CLOSE_WAIT TCP connection.

 Also, this sounds like you are not properly pooling client connections
 to Cassandra.  You should have one connection per user, not one
 connection per operation.
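A rough sketch of what per-user pooling could look like, in generic Python; the Connection class here is just a stand-in, not any real Cassandra/Thrift client API:

```python
import queue

class Connection:
    """Stand-in for a real Thrift connection to Cassandra."""
    def __init__(self, host, port):
        self.host, self.port = host, port
        self.open = True

class ConnectionPool:
    """Hand out a fixed set of reusable connections instead of
    opening (and leaking into CLOSE_WAIT) one per operation."""
    def __init__(self, host, port, size=5):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(Connection(host, port))

    def checkout(self):
        return self._pool.get()   # blocks until a connection is free

    def checkin(self, conn):
        self._pool.put(conn)

pool = ConnectionPool("localhost", 9160, size=3)
conn = pool.checkout()
# ... perform get/insert operations on conn ...
pool.checkin(conn)
```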

 -Jonathan




-- 
Best regards,
JKnight


Re: Sample applications

2010-02-02 Thread Erik Holstad
Hi Carlos!

I'm also really new to Cassandra but here are a couple of links that I found
useful:
http://wiki.apache.org/cassandra/ClientExamples
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

and one of the presentations like:
http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod

Erik


RE: Sample applications

2010-02-02 Thread Carlos Sanchez
Thanks Erik

From: Erik Holstad [mailto:erikhols...@gmail.com]
Sent: Tuesday, February 02, 2010 9:08 AM
To: cassandra-user@incubator.apache.org
Subject: Re: Sample applications

Hi Carlos!

I'm also really new to Cassandra but here are a couple of links that I found 
useful:
http://wiki.apache.org/cassandra/ClientExamples
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

and one of the presentations like:
http://www.slideshare.net/jhammerb/data-presentations-cassandra-sigmod

Erik



How to retrieve keys from Cassandra ?

2010-02-02 Thread Sébastien Pierre
Hi all,

I would like to know how to retrieve the list of keys available
for a specific column. There is the get_key_range method, but it is only
available when using the OrderPreservingPartitioner -- I use a
RandomPartitioner.

Does this mean that when using a RandomPartitioner, you cannot see which
keys are available in the database?

 -- Sébastien


Re: Best design in Cassandra

2010-02-02 Thread Erik Holstad
On Mon, Feb 1, 2010 at 3:31 PM, Brandon Williams dri...@gmail.com wrote:

 On Mon, Feb 1, 2010 at 5:20 PM, Erik Holstad erikhols...@gmail.comwrote:

 Hey!
 Have a couple of questions about the best way to use Cassandra.
 Using the random partitioner + the multi_get calls vs order preservation +
 range_slice calls?


 When you use an OPP, the distribution of your keys becomes your problem.
  If you don't have an even distribution, this will be reflected in the load
 on the nodes, while the RP gives you even distribution.


Yeah, that is why it would be nice to hear if anyone has compared the
performance between the two,
to see if it is worth worrying about your own distribution. I also read that
the random partitioner doesn't
give that great distribution.
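For intuition, the random partitioner places a key by its MD5 hash, so lexically adjacent keys scatter across the ring; a rough model in plain Python (not Cassandra's actual token code):

```python
import hashlib

def token(key: str) -> int:
    # RandomPartitioner-style placement: the position on the ring comes
    # from the MD5 of the key, not from the key's own ordering.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = ["user:1", "user:2", "user:3"]
tokens = {k: token(k) for k in keys}
# Lexically adjacent keys get widely scattered tokens, which evens out
# load across nodes but makes ordered key-range scans impossible.
```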



 What is the benefit of using multiple families vs super column?


 http://issues.apache.org/jira/browse/CASSANDRA-598 is currently why I
 prefer simple CFs instead of supercolumns.

Yeah, this is nasty.



 For example in the case of sorting
 in different orders. One good thing that I can see here when using super
 column is that you don't
 have to restart your cluster every time you want to add something new
 order.


 A supercolumn can still only compare subcolumns in a single way.

Yeah, I know that, but you can have a super column per sort order without
having to restart the cluster.


 When http://issues.apache.org/jira/browse/CASSANDRA-44 is completed, you
 will be able to add CFs without restarting.

Looks interesting, but targeted at 0.7, so it is probably going to be a
little while, right?


 -Brandon




-- 
Regards Erik


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Jonathan Ellis
More or less (but see
https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6).

Think of it this way: when you have a few billion keys, how useful is
it to list them?

-Jonathan

2010/2/2 Sébastien Pierre sebastien.pie...@gmail.com:
 Hi all,
 I would like to know how to retrieve the list of available keys available
 for a specific column. There is the get_key_range method, but it is only
 available when using the OrderPreservingPartitioner -- I use a
 RandomPartitioner.
 Does this mean that when using a RandomPartitioner, you cannot see which
 keys are available in the database ?
  -- Sébastien


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Erik Holstad
Hi Sebastien!
I'm totally new to Cassandra, but as far as I know there is no way of
getting just the keys that are in the database; they are not stored
separately, only with the data itself.

Why do you want a list of keys, what are you going to use them for? Maybe
there is another way of solving
your problem.

What you are describing, getting all the keys/rows for a given column, sounds
like you would have to fetch all the data you have and then filter every key
on your column. I don't think that even get_key_range will do that for you;
it says that it takes a column_family, but like I said, I'm totally new.

Erik

2010/2/2 Sébastien Pierre sebastien.pie...@gmail.com

 Hi all,

 I would like to know how to retrieve the list of available keys available
 for a specific column. There is the get_key_range method, but it is only
 available when using the OrderPreservingPartitioner -- I use a
 RandomPartitioner.

 Does this mean that when using a RandomPartitioner, you cannot see which
 keys are available in the database ?

  -- Sébastien




-- 
Regards Erik


Re: Best design in Cassandra

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad erikhols...@gmail.com wrote:

 A supercolumn can still only compare subcolumns in a single way.

 Yeah, I know that, but you can have a super column per sort order without
 having to restart the cluster.


You get a CompareWith for the columns, and a CompareSubcolumnsWith for
subcolumns.  If you need more column types to get different sort orders, you
need another ColumnFamily.

-Brandon


Re: Best design in Cassandra

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 7:45 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad erikhols...@gmail.comwrote:

 A supercolumn can still only compare subcolumns in a single way.

 Yeah, I know that, but you can have a super column per sort order without
 having to restart the cluster.


 You get a CompareWith for the columns, and a CompareSubcolumnsWith for
 subcolumns.  If you need more column types to get different sort orders, you
 need another ColumnFamily.

Not sure what column types mean. What I want to do is to have a few things
sorted by asc and desc order, like {a,b}, {b,a} and {1,2}, {2,1}


 -Brandon




-- 
Regards Erik


Re: Did CASSANDRA-647 get fixed in 0.5?

2010-02-02 Thread Omer van der Horst Jansen
Here it is: https://issues.apache.org/jira/browse/CASSANDRA-752




From: Jonathan Ellis jbel...@gmail.com
To: cassandra-user@incubator.apache.org
Sent: Mon, February 1, 2010 5:22:13 PM
Subject: Re: Did CASSANDRA-647 get fixed in 0.5?

Can you create a ticket for this?

Thanks!

On Mon, Feb 1, 2010 at 4:11 PM, Omer van der Horst Jansen
ome...@yahoo.com wrote:
 I checked out the 0.5 branch and ran ant release (on my linux box).
 Installed the new tar.gz and ran the test on my Windows laptop as before but
 got the same result -- the key isn't deleted from the perspective of
 get_range_slice.

 Omer

 
 From: Jonathan Ellis jbel...@gmail.com
 To: cassandra-user@incubator.apache.org
 Sent: Mon, February 1, 2010 4:52:17 PM
 Subject: Re: Did CASSANDRA-647 get fixed in 0.5?

 647 was committed for 0.5, yes, but CASSANDRA-703 was not.  Can you
 try the 0.5 branch and see if it is fixed there?

 On Mon, Feb 1, 2010 at 3:26 PM, Omer van der Horst Jansen
 ome...@yahoo.com wrote:
 I'm running
 into an issue with Cassandra 0.5 (the current release version) that
 sounds exactly like the description of issue CASSANDRA-647.

 I'm
 using the Thrift Java API to store a couple of columns in a single row. A
 few seconds after that my application deletes the entire row. A plain
 Cassandra.Client.get() will then throw a NotFoundException for that
 particular key, as expected. However, the key will still show up when
 executing a
 Cassandra.Client.get_range_slice query.

 Here is some quick and dirty Java code that demonstrates the problem:

 import java.util.List;

 import org.apache.cassandra.service.*;
 import org.apache.thrift.protocol.*;
 import org.apache.thrift.transport.*;

 public class Cassandra647TestApp
 {
     /**
      * Demonstrates CASSANDRA-647 presence in the Cassandra 0.5 release.
      * Requires an unmodified Cassandra configuration except that an
      * OrderPreservingPartitioner should be used.
      */
     public static void main(String[] args) throws Exception
     {
         String keyspace = "Keyspace1";
         String cf = "Standard1";
         String key = "testrow1";
         byte[] columnName = "colname".getBytes();
         byte[] data = "testdata".getBytes();

         TTransport transport = new TSocket("localhost", 9160);
         TProtocol protocol = new TBinaryProtocol(transport);
         Cassandra.Client client = new Cassandra.Client(protocol);

         transport.open();
         ColumnPath path = new ColumnPath(cf, null, columnName);

         client.insert(keyspace, key, path, data,
             System.currentTimeMillis(), ConsistencyLevel.ONE);

         Thread.sleep(1000);

         ColumnPath rowpath = new ColumnPath(cf, null, null);

         client.remove(keyspace, key, rowpath, System.currentTimeMillis(),
             ConsistencyLevel.ONE);
         Thread.sleep(1000);

         try
         {
             ColumnOrSuperColumn cosc = client.get(keyspace, key, path,
                 ConsistencyLevel.ONE);
             System.out.println("Whoops! NotFoundException not thrown!");
         }
         catch (NotFoundException e)
         {
             System.out.println("OK, we got a NotFoundException");
         }

         ColumnParent parent = new ColumnParent(cf, null);
         SlicePredicate predicate = new SlicePredicate();
         SliceRange range = new SliceRange();
         range.start = new byte[0];
         range.finish = new byte[0];
         predicate.slice_range = range;

         List<KeySlice> sliceList = client.get_range_slice(keyspace, parent,
             predicate, "", "", 1000, ConsistencyLevel.ONE);

         for (KeySlice k : sliceList)
         {
             System.out.println("Found key " + k.key);
             if (key.equals(k.key))
             {
                 System.out.println("but key " + k.key
                     + " should have been removed");
             }
         }
     }
 }

 Am I using the API correctly in the code above?

 -Omer van der Horst Jansen










  

Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Sébastien Pierre
Hi all,

It's basically for knowing what's inside the db, as I've been toying with
Cassandra for some time, I have keys that are no longer useful and should be
removed.

I'm also storing HTTP logs in cassandra, where keys follow this convention
campaign:CAMPAIGN_ID:MMDD. So for instance, if I'd like to know
what logs are available I just have to do:

   client.get_keys("Keyspace1", "Logs", "", "", 100, ConsistencyLevel.ONE)

However, I have to use an OrderPreservingPartitioner to do so, which is
(from my understanding) bad for load in this case.
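Under an order-preserving partitioner, a key convention like this turns per-campaign listing into a prefix range; a sketch of how the range endpoints could be built (the scan function only simulates a range query over a sorted key list):

```python
def campaign_range(campaign_id: str):
    """Start/finish keys covering every date for one campaign under an
    order-preserving partitioner; ';' is the next ASCII character after ':',
    so the half-open range [start, finish) covers exactly this campaign."""
    start = "campaign:%s:" % campaign_id
    finish = "campaign:%s;" % campaign_id
    return start, finish

def scan(keys, start, finish):
    # Simulates an OPP key-range query over a sorted list of keys.
    return [k for k in sorted(keys) if start <= k < finish]

keys = [
    "campaign:42:20100201",
    "campaign:42:20100202",
    "campaign:7:20100201",
]
start, finish = campaign_range("42")
matches = scan(keys, start, finish)   # only the campaign 42 keys
```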

 -- Sébastien


2010/2/2 Erik Holstad erikhols...@gmail.com

 Hi Sebastien!
 I'm totally new to Cassandra, but as far as I know there is no way of
 getting just the keys that are in the
 database, they are not stored separately but only with the data itself.

 Why do you want a list of keys, what are you going to use them for? Maybe
 there is another way of solving
 your problem.

 What you are describing, getting all the keys/rows for a given column,
 sounds like you would have to fetch all the data you have and then filter
 every key on your column. I don't think that even get_key_range will do
 that for you; it says that it takes a column_family, but like I said, I'm
 totally new.

 Erik

 2010/2/2 Sébastien Pierre sebastien.pie...@gmail.com

 Hi all,

 I would like to know how to retrieve the list of available keys available
 for a specific column. There is the get_key_range method, but it is only
 available when using the OrderPreservingPartitioner -- I use a
 RandomPartitioner.

 Does this mean that when using a RandomPartitioner, you cannot see which
 keys are available in the database ?

  -- Sébastien




 --
 Regards Erik



Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Sébastien Pierre
Hi Jonathan,

In my case, I'll have many more columns (thousands to millions) than keys in
logs (campaign x days), so it's not an issue to retrieve all of them.

Also, if you can't retrieve values from Cassandra just because you're using
the wrong key (say you're using "user/10" instead of "user:10"), then
without the ability to list the keys you'd have no way to find the error.

I'm glad to see this implemented :)

 -- Sébastien

2010/2/2 Jonathan Ellis jbel...@gmail.com

 More or less (but see
 https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6).

 Think of it this way: when you have a few billion keys, how useful is
 it to list them?

 -Jonathan

 2010/2/2 Sébastien Pierre sebastien.pie...@gmail.com:
  Hi all,
  I would like to know how to retrieve the list of available keys available
  for a specific column. There is the get_key_range method, but it is only
  available when using the OrderPreservingPartitioner -- I use a
  RandomPartitioner.
  Does this mean that when using a RandomPartitioner, you cannot see which
  keys are available in the database ?
   -- Sébastien



Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Brandon Williams
2010/2/2 Sébastien Pierre sebastien.pie...@gmail.com

 Hi Jonathan,

 In my case, I'll have many more columns (thousands to millions) than keys
 in logs (campaign x days), so it's not an issue to retrieve all of them.


If that's the case, your dataset is small enough that you could maintain an
index of the keys in another CF.   If it needs to scale further, you can
segment the index keys by year, month, etc.
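A toy model of that index-CF approach, with plain dicts standing in for column families; the row naming here ("keys:YYYYMM") is just one possible month-based segmentation, not anything from the thread:

```python
from collections import defaultdict

data_cf = {}                  # key -> columns, as in the Logs CF
index_cf = defaultdict(dict)  # index row name -> {data key: placeholder}

def insert_log(campaign_id, yyyymmdd, columns):
    key = "campaign:%s:%s" % (campaign_id, yyyymmdd)
    data_cf[key] = columns
    # Second write: record the key in an index row segmented by month,
    # so listing keys never needs a full-cluster scan.
    index_row = "keys:%s" % yyyymmdd[:6]   # e.g. "keys:201002"
    index_cf[index_row][key] = ""
    return key

insert_log("42", "20100201", {"hits": 10})
insert_log("42", "20100202", {"hits": 12})
# sorted(index_cf["keys:201002"]) now lists both keys for February 2010
```

The cost Jean-Denis mentions later in the thread applies here too: every insert becomes two writes, and the two are not atomic.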

-Brandon


Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
Hey!
I'm looking for a comparator that sorts columns in reverse order on, for
example, bytes.
I saw that you can write your own comparator class, but just thought that
someone must have done that already.

-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Jonathan Ellis
you can scan in reverse (reversed=True in the SliceRange) without needing a
custom comparator.

On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad erikhols...@gmail.com wrote:
 Hey!
 I'm looking for a comparator that sort columns in reverse order on for
 example bytes?
 I saw that you can write your own comparator class, but just thought that
 someone must have done that already.

 --
 Regards Erik



Re: Reverse sort order comparator?

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad erikhols...@gmail.com wrote:

 Hey!
 I'm looking for a comparator that sort columns in reverse order on for
 example bytes?
 I saw that you can write your own comparator class, but just thought that
 someone must have done that already.


When you get_slice, just set reverse to true in the SliceRange and it will
reverse the order.

-Brandon


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
Thanks guys!
So I want to use SliceRange, but I'm thinking about using the count parameter.
For example give me
the first x columns, next call I would like to call it with a start value
and a count.

If I was to use the reverse param in sliceRange I would have to fetch all
the columns first, right?


On Tue, Feb 2, 2010 at 9:23 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad erikhols...@gmail.comwrote:

 Hey!
 I'm looking for a comparator that sort columns in reverse order on for
 example bytes?
 I saw that you can write your own comparator class, but just thought that
 someone must have done that already.


 When you get_slice, just set reverse to true in the SliceRange and it will
 reverse the order.

 -Brandon




-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad erikhols...@gmail.com wrote:

 Thanks guys!
 So I want to use sliceRange but thinking about using the count parameter.
 For example give me
 the first x columns, next call I would like to call it with a start value
 and a count.

 If I was to use the reverse param in sliceRange I would have to fetch all
 the columns first, right?


If you pass reverse as true, then instead of getting the first x columns,
you'll get the last x columns.  If you want to head backwards toward the
beginning, you can pass the first column as the end value.

-Brandon


Key/row names?

2010-02-02 Thread Erik Holstad
Is there a way to use a byte[] as the key instead of a string?
If not what is the main reason for using strings for the key but
the columns and the values can be byte[]? Is it just to be able
to use it as the key in a Map etc or are there other reasons?

-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 9:35 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad erikhols...@gmail.comwrote:

 Thanks guys!
 So I want to use sliceRange but thinking about using the count parameter.
 For example give me
 the first x columns, next call I would like to call it with a start value
 and a count.

 If I was to use the reverse param in sliceRange I would have to fetch all
 the columns first, right?


 If you pass reverse as true, then instead of getting the first x columns,
 you'll get the last x columns.  If you want to head backwards toward the
 beginning, you can pass the first column as the end value.

 -Brandon

Wow that sounds really good. So you are saying if I set it to reverse sort
order and count 10 for the first round I get the last 10,
for the next call I just set the last column from the first call to start
and I will get columns -10 to -20, so to speak?


-- 
Regards Erik


Re: Key/row names?

2010-02-02 Thread Jonathan Ellis
On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad erikhols...@gmail.com wrote:
 Is there a way to use a byte[] as the key instead of a string?

no.

 If not what is the main reason for using strings for the key but
 the columns and the values can be byte[]?

historical baggage.  we might switch to byte[] keys in 0.7.

-Jonathan


Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Sorry that there are a lot of questions from me this week,  just trying to
better understand
the best way to use Cassandra :)

Let us say that you know the length of your key, everything is standardized,
are there people
out there that just tag the value onto the key so that you don't have to pay
the extra overhead
of the second byte[]?

-- 
Regards Erik


Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 9:57 AM, Brandon Williams dri...@gmail.com wrote:

 On Tue, Feb 2, 2010 at 11:39 AM, Erik Holstad erikhols...@gmail.comwrote:


 Wow that sounds really good. So you are saying if I set it to reverse sort
 order and count 10 for the first round I get the last 10,
 for the next call I just set the last column from the first call to start
 and I will get the columns -10- -20, so to speak?


 Actually, since they are reversed and you're trying to move backwards,
 you'll need to pass the last column from the first query (since they will be
 sorted in reverse order) as the start to the next one with reverse still set
 to true.

 -Brandon


Thanks a lot Brandon for clearing that up for me, I think that was what I
was trying to say. But that is really good,
cause now I don't have to store the data twice in different sort orders.
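The backward paging described in this thread, simulated over an in-memory sorted column list; reversed_slice only models what a reversed SliceRange returns, it does not call get_slice:

```python
import bisect

def reversed_slice(columns, start, count):
    """Model of a reversed slice: a newest-first page of `count` columns.
    `columns` is the full column-name list; start == "" means 'from the
    end', otherwise start at `start` (inclusive, as get_slice does) and
    walk toward the beginning."""
    cols = sorted(columns)
    if start == "":
        hi = len(cols)
    else:
        hi = bisect.bisect_right(cols, start)   # include the start column
    page = cols[max(0, hi - count):hi]
    return list(reversed(page))                 # reversed=True ordering

cols = ["c%02d" % i for i in range(1, 21)]      # c01 .. c20
page1 = reversed_slice(cols, "", 10)            # c20 down to c11
# Next page: pass the last column of page1 as the new start.
page2 = reversed_slice(cols, page1[-1], 10)     # c11 down to c02
```

Note the boundary column (c11 here) comes back again on the next page, since the start of a slice is inclusive; a real client would typically skip the duplicate or request count + 1.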



-- 
Regards Erik


Re: Key/row names?

2010-02-02 Thread Erik Holstad
Thank you!

On Tue, Feb 2, 2010 at 9:41 AM, Jonathan Ellis jbel...@gmail.com wrote:

 On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad erikhols...@gmail.com
 wrote:
  Is there a way to use a byte[] as the key instead of a string?

 no.

  If not what is the main reason for using strings for the key but
  the columns and the values can be byte[]?

 historical baggage.  we might switch to byte[] keys in 0.7.

 -Jonathan




-- 
Regards Erik


Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Jean-Denis Greze
Ok, so 0.6's https://issues.apache.org/jira/browse/CASSANDRA-745 permits
someone using RandomPartitioner to pass start="" and finish="" to get all
of the rows in their cluster, although in an extremely inefficient way.
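That start=""/finish="" iteration amounts to a paging loop like the following sketch, where fetch stands in for a real get_range_slice call (with the random partitioner the pages arrive in token order, not key order):

```python
def page_all_keys(fetch, page_size=1000):
    """Iterate every key by repeatedly asking for `page_size` keys
    starting from the last key seen.  `fetch(start, count)` stands in
    for get_range_slice(keyspace, parent, predicate, start, "", count, ...)."""
    start = ""
    seen = []
    while True:
        page = fetch(start, page_size)
        if start:
            page = page[1:]   # the start key is returned again; skip it
        if not page:
            return seen
        seen.extend(page)
        start = page[-1]

# Stand-in backend over a fixed, ordered key list:
all_keys = ["k%03d" % i for i in range(25)]
def fetch(start, count):
    keys = [k for k in all_keys if k >= start]
    return keys[:count]

keys = page_all_keys(fetch, page_size=10)   # walks all 25 keys in 3 pages
```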

We are in a situation like Pierre's, where we need to know what's currently
in the DB so to speak -- except that we have hundreds of millions of rows
(and increasing) and that maintaining an index of the keys in another CF, as
Brandon suggests, is becoming difficult (we also don't like the double write
on initial key inserts, in terms of transactionality especially).

Also, every once in a while, we need to enhance our data as part of some
functionality upgrade or refactoring.  So far, what we do is enhance on
reads (i.e., whenever we read a particular record, we check whether it is at
the latest version and, if not, enhance it), but there are many problems with this
approach. We've been considering doing background process enhancing by
running through all of the keys, which is why 745 is pretty exciting.  We'd
rather go through the inefficient operation once in a while as opposed to
doing a check on every read.

Anyway, partially to address the efficiency concern, I've been playing
around with the idea of having 745-like functionality on a per-node basis: a
call to get all of the keys on a particular node as opposed to the entire
cluster.  It just seems like with a very large cluster with billions, tens
of billions, or hundreds of billions of keys 745 would just get overwhelmed.
 Just a thought.







On Tue, Feb 2, 2010 at 7:31 AM, Jonathan Ellis jbel...@gmail.com wrote:

 More or less (but see
 https://issues.apache.org/jira/browse/CASSANDRA-745, in 0.6).

 Think of it this way: when you have a few billion keys, how useful is
 it to list them?

 -Jonathan

 2010/2/2 Sébastien Pierre sebastien.pie...@gmail.com:
  Hi all,
  I would like to know how to retrieve the list of available keys
available
  for a specific column. There is the get_key_range method, but it is only
  available when using the OrderPreservingPartitioner -- I use a
  RandomPartitioner.
  Does this mean that when using a RandomPartitioner, you cannot see which
  keys are available in the database ?
   -- Sébastien



--
jeande...@6coders.com
(917) 951-0636



Re: get_slice() slow if more number of columns present in a SCF.

2010-02-02 Thread Nathan McCall
Thank you for the benchmarks. What version of Cassandra are you using?
I had about 80% performance improvement on single node reads after
using a trunk build with the results from
https://issues.apache.org/jira/browse/CASSANDRA-688 (result caching)
and playing around with the configuration. I am not yet running this
in production though, so I cannot provide any real numbers.

That said, I have no intention of deploying a single node. I keep
seing performance concerns from folks on small or single node
clusters. My impression so far is that Cassandra may not be the right
solution for these types of deployments.

My main interest in Cassandra is the linear scalability of reads and
writes. From my own tests and some of the discussion on these lists,
it seems Cassandra can thrash around a lot when the number of nodes <=
the replication factor * 2, particularly if a node goes down. I
understand this is a design trade-off of sorts and I am fine with it.
Any sort of distributed, fault tolerant system is well served by using
lots of commodity hardware.

What I found to have been most valuable for my evaluation was to get a
good test together with some real data from our system and then add
nodes, remove nodes, break nodes, etc. and watch what happens. Once I
finish with this, it looks like I will have some solid numbers to do
some capacity planning for figuring out exactly how much hardware to
purchase and when I will need to add more.

Apologies to the original poster if that got a little long winded, but
hopefully it will be useful information for folks.

Cheers,
-Nate


On Tue, Feb 2, 2010 at 7:27 AM, envio user enviou...@gmail.com wrote:
 All,

 Here are some tests [batch_insert() and get_slice()] I performed on Cassandra.

 H/W: Single node, Quad Core(8 cores), 8GB RAM:
 Two separate physical disks, one for the commit log and another for the data.

 storage-conf.xml
 ================
 <KeysCachedFraction>0.4</KeysCachedFraction>
 <CommitLogRotationThresholdInMB>256</CommitLogRotationThresholdInMB>
 <MemtableSizeInMB>128</MemtableSizeInMB>
 <MemtableObjectCountInMillions>0.2</MemtableObjectCountInMillions>
 <MemtableFlushAfterMinutes>1440</MemtableFlushAfterMinutes>
 <ConcurrentReads>16</ConcurrentReads>


 Data Model:

 <ColumnFamily ColumnType="Super" CompareWith="UTF8Type"
 CompareSubcolumnsWith="UTF8Type" Name="Super1" />

 TEST1A
 ==
 /home/sun> python stress.py -n 10 -t 100 -y super -u 1 -c 25 -r -o insert -i 10
 WARNING: multiprocessing not present, threading will be used.
        Benchmark may not be accurate!
 total,interval_op_rate,avg_latency,elapsed_time
 19039,1903,0.0532085509215,10
 52052,3301,0.0302550313445,20
 82274,3022,0.0330235137811,30
 10,1772,0.0337765234716,40

 TEST1B
 =
 /home/sun> python stress.py -n 10 -t 100 -y super -u 1 -c 25 -r -o read -i 10
 WARNING: multiprocessing not present, threading will be used.
        Benchmark may not be accurate!
 total,interval_op_rate,avg_latency,elapsed_time
 16472,1647,0.0615632034523,10
 39375,2290,0.04384300123,20
 65259,2588,0.0385473697268,30
 91613,2635,0.0379411213277,40
 10,838,0.0331208069702,50
 /home/sun>


 *** I deleted all the data (commitlog and data directories) and restarted Cassandra. ***
 I am ok with TEST1A and TEST1B. I want to populate the SCF with > 500
 columns and read 25 columns per key.

 TEST2A
 ==
 /home/sun> python stress.py -n 10 -t 100 -y super -u 1 -c 600 -r -o insert -i 10
 WARNING: multiprocessing not present, threading will be used.
        Benchmark may not be accurate!
 total,interval_op_rate,avg_latency,elapsed_time
 .
 .
 84216,144,0.689481827031,570
 85768,155,0.625061393859,580
 87307,153,0.648041650953,590
 88785,147,0.671928719674,600
 90488,170,0.611753724284,610
 91983,149,0.677673689896,620
 93490,150,0.63891824366,630
 95017,152,0.65472143182,640
 96612,159,0.64355712789,650
 98098,148,0.673311280851,660
 99622,152,0.486848112166,670
 10,37,0.174115514629,680

 I understand nobody will write 600 columns at a time. I just need to
 populate the data, hence I did this test.

 [r...@fc10mc1 ~]# ls -l /var/lib/cassandra/commitlog/
 total 373880
 -rw-r--r-- 1 root root 268462742 2010-02-03 02:00 CommitLog-1265141714717.log
 -rw-r--r-- 1 root root 114003919 2010-02-03 02:00 CommitLog-1265142593543.log

 [r...@fc10mc1 ~]# ls -l /cassandra/lib/cassandra/data/Keyspace1/
 total 3024232
 -rw-r--r-- 1 root root 1508524822 2010-02-03 02:00 Super1-192-Data.db
 -rw-r--r-- 1 root root      92725 2010-02-03 02:00 Super1-192-Filter.db
 -rw-r--r-- 1 root root    2639957 2010-02-03 02:00 Super1-192-Index.db
 -rw-r--r-- 1 root root  100838971 2010-02-03 02:02 Super1-279-Data.db
 -rw-r--r-- 1 root root       8725 2010-02-03 02:02 Super1-279-Filter.db
 -rw-r--r-- 1 root root     176481 2010-02-03 02:02 Super1-279-Index.db
 -rw-r--r-- 1 root root 1478775337 2010-02-03 02:03 Super1-280-Data.db
 -rw-r--r-- 1 root root      90805 2010-02-03 02:03 Super1-280-Filter.db
 -rw-r--r-- 1 root root    2588072 2010-02-03 

Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Jonathan Ellis
On Tue, Feb 2, 2010 at 12:51 PM, Jean-Denis Greze jeande...@6coders.com wrote:
 Anyway, partially to address the efficiency concern, I've been playing
 around with the idea of having 745-like functionality on a per-node basis: a
 call to get all of the keys on a particular node as opposed to the entire
 cluster.  It just seems like with a very large cluster with billions, tens
 of billions, or hundreds of billions of keys 745 would just get overwhelmed.

That's why 745 is really there for Hadoop support
(https://issues.apache.org/jira/browse/CASSANDRA-342), not something
intended to be used manually.

-Jonathan


Re: get_slice() slow if more number of columns present in a SCF.

2010-02-02 Thread Brandon Williams
On Tue, Feb 2, 2010 at 9:27 AM, envio user enviou...@gmail.com wrote:

 All,

 Here are some tests[batch_insert() and get_slice()] I performed on
 cassandra.

snip


 I am ok with TEST1A and TEST1B. I want to populate the SCF with > 500
 columns and read 25 columns per key.

 snip


 This test is more worrying for us. We can't even do 1000 reads per
 second. Is there any limitation in Cassandra that makes it not work with
 a larger number of columns? Or am I doing something wrong here? Please
 let me know.


I think you're mostly being limited by
http://issues.apache.org/jira/browse/CASSANDRA-598
Can you try with a simple CF?

-Brandon


Re: Using column plus value or only column?

2010-02-02 Thread Nathan McCall
If I understand you correctly, I think I have a decent example. I have
a ColumnFamily which models user preferences for a site in our
system:

UserPreferences : {
  123_EDD43E57589F12032AF73E23A6AF3F47 : {
    favorite_color : "red",
    ...
  }
}

I structured it this way because we have a lot of sites for which a
user could create preferences, so the site_id is prepended to the
value of a session_id; therefore you always need two pieces of
information to enforce that a given record belongs to a specific
site. The site_id is always an integer and the session_id is always
a 32-char string, so sticking an underscore between them makes
validatable parsing and construction trivial. The Bloom filtering
behind the key lookups also makes checks for existence extremely fast.
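That convention might look roughly like this; the helper names and checks are illustrative, not code from Nathan's system:

```python
import re

# The session_id is always a 32-char hex string per the description above.
SESSION_RE = re.compile(r"^[0-9A-Fa-f]{32}$")

def make_key(site_id: int, session_id: str) -> str:
    if not SESSION_RE.match(session_id):
        raise ValueError("session_id must be a 32-char hex string")
    return "%d_%s" % (site_id, session_id)

def parse_key(key: str):
    # Split on the first underscore; validate both halves so a malformed
    # key fails loudly instead of silently matching the wrong site.
    site, _, session = key.partition("_")
    if not site.isdigit() or not SESSION_RE.match(session):
        raise ValueError("malformed preferences key: %r" % key)
    return int(site), session

key = make_key(123, "EDD43E57589F12032AF73E23A6AF3F47")
site_id, session_id = parse_key(key)
```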

Note: I feel compelled to mention this is not a typical use case that
I think would generally warrant anything outside of an RDBMS. However,
in our system writes to this preference table can burst up to
several thousand a second. Thus the need for the predictable write
performance of Cassandra.

Cheers,
Nate



On Tue, Feb 2, 2010 at 9:50 AM, Erik Holstad erikhols...@gmail.com wrote:
 Sorry that there are a lot of questions from me this week,  just trying to
 better understand
 the best way to use Cassandra :)

 Let us say that you know the length of your key, everything is standardized,
 are there people
 out there that just tag the value onto the key so that you don't have to pay
 the extra overhead
 of the second byte[]?

 --
 Regards Erik



Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Thanks Nate for the example.

I was thinking more a long the lines of something like:

If you have a family

Data : {
  row1 : {
    col1 : val1,
    ...
  },
  row2 : {
    col1 : val2,
    ...
  }
}


Using
Sorts : {
  sort_row : {
    sortKey1_datarow1 : [],
    sortKey2_datarow2 : []
  }
}

Instead of
Sorts : {
  sort_row : {
    sortKey1 : datarow1,
    sortKey2 : datarow2
  }
}

If that makes any sense?

-- 
Regards Erik


order-preserving partitioner per CF?

2010-02-02 Thread Wojciech Kaczmarek
Hi,

I've been evaluating Cassandra for a few days and I'd say it has a
really high coolness factor! :)

My biggest question so far is about order-preserving partitioner. I'd
like to have such partitioner for a specific column family, having
random partitioner for others. Is it possible wrt to the current
architecture? If not, is it planned?

What I see now is that partitioner is defined in the scope of Storage
tag in storage-conf.xml, not even inside a keyspace definition. It
makes me assume that partitioner setting is per the whole cassandra
cluster.

cheers,

Wojtek


Re: order-preserving partitioner per CF?

2010-02-02 Thread Jonathan Ellis
On Tue, Feb 2, 2010 at 2:53 PM, Wojciech Kaczmarek
kaczmare...@gmail.com wrote:
 Hi,

 I'm evaluating Cassandra since few days and I'd say it has really high
 coolness factor! :)

 My biggest question so far is about order-preserving partitioner. I'd
 like to have such partitioner for a specific column family, having
 random partitioner for others. Is it possible wrt to the current
 architecture?

No.

 If not, is it planned?

As attractive as it is on the wish list, I don't see how you could
sanely do it with the current architecture.

-Jonathan


Re: easy interface to Cassandra

2010-02-02 Thread Ted Zlatanov
On Tue, 19 Jan 2010 08:09:13 -0600 Ted Zlatanov t...@lifelogs.com wrote: 

TZ My proposal is as follows:

TZ - provide an IPluggableAPI interface; classes that implement it are
TZ   essentially standalone Cassandra servers.  Maybe this can just
TZ   parallel Thread and implement Runnable.

TZ - enable the users to specify which IPluggableAPI they want and provide
TZ   instantiation options (port, connection limit, etc.)

TZ - write a simple HTTPPluggableAPI, which provides a web server and
TZ   accepts POST requests.  The exact path and option spec can be worked
TZ   out later.  The input and output formats can be specified with a query
TZ   parameter; at least JSON and XML should be supported.

First very rough proposal is at
https://issues.apache.org/jira/browse/CASSANDRA-754

Ted



Re: order-preserving partitioner per CF?

2010-02-02 Thread Wojciech Kaczmarek
On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis jbel...@gmail.com wrote:

 My biggest question so far is about order-preserving partitioner. I'd
 like to have such partitioner for a specific column family, having
 random partitioner for others. Is it possible wrt to the current
 architecture?

 No.

Ok. Upon reading more details on a wiki I see it doesn't fit now.

Now I'm thinking about scenarios of distributing the keys using OPP
without knowing the number of nodes a priori.

Does this explanation:
http://wiki.apache.org/cassandra/Operations#Range_changes

apply to any partitioner?


Re: Using column plus value or only column?

2010-02-02 Thread Nathan McCall
Erik,
Sure, you could and depending on the workload, that might be quite
efficient for small pieces of data. However, this also sounds like
something that might be better addressed with the addition of a
SuperColumn on Sorts and getting rid of Data altogether:

Sorts : {
  sort_row_1 : {
    sortKey1 : { col1:val1, col2:val2 },
    sortKey2 : { col1:val3, col2:val4 }
  }
}

You can have an infinite number of SuperColumns for a key, but make
sure you understand get_slice vs. get_range_slice before you commit to
a design. Hopefully I understood your example correctly, if not, do
you have anything more concrete?

Cheers,
-Nate


On Tue, Feb 2, 2010 at 12:00 PM, Erik Holstad erikhols...@gmail.com wrote:
 Thanks Nate for the example.

 I was thinking more a long the lines of something like:

 If you have a family

 Data : {
   row1 : {
     col1:val1,
   row2 : {
     col1:val2,
     ...
   }
 }


 Using
 Sorts : {
   sort_row : {
     sortKey1_datarow1: [],
     sortKey2_datarow2: []
   }
 }

 Instead of
 Sorts : {
   sort_row : {
     sortKey1: datarow1,
     sortKey2: datarow2
   }
 }

 If that makes any sense?

 --
 Regards Erik



Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
@Nathan
So what I'm planning to do is to store multiple sort orders for the
same data, where they all use the same data table but fetch it in
different orders, so to say. I want to be able to read the different
sort orders from the front and from the back to get both regular and
reverse sort order.

With your approach using super columns you would need to replicate all data,
right?

And if I understand
http://issues.apache.org/jira/browse/CASSANDRA-598 correctly, you
would need to read the whole thing before you can limit the results
handed back to you.

In regards to the two calls get_slice and get_range_slice, the way I
understand it is that you hand the second one an optional start and
stop key plus a limit, to get a range of keys/rows. I was planning to
use this call together with the OPP, but am thinking about not using it
since there is no way to do an inverse scan, right?

Thanks a lot
Erik


On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell
jesse.mcconn...@gmail.comwrote:

 infinite is a bit of a bold claim

 by my understanding you are bound by the memory of the jvm, as all of
 the content of a key/row currently needs to fit in memory for
 compaction, which includes columns and supercolumns for a given key/row.

 if you are going to run into those scenarios then some sort of
 sharding on the keys is required, afaict

 cheers,
 jesse

 --
 jesse mcconnell
 jesse.mcconn...@gmail.com



 On Tue, Feb 2, 2010 at 16:30, Nathan McCall n...@vervewireless.com
 wrote:
  Erik,
  Sure, you could and depending on the workload, that might be quite
  efficient for small pieces of data. However, this also sounds like
  something that might be better addressed with the addition of a
  SuperColumn on Sorts and getting rid of Data altogether:
 
  Sorts : {
sort_row_1 : {
 sortKey1 : { col1:val1, col2:val2 },
 sortKey2 : { col1:val3, col2:val4 }
}
  }
 
  You can have an infinite number of SuperColumns for a key, but make
  sure you understand get_slice vs. get_range_slice before you commit to
  a design. Hopefully I understood your example correctly, if not, do
  you have anything more concrete?
 
  Cheers,
  -Nate
 
 
 




-- 
Regards Erik


Re: order-preserving partitioner per CF?

2010-02-02 Thread Wojciech Kaczmarek
Yeah excellent.

I checked that it's doable to convert the data to another Partitioner
using the json backup tools - cool. I will probably write my own
partitioner, so it's good I won't lose my test data (though I assume I
need to pack all my data back to one node, export to json, delete
sstables, change partitioner, import sstables, then rerun the node and
subsequently distribute to others).

On Tue, Feb 2, 2010 at 22:52, Jonathan Ellis jbel...@gmail.com wrote:
 yes

 On Tue, Feb 2, 2010 at 3:50 PM, Wojciech Kaczmarek
 kaczmare...@gmail.com wrote:
 On Tue, Feb 2, 2010 at 21:57, Jonathan Ellis jbel...@gmail.com wrote:

 My biggest question so far is about order-preserving partitioner. I'd
 like to have such partitioner for a specific column family, having
 random partitioner for others. Is it possible wrt to the current
 architecture?

 No.

 Ok. Upon reading more details on a wiki I see it doesn't fit now.

 Now I'm thinking about scenarios of distributing the keys using OPP
 without knowing the number of nodes a priori.

 Does this explanation:
 http://wiki.apache.org/cassandra/Operations#Range_changes

 applies to any partitioner?




Re: order-preserving partitioner per CF?

2010-02-02 Thread Jonathan Ellis
just remember that you can't mix nodes w/ different partitioner types
in the same cluster.

On Tue, Feb 2, 2010 at 5:04 PM, Wojciech Kaczmarek
kaczmare...@gmail.com wrote:
 Yeah excellent.

 I checked that it's doable to convert the data to another Partitioner
 using json backup tools - cool. I will probably write own partitioner
 so it's good I won't loose my test data (though I assume I need to
 pack all my data back to one node, export to json, delete sstables,
 change partitioner, import sstables, then rerun node and subsequently
 distribute to others).

 On Tue, Feb 2, 2010 at 22:52, Jonathan Ellis jbel...@gmail.com wrote:
 yes






Re: Using column plus value or only column?

2010-02-02 Thread Nathan McCall
Erik,
You can do an inverse with 'reversed=true' in the SliceRange that is
part of the SlicePredicate for both get_slice and get_range_slice. I
have not tried reversed=true on SuperColumn results, but I don't think
there is any difference there - what can't be changed is how things are
ordered, but the direction can go either way (if I am wrong on this,
somebody please correct me).
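The slice semantics can be sketched with a small plain-Python simulation (this mimics the ordering behaviour only; it is not the Thrift API itself, where the reversed flag lives on the SliceRange inside the SlicePredicate):

```python
def slice_columns(row, start="", finish="", count=100, reverse=False):
    """Mimic a column slice over one row: columns come back in comparator
    order, or in the inverse of that order when reverse is True. In a
    reversed slice, 'start' is interpreted from the high end."""
    names = sorted(row, reverse=reverse)
    out = []
    for name in names:
        if reverse:
            if start and name > start:   # skip names above the start point
                continue
            if finish and name < finish:  # past the low end: stop
                break
        else:
            if start and name < start:   # skip names below the start point
                continue
            if finish and name > finish:  # past the high end: stop
                break
        out.append((name, row[name]))
        if len(out) >= count:
            break
    return out
```

Note that either direction walks the same stored order; only the traversal direction changes, which matches the behaviour described above.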

http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my
radar as I don't have anything reporting-ish like you describe with
SuperColumns (yet). I will defer to more experienced folks on this.

Regards,
-Nate


On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad erikhols...@gmail.com wrote:
 @Nathan
 So what I'm planning to do is to store multiple sort orders for the same
 data, where they all use the
 same data table just fetches it in different orders, so to say. I want to be
 able to rad the different sort
 orders from the front and from the back to get both regular and reverse sort
 order.

 With your approach using super columns you would need to replicate all data,
 right?

 And if I understand http://issues.apache.org/jira/browse/CASSANDRA-598
 correctly you would need to
 read the whole thing before you can limit the results handed back to you.

 In regards to the two calls get_slice and get_range_slice, the way I
 understand it is that you hand
 the second one an optional start and stop key plus a limit, to get a range
 of keys/rows. I was planning
 to use this call together with the OPP, but are thinking about not using it
 since there is no way to do
 an inverse scan, right?

 Thanks a lot
 Erik


 On Tue, Feb 2, 2010 at 2:39 PM, Jesse McConnell jesse.mcconn...@gmail.com
 wrote:

 infinite is a bit of a bold claim

 by my understanding you are bound by the memory of the jvm as all of
 the content of a key/row currently needs to fit in memory for
 compaction, which includes columns and supercolumns for given key/row.

 if you are going to run into those scenarios then some sort of
 sharding on the keys is required, afaict

 cheers,
 jesse

 --
 jesse mcconnell
 jesse.mcconn...@gmail.com





 --
 Regards Erik



Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Hey Nate!
What I wanted to do with get_range_slice was to receive the keys in
inverted order, so that I could do offset/limit queries on key ranges
in reverse order. Like you said, this can be done for both columns and
super columns with help of the SliceRange, but not on keys afaik, but
maybe there is a way?
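One common workaround, not mentioned in this thread but worth noting: store a second index row whose column names are byte-wise complements of fixed-width sort keys, so that a forward column slice walks the keys in reverse order. A sketch:

```python
def invert_key(key: bytes) -> bytes:
    """Complement every byte; for fixed-width keys this exactly reverses
    their lexicographic order, so an ascending slice over inverted names
    comes back in descending original-key order."""
    return bytes(255 - b for b in key)
```

Writing each entry under both sortKey and invert_key(sortKey) doubles the index writes, but lets a plain forward slice serve both directions.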

Thanks Erik


On Tue, Feb 2, 2010 at 3:55 PM, Nathan McCall n...@vervewireless.comwrote:

 Erik,
 You can do an inverse with 'reversed=true' in SliceRange as part of
 the SlicePredicate for both get_slice or get_range_slice. I have not
 tried reverse=true on SuperColumn results, but I dont think there is
 any difference there - what can't be changed is how things are ordered
 but direction can go either way (If I am wrong on this, somebody
 please correct me).

 http://issues.apache.org/jira/browse/CASSANDRA-598 has not been on my
 radar as I dont have anything reporting-ish like you describe with
 SuperColumns (yet). I will defer to more experienced folks with this.

 Regards,
 -Nate


 




-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Jonathan Ellis
Right, we don't currently support scanning rows in reverse order, but
that is only because nobody has wanted it badly enough to code it. :)

On Tue, Feb 2, 2010 at 6:06 PM, Erik Holstad erikhols...@gmail.com wrote:
 Hey Nate!
 What I wanted to do with the get_range_slice was to receive the keys in the
 inverted order, so that I could so offset limit queries on key ranges in
 reverse
 order. Like you said, this can be done for both columns and super columns
 with
 help of the SliceRange, but not on keys afaik, but maybe there is a way?

 Thanks Erik





 --
 Regards Erik



Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
I don't understand what you mean ;)
We'll see what happens when we are done with this first project; maybe
we can get some time to give back.

-- 
Regards Erik


Re: Using column plus value or only column?

2010-02-02 Thread Nathan McCall
Ok - I was afraid I was missing something with the generic example
before - my apologies for that. You cannot impose an order on keys like
that as far as I am aware. I think maintaining a Sort CF as you had
originally is a decent approach.

Cheers,
-Nate

On Tue, Feb 2, 2010 at 4:06 PM, Erik Holstad erikhols...@gmail.com wrote:
 Hey Nate!
 What I wanted to do with the get_range_slice was to receive the keys in the
 inverted order, so that I could so offset limit queries on key ranges in
 reverse
 order. Like you said, this can be done for both columns and super columns
 with
 help of the SliceRange, but not on keys afaik, but maybe there is a way?

 Thanks Erik


 



 --
 Regards Erik



Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Don't be silly, thanks a lot for helping me out!

-- 
Regards Erik


Re: How do cassandra clients failover?

2010-02-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Feb 1, 2010 at 7:38 PM, Jonathan Ellis jbel...@gmail.com wrote:
 No.  Thrift is just an RPC mechanism.  Whether RRDNS, software or
 hardware load balancing, or client-based failover like Gary describes
 is best is not a one-size-fits-all answer.
Everyone who uses Cassandra would need to implement load balancing and
failover. Some may do it right and some may do it wrong. Because this
solution is going to be Cassandra-specific, you may not find any
publicly available libraries to help you out.

Ideally, the client would be a Thrift API wrapper which automatically
does load balancing and failover. This definitely may not be the only
solution, but it is one which may not need any external RRDNS.


 2010/2/1 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 is it worth adding this feature to the standard java client?

 On Mon, Feb 1, 2010 at 7:28 PM, Gary Dusbabek gdusba...@gmail.com wrote:
 One approach is to discover what other nodes there are before any of
 them fail.  Then when you detect failure, you can connect to a
 different node that is (hopefully) still responding.

 There is an API call that allows you to get a list of all the nodes:
 client.get_string_property("token map"), which returns a JSON list of
 the node ring.

 I hope that helps.

 Gary.

 2010/2/1 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 The cassandra client (thift client) is started up with the host:post
 of a single cassandra node.

 * What happens if that node fails?
 * Does it mean that all the operations go through the same node?

 --Noble





 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com





-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com