Re: Cassandra Write Performance, CPU usage

2010-06-10 Thread vd
Hi Rishi

Writes in Cassandra do not go straight to the data files on disk; they are
held in memory and flushed to disk later. Maybe that's why you are not
seeing much in iostat. I can't say anything about the high CPU usage.
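
The buffering described above can be pictured with a toy model. This is only
a sketch in plain Python, not Cassandra's implementation: the class name
ToyMemtable, the flush_threshold parameter and the in-memory "disk" list are
all made up for illustration.

```python
class ToyMemtable:
    """Toy write buffer: holds writes in memory, flushes in batches.

    Hypothetical illustration only; not Cassandra's actual memtable.
    """

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # bytes buffered before a flush
        self.buffer = {}         # key -> value, held in memory
        self.buffered_bytes = 0
        self.flushed = []        # stands in for data files written to disk

    def write(self, key, value):
        # Account for overwrites so buffered_bytes tracks the live buffer.
        old = self.buffer.get(key)
        if old is not None:
            self.buffered_bytes -= len(old)
        self.buffer[key] = value
        self.buffered_bytes += len(value)
        if self.buffered_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One sorted batch is written out instead of many small writes.
        if self.buffer:
            self.flushed.append(sorted(self.buffer.items()))
        self.buffer = {}
        self.buffered_bytes = 0


table = ToyMemtable(flush_threshold=10)
table.write("a", b"12345")   # buffered, nothing on "disk" yet
table.write("b", b"12345")   # threshold reached, one batch is flushed
print(len(table.flushed))
```

In a model like this the data disk sees activity only in bursts, at flush
time, which would be consistent with occasional spikes in iostat rather than
a steady load.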
___
Vineet Daniel
___

Let your email find you


On Fri, Jun 11, 2010 at 6:12 AM, Rishi Bhardwaj wrote:

> Hi
>
> I am investigating Cassandra write performance and see very heavy CPU usage
> from Cassandra. I have a single node Cassandra instance running on a dual
> core (2.66 GHz Intel) Ubuntu 9.10 server. The writes to Cassandra are being
> generated from the same server using BatchMutate(). The client makes exactly
> one RPC call at a time to Cassandra. Each BatchMutate() RPC contains 2 MB of
> data and once it is acknowledged by Cassandra, the next RPC is done.
> Cassandra has two separate disks, one for commitlog with a sequential b/w of
> 130MBps and the other a solid state disk for data with b/w of 90MBps. Tuning
> various parameters, I observe that I am able to attain a maximum write
> performance of about 45 to 50 MBps from Cassandra. I see that the
> Cassandra java process consistently uses 100% to 150% of CPU resources (as
> shown by top) during the entire write operation. Also, iostat clearly shows
> that the max disk bandwidth is not reached at any time during the write
> operation. Every now and then the I/O activity on the "commitlog" disk and
> the data disk spikes, but Cassandra never sustains it close
> to their peak. I would imagine that the CPU is probably the bottleneck
> here. Does anyone have any idea why Cassandra beats the heck out of the CPU
> here? Any suggestions on how to go about finding the exact bottleneck here?
>
> Some more information about the writes: I have 2 column families, the data
> though is mostly written in one column family with column sizes of around
> 32k and each row having around 256 or 512 columns. I would really appreciate
> any help here.
>
> Thanks,
> Rishi
>
>
>


Re: searching keys of the form substring*

2010-06-01 Thread vd
As I told you on the IRC channel, don't go for shortcuts... learn Java
first.
___
Vineet Daniel
___



On Tue, Jun 1, 2010 at 11:47 AM, Sagar Agrawal  wrote:

> Thanks for replying, Vineet, but I don't understand how we can use
> variable substitution here.
>
>
>
>
> On Mon, May 31, 2010 at 4:42 PM, vd  wrote:
>
>> Hi Sagar
>>
>> You can use variable substitution.
>> ___
>> Vineet Daniel
>> ___
>>
>>
>>
>>
>>
>> On Mon, May 31, 2010 at 3:44 PM, Sagar Agrawal  wrote:
>>
>>> Hi folks,
>>>
>>> I want to fetch all those records from my column family such that the
>>> key starts with a specified string...
>>>
>>> e.g. Suppose I have a CF keyed on full names (first name + last name) of
>>> persons...
>>> Now I want to fetch all those records whose first name is 'John'.
>>>
>>> Right now, I am using OPP and KeyRange in the following way:
>>>
>>> KeyRange keyRange = new KeyRange();
>>> keyRange.setStart_key("John");
>>> keyRange.setEnd_key("Joho");
>>>
>>> But this is a sort of hard-coding. Can anyone suggest a better way to
>>> achieve this?
>>>
>>> I would be really grateful... thank you.
>>>
>>>
>>>
>>
>


cluster throwing errors when new or existing node joins

2010-05-31 Thread vd
Hi

I have a setup of 4 nodes. Whenever I restart any of the nodes, even
after deleting the data directories and commit log, I get the following error:

ERROR 18:46:41,296 Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.db.Table$TableMetadata.getColumnFamilyId(Table.java:131)
    at org.apache.cassandra.db.Table.getColumnFamilyId(Table.java:364)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:103)
    at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:475)
    at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 1 more
ERROR 18:46:41,297 Error in ThreadPoolExecutor
java.lang.NullPointerException
    at org.apache.cassandra.db.Table.apply(Table.java:407)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
ERROR 18:46:41,299 Fatal exception in thread Thread[ROW-MUTATION-STAGE:5,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.db.Table.apply(Table.java:407)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
ERROR 18:46:51,309 Error in ThreadPoolExecutor
java.lang.NullPointerException
    at org.apache.cassandra.db.Table.apply(Table.java:407)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
ERROR 18:46:51,310 Fatal exception in thread Thread[ROW-MUTATION-STAGE:6,5,main]


Kindly suggest what could be causing this error.
___
VD
___



Re: searching keys of the form substring*

2010-05-31 Thread vd
Hi Sagar

You can use variable substitution.
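
One reading of this: build both ends of the range from a variable holding
the prefix, instead of typing "John" and "Joho" by hand. A minimal sketch in
Python follows. The function name prefix_range is hypothetical, and it
assumes an order-preserving partitioner and a non-empty prefix whose last
character is not the highest possible code point.

```python
def prefix_range(prefix):
    """Return (start_key, end_key) covering keys that start with prefix.

    end_key is the prefix with its last character bumped by one, so
    "John" gives "Joho". Hypothetical helper, not part of any client API.
    """
    if not prefix:
        raise ValueError("prefix must not be empty")
    last = ord(prefix[-1])
    end_key = prefix[:-1] + chr(last + 1)
    return prefix, end_key


start_key, end_key = prefix_range("John")
print(start_key, end_key)
```

The two strings can then be passed to setStart_key and setEnd_key from the
snippet quoted below. If the end of the range is inclusive in your client, a
key exactly equal to the computed end key would also come back and may need
to be filtered out.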
___
Vineet Daniel
___



On Mon, May 31, 2010 at 3:44 PM, Sagar Agrawal  wrote:

> Hi folks,
>
> I want to fetch all those records from my column family such that the key
> starts with a specified string...
>
> e.g. Suppose I have a CF keyed on full names (first name + last name) of
> persons...
> Now I want to fetch all those records whose first name is 'John'.
>
> Right now, I am using OPP and KeyRange in the following way:
>
> KeyRange keyRange = new KeyRange();
> keyRange.setStart_key("John");
> keyRange.setEnd_key("Joho");
>
> But this is a sort of hard-coding. Can anyone suggest a better way to
> achieve this?
>
> I would be really grateful... thank you.
>
>
>


Re: what is DCQUORUM

2010-05-12 Thread vd
Thanks Eben


On Wed, May 12, 2010 at 7:33 PM, Eben Hewitt  wrote:

> QUORUM is a high consistency level. It refers to the number of nodes that
> have to acknowledge read or write operations in order to be assured that
> Cassandra is in a consistent state. It uses (replication factor / 2) + 1.
>
> DCQUORUM means "Data Center Quorum", and balances consistency with
> performance. It puts multiple replicas in each Data Center so operations can
> prefer replicas in the same DC for lower latency.
>
> See https://issues.apache.org/jira/browse/CASSANDRA-492 for a little
> discussion.
>
> Also see
> http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/locator/DatacenterShardStrategy.java
>  and
>
> http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/service/DatacenterWriteResponseHandler.java
>
> Eben
>
>
>
> On Wed, May 12, 2010 at 6:15 AM, vd  wrote:
>
>> Hi
>>
>> I have read about QUORUM but lately came across DCQUORUM. What is it, and
>> what's the difference between the two?
>>
>>
>
>
> --
> "In science there are no 'depths'; there is surface everywhere."
> --Rudolph Carnap
>
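
The quorum size mentioned in Eben's reply can be written out as a small
calculation. This is a sketch in Python, assuming integer division; the
function name quorum is hypothetical.

```python
def quorum(replication_factor):
    """Replicas that must respond: (RF / 2) + 1, using integer division."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print(rf, quorum(rf))
```

With a replication factor of 3, two replicas must acknowledge, so one
replica can be down without failing the operation.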


what is DCQUORUM

2010-05-12 Thread vd
Hi

I have read about QUORUM but lately came across DCQUORUM. What is it, and
what's the difference between the two?


Re: Is SuperColumn necessary?

2010-05-11 Thread vd
Hi

Can we do a range search on keys in ID:ID format? Would the API treat such a
key as a single ID, or can it split on ':'? If not, how can we avoid using
supercolumns where we need to associate 'n' rows with a single ID?
For example:
  CatID1-> articleID1
  CatID1-> articleID2
  CatID1-> articleID3
  CatID1-> articleID4
How can we map such scenarios with simple column families?
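
One way to picture the ID:ID idea, as a sketch only: treat "CatID:articleID"
as a single row key and rely on sorted key order to find every key sharing
the "CatID:" prefix. The code below uses a plain sorted Python list rather
than any Cassandra API. The helper name keys_for_category is hypothetical,
and against a real cluster this would assume an order-preserving partitioner.

```python
import bisect


def keys_for_category(sorted_keys, category):
    """Return all 'category:article' keys for one category.

    sorted_keys must be sorted; ';' is the character right after ':',
    so 'CatID1;' sorts just past every key starting with 'CatID1:'.
    """
    start = bisect.bisect_left(sorted_keys, category + ":")
    end = bisect.bisect_left(sorted_keys, category + ";")
    return sorted_keys[start:end]


keys = sorted([
    "CatID1:articleID1", "CatID1:articleID2",
    "CatID1:articleID3", "CatID1:articleID4",
    "CatID2:articleID1",
])
print(keys_for_category(keys, "CatID1"))
```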

Rgds.

On Tue, May 11, 2010 at 2:11 PM, Torsten Curdt  wrote:
> Exactly.
>
> On Tue, May 11, 2010 at 10:20, David Boxenhorn  wrote:
>> Don't think of it as getting rid of supercolumns. Think of it as adding
>> superdupercolumns, supertriplecolumns, etc. Or, in sparse array terminology:
>> array[dim1][dim2][dim3]...[dimN] = value
>>
>> Or, as said above:
>>
>>   [nested XML schema example; the element tags were lost in archiving,
>>   only the attribute Type="UTF8" survives on the two outer levels]
>


updating column names

2010-05-11 Thread vd
Hi

I have a column named "colom". Can we rename the column "colom" to
"column" at runtime or via the API?


how to count the columns

2010-05-11 Thread vd
Hi

Can we count the total number of columns in a ColumnFamily? If yes, how?


Re: How to write WHERE .. LIKE query ?

2010-05-10 Thread vd
Hi Mike

AFAIK Cassandra queries only on keys and not on column names. Please verify.



On Tue, May 11, 2010 at 11:06 AM, Mike Malone  wrote:
>
>
> On Mon, May 10, 2010 at 9:00 PM, Shuge Lee  wrote:
>>
>> Hi all:
>> How do I write a WHERE ... LIKE query?
>> For example (described in Python):
>> Schema:
>> # column family name
>> resources = {
>>     # key
>>     'foo': {
>>         # columns and values
>>         'url': 'foo.com',
>>         'publisher': 'foo',
>>     },
>>     'oof': {
>>         'url': 'oof.com',
>>         'publisher': 'off',
>>     },
>>     # ...
>> }
>> # this is very easy:
>> SELECT * FROM resources WHERE key = 'foo'
>> # but the following are really hard:
>> SELECT * FROM resources WHERE key LIKE 'o%' # get all records whose key
>> # name starts with the character 'o'?
>
> get_range_slices(<keyspace>, ColumnParent(column_family),
> SlicePredicate(slice_range=SliceRange('', '')), KeyRange('o', 'o~'),
> ConsistencyLevel.ONE);
>
>>
>> SELECT * FROM resources WHERE url == 'oof.com'
>
> This is a projection. Cassandra doesn't support this sort of query out of
> the box. You'll have to structure your data so that data you want to query
> by is in the key or column name. Or you'll have to manually build secondary
> indexes.
>
> Mike
>
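
The manually built secondary index Mike mentions can be sketched with plain
dictionaries. This is an illustration in Python that reuses the resources
example from the question; build_index is a hypothetical helper, and in
Cassandra such an index would typically be kept in a column family of its own
that the application updates on every write.

```python
resources = {
    "foo": {"url": "foo.com"},
    "oof": {"url": "oof.com"},
}


def build_index(rows, column):
    """Map each value of `column` to the list of row keys that hold it."""
    index = {}
    for key, columns in rows.items():
        if column in columns:
            index.setdefault(columns[column], []).append(key)
    return index


url_index = build_index(resources, "url")
# Rough equivalent of: SELECT * FROM resources WHERE url == 'oof.com'
matches = [resources[key] for key in url_index.get("oof.com", [])]
print(matches)
```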


Re: Tuning Cassandra

2010-05-10 Thread vd
What is the complete code you are using to connect to Cassandra
from Java?



On Mon, May 10, 2010 at 1:49 PM, David Boxenhorn  wrote:

> I don't know what "TSocket or the buffered one" means. Maybe I should know?
>
> I'm using Hector. Does that explain anything?
>
> On Mon, May 10, 2010 at 11:15 AM, vd  wrote:
>
>>
>> Hi
>>
>> What are you using to connect to Cassandra: TSocket or the
>> buffered one?
>>
>>
>> 
>>
>>
>>
>>
>>
>> On Mon, May 10, 2010 at 1:29 PM, David Boxenhorn wrote:
>>
>>> I'm running Java on the client, jdbc queries on Oracle, Hector on
>>> Cassandra.
>>>
>>> The Cassandra and Oracle database designs are radically different, as you
>>> might guess.
>>>
>>> I have no doubt that Cassandra can be tuned, in a multiple-server
>>> cluster, to have superior throughput (that's why I'm doing it!). But for
>>> now, it's really frustrating my development effort that Cassandra is so
>>> slow. Can't I get it up to twice as slow as Oracle in my configuration?
>>>
>>> On Mon, May 10, 2010 at 10:47 AM, vd  wrote:
>>>
>>>> Hi David
>>>>
>>>> If I may ask... how do you plan to import data from Oracle to Cassandra?
>>>> As for an answer, AFAIK Cassandra's true ability comes into play when
>>>> running on more than one machine... and please share how you are making
>>>> the comparison, e.g. on writes to or reads from Cassandra.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, May 10, 2010 at 1:04 PM, David Boxenhorn wrote:
>>>>
>>>>> I'm running Oracle and Cassandra on my machine, trying to import my
>>>>> data to Cassandra from Oracle.
>>>>>
>>>>> In my configuration Oracle is about ten times faster than Cassandra.
>>>>> Cassandra has out-of-the-box tuning.
>>>>>
>>>>> I am new to Cassandra. How do I begin trying to tune it?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Tuning Cassandra

2010-05-10 Thread vd
Hi

What are you using to connect to Cassandra: TSocket or the
buffered one?







On Mon, May 10, 2010 at 1:29 PM, David Boxenhorn  wrote:

> I'm running Java on the client, jdbc queries on Oracle, Hector on
> Cassandra.
>
> The Cassandra and Oracle database designs are radically different, as you
> might guess.
>
> I have no doubt that Cassandra can be tuned, in a multiple-server cluster,
> to have superior throughput (that's why I'm doing it!). But for now, it's
> really frustrating my development effort that Cassandra is so slow. Can't I
> get it up to twice as slow as Oracle in my configuration?
>
> On Mon, May 10, 2010 at 10:47 AM, vd  wrote:
>
>> Hi David
>>
>> If I may ask... how do you plan to import data from Oracle to Cassandra?
>> As for an answer, AFAIK Cassandra's true ability comes into play when
>> running on more than one machine... and please share how you are making
>> the comparison, e.g. on writes to or reads from Cassandra.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, May 10, 2010 at 1:04 PM, David Boxenhorn wrote:
>>
>>> I'm running Oracle and Cassandra on my machine, trying to import my data
>>> to Cassandra from Oracle.
>>>
>>> In my configuration Oracle is about ten times faster than Cassandra.
>>> Cassandra has out-of-the-box tuning.
>>>
>>> I am new to Cassandra. How do I begin trying to tune it?
>>>
>>> Thanks.
>>>
>>
>>
>


Re: Tuning Cassandra

2010-05-10 Thread vd
Hi David

If I may ask... how do you plan to import data from Oracle to Cassandra?
As for an answer, AFAIK Cassandra's true ability comes into play when
running on more than one machine... and please share how you are making
the comparison, e.g. on writes to or reads from Cassandra.







On Mon, May 10, 2010 at 1:04 PM, David Boxenhorn  wrote:

> I'm running Oracle and Cassandra on my machine, trying to import my data to
> Cassandra from Oracle.
>
> In my configuration Oracle is about ten times faster than Cassandra.
> Cassandra has out-of-the-box tuning.
>
> I am new to Cassandra. How do I begin trying to tune it?
>
> Thanks.
>