Re: GC freeze just after repair session

2012-07-05 Thread Ravikumar Govindarajan
Our young gen size is 800 MB with SurvivorRatio=8, so eden is 640 MB. All
objects/bytes generated during compaction are garbage, right?
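As a quick sanity check on those numbers, here is a sketch of the standard HotSpot arithmetic (assuming the usual layout where the young generation is eden plus two survivor spaces and eden/survivor = SurvivorRatio; the class and method names are just for illustration):

```java
// Hypothetical helper, not part of Cassandra: derives eden and survivor
// sizes from -Xmn and -XX:SurvivorRatio under the usual HotSpot layout.
public class YoungGenMath {
    static long survivorMb(long youngMb, long survivorRatio) {
        // young = eden + 2 * survivor, and eden = survivorRatio * survivor,
        // so young = survivor * (survivorRatio + 2)
        return youngMb / (survivorRatio + 2);
    }

    static long edenMb(long youngMb, long survivorRatio) {
        // whatever is not survivor space is eden
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        System.out.println(survivorMb(800, 8)); // 80
        System.out.println(edenMb(800, 8));     // 640
    }
}
```

With -Xmn800M and SurvivorRatio=8 that gives 80 MB per survivor space and 640 MB of eden, matching the figures above.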

During compaction, with in_memory_compaction_limit=64MB and
concurrent_compactors=8,  there is a lot of pressure on ParNew sweeps.

I was thinking of decreasing concurrent_compactors and
in_memory_compaction_limit to go easy on the GC.

I am not familiar with the inner workings of Cassandra, but I hope I have
diagnosed the problem to some extent.

On Fri, Jul 6, 2012 at 11:27 AM, rohit bhatia  wrote:

> @ravi, you can increase the young gen size, keep a high tenuring threshold, or
> increase the survivor ratio.
>
>
> On Fri, Jul 6, 2012 at 4:03 AM, aaron morton 
> wrote:
> > Ideally we would like to collect maximum garbage from ParNew itself,
> during
> > compactions. What are the steps to take towards to achieving this?
> >
> > I'm not sure what you are asking.
> >
> > Cheers
> >
> > -
> > Aaron Morton
> > Freelance Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 5/07/2012, at 6:56 PM, Ravikumar Govindarajan wrote:
> >
> > We have modified MaxTenuringThreshold from 1 to 5. Maybe it is causing
> > problems. We will change it back to 1 and see how the system behaves.
> >
> > concurrent_compactors=8. We will reduce this, as our system won't be able to
> > handle this number of compactions at the same time anyway. We think it will
> > ease GC to some extent as well.
> >
> > Ideally we would like to collect maximum garbage from ParNew itself,
> during
> > compactions. What are the steps to take towards to achieving this?
> >
> > On Wed, Jul 4, 2012 at 4:07 PM, aaron morton 
> > wrote:
> >>
> >> It *may* have been compaction from the repair, but it's not a big CF.
> >>
> >> I would look at the logs to see how much data was transferred to the node.
> >> Was there a compaction going on while the GC storm was happening? Do you
> >> have a lot of secondary indexes?
> >>
> >> If you think it correlated to compaction you can try reducing the
> >> concurrent_compactors
> >>
> >> Cheers
> >>
> >> -
> >> Aaron Morton
> >> Freelance Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 3/07/2012, at 6:33 PM, Ravikumar Govindarajan wrote:
> >>
> >> Recently, we faced a severe freeze [around 30-40 mins] on one of our
> >> servers. There were many mutations/reads dropped. The issue happened
> just
> >> after a routine nodetool repair for the below CF completed [1.0.7, NTS,
> >> DC1:3,DC2:2]
> >>
> >> Column Family: MsgIrtConv
> >> SSTable count: 12
> >> Space used (live): 17426379140
> >> Space used (total): 17426379140
> >> Number of Keys (estimate): 122624
> >> Memtable Columns Count: 31180
> >> Memtable Data Size: 81950175
> >> Memtable Switch Count: 31
> >> Read Count: 8074156
> >> Read Latency: 15.743 ms.
> >> Write Count: 2172404
> >> Write Latency: 0.037 ms.
> >> Pending Tasks: 0
> >> Bloom Filter False Postives: 1258
> >> Bloom Filter False Ratio: 0.03598
> >> Bloom Filter Space Used: 498672
> >> Key cache capacity: 20
> >> Key cache size: 20
> >> Key cache hit rate: 0.9965579513062582
> >> Row cache: disabled
> >> Compacted row minimum size: 51
> >> Compacted row maximum size: 89970660
> >> Compacted row mean size: 226626
> >>
> >>
> >> Our heap config is as follows
> >>
> >> -Xms8G -Xmx8G -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC
> >> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> >> -XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=75
> >> -XX:+UseCMSInitiatingOccupancyOnly
> >>
> >> from yaml
> >> in_memory_compaction_limit=64
> >> compaction_throughput_mb_sec=8
> >> multi_threaded_compaction=false
> >>
> >>  INFO [AntiEntropyStage:1] 2012-06-29 09:21:26,085
> AntiEntropyService.java
> >> (line 762) [repair #2b6fcbf0-c1f9-11e1--2ea8811bfbff] MsgIrtConv is
> >> fully synced
> >>  INFO [AntiEntropySessions:8] 2012-06-29 09:21:26,085
> >> AntiEntropyService.java (line 698) [repair
> >> #2b6fcbf0-c1f9-11e1--2ea8811bfbff] session completed successfully
> >>  INFO [CompactionExecutor:857] 2012-06-29 09:21:31,219
> CompactionTask.java
> >> (line 221) Compacted to
> >> [/home/sas/system/data/ZMail/MsgIrtConv-hc-858-Data.db,].  47,907,012 to
> >> 40,554,059 (~84% of original) bytes for 4,564 keys at 6.252080MB/s.
>  Time:
> >> 6,186ms.
> >>
> >> After this, the logs were completely filled with GC [ParNew/CMS]. ParNew ran
> >> every 3 seconds, while CMS ran approximately every 30 seconds, continuously
> >> for 40 minutes.
> >>
> >>  INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line
> >> 122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is
> >> 8506048512
> >>  INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line
> >> 122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is
> >> 8506048512
> >>
> >> .
> >>
> >>  INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line

Re: GC freeze just after repair session

2012-07-05 Thread rohit bhatia
@ravi, you can increase the young gen size, keep a high tenuring threshold, or
increase the survivor ratio.


On Fri, Jul 6, 2012 at 4:03 AM, aaron morton  wrote:
> Ideally we would like to collect maximum garbage from ParNew itself, during
> compactions. What are the steps to take towards to achieving this?
>
> I'm not sure what you are asking.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/07/2012, at 6:56 PM, Ravikumar Govindarajan wrote:
>
> We have modified MaxTenuringThreshold from 1 to 5. Maybe it is causing
> problems. We will change it back to 1 and see how the system behaves.
>
> concurrent_compactors=8. We will reduce this, as our system won't be able to
> handle this number of compactions at the same time anyway. We think it will
> ease GC to some extent as well.
>
> Ideally we would like to collect maximum garbage from ParNew itself, during
> compactions. What are the steps to take towards to achieving this?
>
> On Wed, Jul 4, 2012 at 4:07 PM, aaron morton 
> wrote:
>>
>> It *may* have been compaction from the repair, but it's not a big CF.
>>
>> I would look at the logs to see how much data was transferred to the node.
>> Was there a compaction going on while the GC storm was happening? Do you
>> have a lot of secondary indexes?
>>
>> If you think it correlated to compaction you can try reducing the
>> concurrent_compactors
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 3/07/2012, at 6:33 PM, Ravikumar Govindarajan wrote:
>>
>> Recently, we faced a severe freeze [around 30-40 mins] on one of our
>> servers. There were many mutations/reads dropped. The issue happened just
>> after a routine nodetool repair for the below CF completed [1.0.7, NTS,
>> DC1:3,DC2:2]
>>
>> Column Family: MsgIrtConv
>> SSTable count: 12
>> Space used (live): 17426379140
>> Space used (total): 17426379140
>> Number of Keys (estimate): 122624
>> Memtable Columns Count: 31180
>> Memtable Data Size: 81950175
>> Memtable Switch Count: 31
>> Read Count: 8074156
>> Read Latency: 15.743 ms.
>> Write Count: 2172404
>> Write Latency: 0.037 ms.
>> Pending Tasks: 0
>> Bloom Filter False Postives: 1258
>> Bloom Filter False Ratio: 0.03598
>> Bloom Filter Space Used: 498672
>> Key cache capacity: 20
>> Key cache size: 20
>> Key cache hit rate: 0.9965579513062582
>> Row cache: disabled
>> Compacted row minimum size: 51
>> Compacted row maximum size: 89970660
>> Compacted row mean size: 226626
>>
>>
>> Our heap config is as follows
>>
>> -Xms8G -Xmx8G -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly
>>
>> from yaml
>> in_memory_compaction_limit=64
>> compaction_throughput_mb_sec=8
>> multi_threaded_compaction=false
>>
>>  INFO [AntiEntropyStage:1] 2012-06-29 09:21:26,085 AntiEntropyService.java
>> (line 762) [repair #2b6fcbf0-c1f9-11e1--2ea8811bfbff] MsgIrtConv is
>> fully synced
>>  INFO [AntiEntropySessions:8] 2012-06-29 09:21:26,085
>> AntiEntropyService.java (line 698) [repair
>> #2b6fcbf0-c1f9-11e1--2ea8811bfbff] session completed successfully
>>  INFO [CompactionExecutor:857] 2012-06-29 09:21:31,219 CompactionTask.java
>> (line 221) Compacted to
>> [/home/sas/system/data/ZMail/MsgIrtConv-hc-858-Data.db,].  47,907,012 to
>> 40,554,059 (~84% of original) bytes for 4,564 keys at 6.252080MB/s.  Time:
>> 6,186ms.
>>
>> After this, the logs were completely filled with GC [ParNew/CMS]. ParNew ran
>> every 3 seconds, while CMS ran approximately every 30 seconds, continuously
>> for 40 minutes.
>>
>>  INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line
>> 122) GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is
>> 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line
>> 122) GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is
>> 8506048512
>>
>> .
>>
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line
>> 122) GC for ParNew: 817 ms for 2 collections, 2808685768 used; max is
>> 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:55,632 GCInspector.java (line
>> 122) GC for ParNew: 1165 ms for 3 collections, 3264696776 used; max is
>> 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:57,773 GCInspector.java (line
>> 122) GC for ParNew: 1444 ms for 3 collections, 4234372296 used; max is
>> 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:59,387 GCInspector.java (line
>> 122) GC for ParNew: 1153 ms for 2 collections, 4910279080 used; max is
>> 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:08:00,389 GCInspector.java (line
>> 122) GC for ParNew: 697 ms for 2 collections, 4873857072 used; max is
>> 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 1

Multiple keyspace question

2012-07-05 Thread Ben Kaehne
Good evening,

I have read in a few discussions that multiple keyspaces are bad, but to
what extent?

We have some reasonably powerful machines and are looking to host
an additional 2 keyspaces (currently we have 1) within our Cassandra
cluster (of 3 nodes, using RF=3).

At what point does adding extra keyspaces start becoming an issue? Is there
anything special we should be considering or watching out for as we
implement this?

I cannot imagine that all Cassandra users out there are running one
massive keyspace, and at the same time I cannot imagine that all Cassandra
users have multiple clusters just to host different keyspaces.

Regards.

-- 
-Ben


Re: Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
On Fri, Jul 6, 2012 at 9:44 AM, rohit bhatia  wrote:
> On Fri, Jul 6, 2012 at 4:47 AM, aaron morton  wrote:
>> 12G Heap,
>> 1600Mb Young gen,
>>
>> Is a bit higher than the normal recommendation. 1600MB young gen can cause
>> some extra ParNew pauses.
> Thanks for the heads up, I'll try tinkering with this.
>
>>
>> 128 Concurrent writer
>> threads
>>
>> Unless you are on SSD this is too many.
>>
> I mean concurrent_writes
> (http://www.datastax.com/docs/0.8/configuration/node_configuration#concurrent-writes),
> not the memtable flush queue writers.
> The suggested value is 8 * number of cores (16) = 128 itself.
>>
>> 1) Is using JDK 1.7 any way detrimental to cassandra?
>>
>> as far as I know it's not fully certified, thanks for trying it :)
>>
>> 2) What is the max write operation qps that should be expected? Is the
>> Netflix benchmark also applicable for counter incrementing tasks?
>>
>> Counters use a different write path than normal writes and are a bit slower.
>>
>> To benchmark, get a single node and work out the max throughput. Then
>> multiply by the number of nodes and divide by the RF to get a rough idea.
>>
>> the cpu
>> idle time is around 30%, cassandra is not disk bound(insignificant
>> read operations and cpu's iowait is around 0.05%)
>>
>> Wait until compaction kicks in and has to handle all your inserts.
>>
>> The os load is around 16-20 and the average write latency is 3ms.
>> tpstats do not show any significant pending tasks.
>>
>> The node is overloaded. What is the write latency for a single thread doing
>> a single increment against a node that has no other traffic? The latency
>> for a request is the time spent working plus the time spent waiting; once you
>> reach the max throughput the time spent waiting increases. The SEDA
>> architecture is designed to limit the time spent working.
The write latency I reported is the total latency of a client's request,
as reported by DataStax OpsCenter. Its minimum is 0.5ms.
In contrast, the "local write request latency" as reported by cfstats
is around 50 microseconds but jumps to 150 microseconds during the
crash.


>>
>> At this point, several nodes suddenly start dropping
>> "Mutation" messages. There are also lots of pending
>>
>> The cluster is overwhelmed.
>>
>>  Almost all the new threads seem to be named
>> "pool-2-thread-*".
>>
>> These are client connection threads.
>>
>> My guess is that this might be due to the 128 Writer threads not being
>> able to perform more writes.(
>>
>> Yes.
>> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L214
>>
>> Work out the latency for a single client single node, then start adding
>> replication, nodes and load. When the latency increases you are getting to
>> the max throughput for that config.
>
> Also, as mentioned in my second mail, I am seeing messages like "Total
> time for which application threads were stopped: 16.7663710 seconds".
> If a node pauses for this long, it might be overwhelmed by the
> hints stored at other nodes when it resumes. This can further cause the
> node to wait on/drop a lot of client connection threads. I'll look into
> what is causing these non-GC pauses. Thanks for the help.
>
>>
>> Hope that helps
>>
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5/07/2012, at 6:49 PM, rohit bhatia wrote:
>>
>> Our Cassandra cluster consists of 8 nodes(16 core, 32G ram, 12G Heap,
>> 1600Mb Young gen, cassandra1.0.5, JDK 1.7, 128 Concurrent writer
>> threads). The replication factor is 2 with 10 column families and we
>> service Counter incrementing write intensive tasks(CL=ONE).
>>
>> I am trying to figure out the bottleneck,
>>
>> 1) Is using JDK 1.7 any way detrimental to cassandra?
>>
>> 2) What is the max write operation qps that should be expected? Is the
>> Netflix benchmark also applicable for counter incrementing tasks?
>>
>> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>>
>> 3) At around 50,000qps for the cluster (~12500 qps per node), the cpu
>> idle time is around 30%, cassandra is not disk bound(insignificant
>> read operations and cpu's iowait is around 0.05%) and is not swapping
>> its memory (around 15 GB of RAM is free or inactive). The average GC pauses
>> for ParNew are 100ms, occurring every second. So Cassandra spends
>> 10% of its time stuck in the "stop the world" collector.
>> The os load is around 16-20 and the average write latency is 3ms.
>> tpstats do not show any significant pending tasks.
>>
>> At this point, several nodes suddenly start dropping
>> "Mutation" messages. There are also lots of pending
>> MutationStage/ReplicateOnWriteStage tasks in tpstats.
>> The number of threads in the java process increases to around 25,000
>> from the usual 300-400. Almost all the new threads seem to be named
>> "pool-2-thread-*".
>> The OS load jumps to around 30-40, the "write request latency" starts
>> spiking to more than 500ms (even to several tens of seconds sometim

Re: Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
On Fri, Jul 6, 2012 at 4:47 AM, aaron morton  wrote:
> 12G Heap,
> 1600Mb Young gen,
>
> Is a bit higher than the normal recommendation. 1600MB young gen can cause
> some extra ParNew pauses.
Thanks for the heads up, I'll try tinkering with this.

>
> 128 Concurrent writer
> threads
>
> Unless you are on SSD this is too many.
>
I mean concurrent_writes
(http://www.datastax.com/docs/0.8/configuration/node_configuration#concurrent-writes),
not the memtable flush queue writers.
The suggested value is 8 * number of cores (16) = 128 itself.
>
> 1) Is using JDK 1.7 any way detrimental to cassandra?
>
> as far as I know it's not fully certified, thanks for trying it :)
>
> 2) What is the max write operation qps that should be expected? Is the
> Netflix benchmark also applicable for counter incrementing tasks?
>
> Counters use a different write path than normal writes and are a bit slower.
>
> To benchmark, get a single node and work out the max throughput. Then
> multiply by the number of nodes and divide by the RF to get a rough idea.
>
> the cpu
> idle time is around 30%, cassandra is not disk bound(insignificant
> read operations and cpu's iowait is around 0.05%)
>
> Wait until compaction kicks in and has to handle all your inserts.
>
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
>
> The node is overloaded. What is the write latency for a single thread doing
> a single increment against a node that has no other traffic? The latency
> for a request is the time spent working plus the time spent waiting; once you
> reach the max throughput the time spent waiting increases. The SEDA
> architecture is designed to limit the time spent working.
>
> At this point, several nodes suddenly start dropping
> "Mutation" messages. There are also lots of pending
>
> The cluster is overwhelmed.
>
>  Almost all the new threads seem to be named
> "pool-2-thread-*".
>
> These are client connection threads.
>
> My guess is that this might be due to the 128 Writer threads not being
> able to perform more writes.(
>
> Yes.
> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L214
>
> Work out the latency for a single client single node, then start adding
> replication, nodes and load. When the latency increases you are getting to
> the max throughput for that config.

Also, as mentioned in my second mail, I am seeing messages like "Total
time for which application threads were stopped: 16.7663710 seconds".
If a node pauses for this long, it might be overwhelmed by the
hints stored at other nodes when it resumes. This can further cause the
node to wait on/drop a lot of client connection threads. I'll look into
what is causing these non-GC pauses. Thanks for the help.

>
> Hope that helps
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/07/2012, at 6:49 PM, rohit bhatia wrote:
>
> Our Cassandra cluster consists of 8 nodes(16 core, 32G ram, 12G Heap,
> 1600Mb Young gen, cassandra1.0.5, JDK 1.7, 128 Concurrent writer
> threads). The replication factor is 2 with 10 column families and we
> service Counter incrementing write intensive tasks(CL=ONE).
>
> I am trying to figure out the bottleneck,
>
> 1) Is using JDK 1.7 any way detrimental to cassandra?
>
> 2) What is the max write operation qps that should be expected? Is the
> Netflix benchmark also applicable for counter incrementing tasks?
>
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>
> 3) At around 50,000qps for the cluster (~12500 qps per node), the cpu
> idle time is around 30%, cassandra is not disk bound(insignificant
> read operations and cpu's iowait is around 0.05%) and is not swapping
> its memory (around 15 GB of RAM is free or inactive). The average GC pauses
> for ParNew are 100ms, occurring every second. So Cassandra spends
> 10% of its time stuck in the "stop the world" collector.
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
>
> At this point, several nodes suddenly start dropping
> "Mutation" messages. There are also lots of pending
> MutationStage/ReplicateOnWriteStage tasks in tpstats.
> The number of threads in the java process increases to around 25,000
> from the usual 300-400. Almost all the new threads seem to be named
> "pool-2-thread-*".
> The OS load jumps to around 30-40, the "write request latency" starts
> spiking to more than 500ms (even to several tens of seconds sometimes).
> Even the "local write latency" increases fourfold to 200 microseconds
> from 50 microseconds. This happens across all the nodes within around
> 2-3 minutes.
> My guess is that this might be due to the 128 writer threads not being
> able to perform more writes (though with an average local write latency
> of 100-150 microseconds, each thread should be able to serve 10,000
> qps, and with 128 writer threads, should be able to serve 1,280,000 qps
> per node

Re: Composite Slice Query returning non-sliced data

2012-07-05 Thread Sunit Randhawa
HI Aaron,

It is

create column family CF
with comparator =
'CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)'
and key_validation_class = UTF8Type
and default_validation_class = UTF8Type;

This allows me to insert column names of different types.

Thanks,
Sunit.
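
For reference, the slice behaviour discussed in this thread comes down to CompositeType's end-of-component (EOC) byte. A minimal sketch, assuming the documented encoding (2-byte big-endian length, component bytes, then one EOC byte) and Hector's mapping of ComponentEquality.EQUAL to 0 and GREATER_THAN_EQUAL to 1; the helper name and component value are made up:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: builds the raw start/finish bounds of a composite slice.
public class CompositeSliceBounds {
    static byte[] bound(byte eocOnLast, byte[]... components) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < components.length; i++) {
            byte[] c = components[i];
            out.write((c.length >> 8) & 0xFF); // 2-byte big-endian length
            out.write(c.length & 0xFF);
            out.write(c, 0, c.length);
            // EOC byte: 0 everywhere except possibly the last component
            out.write(i == components.length - 1 ? eocOnLast : 0);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] prefix = "C1".getBytes(StandardCharsets.UTF_8);
        byte[] start = bound((byte) 0, prefix);  // ComponentEquality.EQUAL
        byte[] finish = bound((byte) 1, prefix); // GREATER_THAN_EQUAL
        System.out.println(start.length + " " + finish.length);
    }
}
```

Start and finish differ only in the trailing EOC byte; with 1 on the finish's last component, the range covers every column whose first component equals the prefix, which is how a prefix slice is normally expressed.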
On Thu, Jul 5, 2012 at 4:24 PM, aaron morton  wrote:
> #2 has the Composite Column and #1 does not.
>
> They are both strings.
>
> All column names *must* be of the same type. What was your CF definition ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/07/2012, at 7:26 AM, Sunit Randhawa wrote:
>
> Hello,
>
> I have 2 Columns for a 'RowKey' as below:
>
> #1 : set CF['RowKey']['1000']='A=1,B=2';
> #2: set CF['RowKey']['1000:C1']='A=2,B=3';
>
> #2 has the Composite Column and #1 does not.
>
> Now when I execute the Composite Slice query by 1000 and C1, I do get
> both the columns above.
>
> I am hoping to get #2 only since I am specifically providing "C1" as
> Start and Finish Composite Range with
> Composite.ComponentEquality.EQUAL.
>
>
> I am not sure if this is by design.
>
> Thanks,
> Sunit.
>
>


Re: Composite Slice Query returning non-sliced data

2012-07-05 Thread aaron morton
> #2 has the Composite Column and #1 does not.
They are both strings. 

All column names *must* be of the same type. What was your CF definition ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2012, at 7:26 AM, Sunit Randhawa wrote:

> Hello,
> 
> I have 2 Columns for a 'RowKey' as below:
> 
> #1 : set CF['RowKey']['1000']='A=1,B=2';
> #2: set CF['RowKey']['1000:C1']='A=2,B=3';
> 
> #2 has the Composite Column and #1 does not.
> 
> Now when I execute the Composite Slice query by 1000 and C1, I do get
> both the columns above.
> 
> I am hoping to get #2 only since I am specifically providing "C1" as
> Start and Finish Composite Range with
> Composite.ComponentEquality.EQUAL.
> 
> 
> I am not sure if this is by design.
> 
> Thanks,
> Sunit.



Re: batch_mutate

2012-07-05 Thread aaron morton
> Does it mean that the popular use case is when we need to update multiple 
> column families using the same key?
Yes. 

> Shouldn’t we design our space in such a way that those columns live in the 
> same column family?

Design a model where the data for common queries is stored in one row+CF. You
can also take the workload into consideration, e.g. things that are updated
frequently often live together, and things that are updated infrequently often
live together.

cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2012, at 3:16 AM, Leonid Ilyevsky wrote:

> I actually found an answer to my first question at 
> http://wiki.apache.org/cassandra/API. So I got it wrong: actually the outer 
> key is the key in the table, and the inner key is the table name (this was 
> somewhat counter-intuitive). Does it mean that the popular use case is when 
> we need to update multiple column families using the same key? Shouldn’t we 
> design our space in such a way that those columns live in the same column 
> family?
>  
> From: Leonid Ilyevsky [mailto:lilyev...@mooncapital.com] 
> Sent: Thursday, July 05, 2012 10:39 AM
> To: 'user@cassandra.apache.org'
> Subject: batch_mutate
>  
> My current way of inserting rows one by one is too slow (I use cql3 prepared 
> statements) , so I want to try batch_mutate.
>  
> Could anybody give me more details about the interface? In the javadoc it 
> says:
>  
> public void batch_mutate(java.util.Map<java.nio.ByteBuffer,
>                          java.util.Map<String, java.util.List<Mutation>>> mutation_map,
>                          ConsistencyLevel consistency_level)
>     throws InvalidRequestException,
>            UnavailableException,
>            TimedOutException,
>            org.apache.thrift.TException
> Description copied from interface: Cassandra.Iface
> Mutate many columns or super columns for many row keys. See also: Mutation.
> mutation_map maps key to column family to a list of Mutation objects to take
> place at that scope.
>  
>  
> I need to understand the meaning of the elements of mutation_map parameter.
> My guess is, the key in the outer map is columnfamily name, is this correct?
> The key in the inner map is, probably, a key to the columnfamily (it is
> somewhat confusing that it is String while the outer key is ByteBuffer, I
> wonder what the rationale is). If this is correct, how should I do it if my
> key is a composite one? Does anybody have an example?
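
Not an authoritative answer, but the nesting described in the javadoc above (row key -> column family name -> mutations) can be sketched with stand-in types. Here a plain String stands in for the Thrift Mutation object, and "RowKey1"/"MyCF" are invented names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Shape of batch_mutate's mutation_map only; String stands in for
// org.apache.cassandra.thrift.Mutation.
public class MutationMapShape {
    static Map<ByteBuffer, Map<String, List<String>>> build() {
        // Outer key: the ROW key as raw bytes (hence ByteBuffer), not a CF name.
        ByteBuffer rowKey = ByteBuffer.wrap("RowKey1".getBytes(StandardCharsets.UTF_8));
        // Inner key: the column family name; value: mutations for that row+CF.
        Map<String, List<String>> perCf = new HashMap<>();
        perCf.put("MyCF", Arrays.asList("mutation-1", "mutation-2"));
        Map<ByteBuffer, Map<String, List<String>>> mutationMap = new HashMap<>();
        mutationMap.put(rowKey, perCf);
        return mutationMap;
    }

    public static void main(String[] args) {
        // one row key in the batch, two mutations for its CF
        System.out.println(build().size());
    }
}
```

For a composite row key, the outer ByteBuffer would hold the key's serialized composite form rather than plain UTF-8 bytes.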
>  
> Thanks,
>  
> Leonid
>  
> This email, along with any attachments, is confidential and may be legally 
> privileged or otherwise protected from disclosure. Any unauthorized 
> dissemination, copying or use of the contents of this email is strictly 
> prohibited and may be in violation of law. If you are not the intended 
> recipient, any disclosure, copying, forwarding or distribution of this email 
> is strictly prohibited and this email and any attachments should be deleted 
> immediately. This email and any attachments do not constitute an offer to 
> sell or a solicitation of an offer to purchase any interest in any investment 
> vehicle sponsored by Moon Capital Management LP (“Moon Capital”). Moon 
> Capital does not provide legal, accounting or tax advice. Any statement 
> regarding legal, accounting or tax matters was not intended or written to be 
> relied upon by any person as advice. Moon Capital does not waive 
> confidentiality or privilege as a result of this email.
> 



Re: Finding bottleneck of a cluster

2012-07-05 Thread aaron morton
> 12G Heap,
> 1600Mb Young gen, 
Is a bit higher than the normal recommendation. 1600MB young gen can cause some 
extra ParNew pauses. 

> 128 Concurrent writer
> threads
Unless you are on SSD this is too many.
 
> 1) Is using JDK 1.7 any way detrimental to cassandra?
as far as I know it's not fully certified, thanks for trying it :)

> 2) What is the max write operation qps that should be expected? Is the
> Netflix benchmark also applicable for counter incrementing tasks?
Counters use a different write path than normal writes and are a bit slower. 

To benchmark, get a single node and work out the max throughput. Then multiply 
by the number of nodes and divide by the RF to get a rough idea.
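
That rule of thumb can be written down directly (a sketch with illustrative numbers; nodeQps has to come from your own single-node benchmark):

```java
// Rough cluster capacity from a single-node benchmark, per the rule of
// thumb above: multiply by node count, divide by replication factor.
public class CapacityEstimate {
    static long clusterQps(long nodeQps, int nodes, int replicationFactor) {
        return nodeQps * nodes / replicationFactor;
    }

    public static void main(String[] args) {
        // e.g. 12,500 qps measured on one node, 8 nodes, RF=2
        System.out.println(clusterQps(12500, 8, 2)); // 50000
    }
}
```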

> the cpu
> idle time is around 30%, cassandra is not disk bound(insignificant
> read operations and cpu's iowait is around 0.05%) 
Wait until compaction kicks in and has to handle all your inserts.

> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
The node is overloaded. What is the write latency for a single thread doing a
single increment against a node that has no other traffic? The latency for a
request is the time spent working plus the time spent waiting; once you reach
the max throughput the time spent waiting increases. The SEDA architecture is
designed to limit the time spent working.

> At this point, several nodes suddenly start dropping
> "Mutation" messages. There are also lots of pending
The cluster is overwhelmed. 

>  Almost all the new threads seem to be named
> "pool-2-thread-*".
These are client connection threads. 

> My guess is that this might be due to the 128 Writer threads not being
> able to perform more writes.(
Yes.
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L214

Work out the latency for a single client single node, then start adding 
replication, nodes and load. When the latency increases you are getting to the 
max throughput for that config.

Hope that helps

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/07/2012, at 6:49 PM, rohit bhatia wrote:

> Our Cassandra cluster consists of 8 nodes(16 core, 32G ram, 12G Heap,
> 1600Mb Young gen, cassandra1.0.5, JDK 1.7, 128 Concurrent writer
> threads). The replication factor is 2 with 10 column families and we
> service Counter incrementing write intensive tasks(CL=ONE).
> 
> I am trying to figure out the bottleneck,
> 
> 1) Is using JDK 1.7 any way detrimental to cassandra?
> 
> 2) What is the max write operation qps that should be expected? Is the
> Netflix benchmark also applicable for counter incrementing tasks?
>
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
> 
> 3) At around 50,000qps for the cluster (~12500 qps per node), the cpu
> idle time is around 30%, cassandra is not disk bound(insignificant
> read operations and cpu's iowait is around 0.05%) and is not swapping
> its memory (around 15 GB of RAM is free or inactive). The average GC pauses
> for ParNew are 100ms, occurring every second. So Cassandra spends
> 10% of its time stuck in the "stop the world" collector.
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
> 
> At this point, several nodes suddenly start dropping
> "Mutation" messages. There are also lots of pending
> MutationStage/ReplicateOnWriteStage tasks in tpstats.
> The number of threads in the java process increases to around 25,000
> from the usual 300-400. Almost all the new threads seem to be named
> "pool-2-thread-*".
> The OS load jumps to around 30-40, the "write request latency" starts
> spiking to more than 500ms (even to several tens of seconds sometimes).
> Even the "local write latency" increases fourfold to 200 microseconds
> from 50 microseconds. This happens across all the nodes within around
> 2-3 minutes.
> My guess is that this might be due to the 128 writer threads not being
> able to perform more writes (though with an average local write latency
> of 100-150 microseconds, each thread should be able to serve 10,000
> qps, and with 128 writer threads, should be able to serve 1,280,000 qps
> per node)
> Could there be any other reason for this? What else should I monitor
> since system.log do not seem to say anything conclusive before
> dropping messages.
> 
> 
> 
> Thanks
> Rohit
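
The back-of-envelope bound in the quoted guess (10,000 qps per thread, 1,280,000 qps per node) can be checked mechanically. A sketch that assumes each writer thread blocks for the full local write latency, ignoring queueing and GC pauses:

```java
// Upper bound on per-node write throughput for blocking writer threads:
// each thread does at most 1,000,000 / latencyMicros ops per second.
public class WriterThreadMath {
    static long maxQpsPerNode(long latencyMicros, int writerThreads) {
        return (1_000_000L / latencyMicros) * writerThreads;
    }

    public static void main(String[] args) {
        System.out.println(maxQpsPerNode(100, 128)); // 1280000, as claimed
        System.out.println(maxQpsPerNode(150, 128)); // 853248 at the slower end
    }
}
```

The gap between this bound and the observed ~12,500 qps per node suggests the threads are waiting on something other than the local write itself.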



Re: Composite key in thrift java api

2012-07-05 Thread aaron morton
>  I would really prefer to do it in Cassandra itself,
See 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2012, at 10:40 AM, Leonid Ilyevsky wrote:

> I need to create a ByteBuffer instance containing the proper composite key, 
> based on the values of the components of the key. I am going to use it for 
> update operation.
> I tried to simply concatenate the buffers corresponding to the components, 
> but I am not sure this is correct, because I am getting an exception that comes 
> from the server:
>  
> InvalidRequestException(why:Not enough bytes to read value of component 0)
>  
> In the server log I see this:
>  
> org.apache.thrift.transport.TTransportException: Cannot read. Remote side has 
> closed. Tried to read 4 bytes, but only got 0 bytes. (This is often 
> indicative of an internal error on the server side. Please check your server 
> logs.)
>  
> (I believe here when it says “server side” it actually means client, because 
> it is the server’s log).
>  
> Seems like the buffer that my client sends is too short.  I suspect there is 
> a way in thrift to do it properly, but I don’t know how.
> Looks like Hector has a Composite class that maybe can help, but at this 
> point I would really prefer to do it in Cassandra itself, without Hector.
>  
> Thanks!
>  
> Leonid
>  
> 
> This email, along with any attachments, is confidential and may be legally 
> privileged or otherwise protected from disclosure. Any unauthorized 
> dissemination, copying or use of the contents of this email is strictly 
> prohibited and may be in violation of law. If you are not the intended 
> recipient, any disclosure, copying, forwarding or distribution of this email 
> is strictly prohibited and this email and any attachments should be deleted 
> immediately. This email and any attachments do not constitute an offer to 
> sell or a solicitation of an offer to purchase any interest in any investment 
> vehicle sponsored by Moon Capital Management LP (“Moon Capital”). Moon 
> Capital does not provide legal, accounting or tax advice. Any statement 
> regarding legal, accounting or tax matters was not intended or written to be 
> relied upon by any person as advice. Moon Capital does not waive 
> confidentiality or privilege as a result of this email.



Re: Upgrade for Cassandra 0.8.4 to 1.+

2012-07-05 Thread aaron morton
Consult the NEWS.txt file for help on upgrading 
https://github.com/apache/cassandra/blob/trunk/NEWS.txt

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2012, at 2:52 AM, rohit bhatia wrote:

> http://cassandra.apache.org/ says 1.1.2
> 
> On Thu, Jul 5, 2012 at 7:46 PM, Raj N  wrote:
>> Hi experts,
>> I am planning to upgrade from 0.8.4 to 1.+. Whats the latest stable
>> version?
>> 
>> Thanks
>> -Rajesh



Re: cassandra on re-Start

2012-07-05 Thread aaron morton
Sounds like this problem in 1.1.0 
https://issues.apache.org/jira/browse/CASSANDRA-4219 upgrade if you are on 1.1.0

If not please paste the entire exception. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2012, at 1:32 AM, puneet loya wrote:

> 
> 
> -- Forwarded message --
> From: Rob Coli 
> Date: Mon, Jul 2, 2012 at 11:19 PM
> Subject: Re: cassandra on re-Start
> To: user@cassandra.apache.org
> 
> 
> On Mon, Jul 2, 2012 at 5:43 AM, puneet loya  wrote:
> > When I restarted the system, it showed that the keyspace does not exist.
> >
> > It is not even letting me create the keyspace with the same name again.
> 
> Paste the error you get.
> 
> =Rob
> 
> --
> =Robert Coli
> AIM>ALK - rc...@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb
> 
> The name of the keyspace is DA.
> On trying to create the keyspace it is giving an exception.
> I am getting a TTransportException when creating the keyspace. 
> 
> The previous keyspace 'DA' that was created still exists: when I checked 
> the folders, the folder with name 'DA' still exists, but I cannot access it.
> 
> 
> Cheers,
> Puneet



Re: CorruptedBlockException

2012-07-05 Thread aaron morton
> But I don't understand, how was all the available space taken away.
Take a look on disk at /var/lib/cassandra/data/ and 
/var/lib/cassandra/commitlog to see what is taking up a lot of space. 

Cassandra stores the column names as well as the values, so that can take up 
some space. 

>  it says that while compaction a CorruptedBlockException has occured.
Are you able to reproduce this error ? 

Thanks

 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2012, at 12:04 AM, Nury Redjepow wrote:

> Hello to all,
> 
>  I have cassandra instance I'm trying to use to store millions of file with 
> size ~ 3MB. Data structure is simple, 1 row for 1 file, with row key being 
> the id of file.
> I loaded 1GB of data, and the total available space is 10GB. After a few 
> hours, all the available space was taken. The log says that during 
> compaction a CorruptedBlockException occurred. But I don't understand how 
> all the available space was taken away.
> 
> Data structure
> CREATE KEYSPACE largeobjects WITH placement_strategy = 'SimpleStrategy'
> AND strategy_options={replication_factor:1};
> 
> create column family content
>   with column_type = 'Standard'  
>   and comparator = 'UTF8Type'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'TimeUUIDType'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'SizeTieredCompactionStrategy'
>   and caching = 'keys_only';
> 
> 
> Log messages
> 
> INFO [FlushWriter:9] 2012-07-04 19:56:00,783 Memtable.java (line 266) Writing 
> Memtable-content@240294142(3955135/49439187 serialized/live bytes, 91 ops)
> INFO [FlushWriter:9] 2012-07-04 19:56:00,814 Memtable.java (line 307) 
> Completed flushing 
> /var/lib/cassandra/data/largeobjects/content/largeobjects-content-h
> d-1608-Data.db (1991862 bytes) for commitlog position 
> ReplayPosition(segmentId=24245436475633, position=78253718)
> INFO [OptionalTasks:1] 2012-07-04 19:56:02,784 MeteredFlusher.java (line 62) 
> flushing high-traffic column family CFS(Keyspace='largeobjects', 
> ColumnFamily='
> content') (estimated 46971537 bytes)
> INFO [OptionalTasks:1] 2012-07-04 19:56:02,784 ColumnFamilyStore.java (line 
> 633) Enqueuing flush of Memtable-content@1755783901(3757723/46971537 
> serialized/
> live bytes, 121 ops)
> INFO [FlushWriter:9] 2012-07-04 19:56:02,785 Memtable.java (line 266) Writing 
> Memtable-content@1755783901(3757723/46971537 serialized/live bytes, 121 ops)
> INFO [FlushWriter:9] 2012-07-04 19:56:02,835 Memtable.java (line 307) 
> Completed flushing 
> /var/lib/cassandra/data/largeobjects/content/largeobjects-content-h
> d-1609-Data.db (1894897 bytes) for commitlog position 
> ReplayPosition(segmentId=24245436475633, position=82028986)
> INFO [OptionalTasks:1] 2012-07-04 19:56:04,785 MeteredFlusher.java (line 62) 
> flushing high-traffic column family CFS(Keyspace='largeobjects', 
> ColumnFamily='
> content') (estimated 56971025 bytes)
> INFO [OptionalTasks:1] 2012-07-04 19:56:04,785 ColumnFamilyStore.java (line 
> 633) Enqueuing flush of Memtable-content@1441175031(4557682/56971025 
> serialized/
> live bytes, 124 ops)
> INFO [FlushWriter:9] 2012-07-04 19:56:04,786 Memtable.java (line 266) Writing 
> Memtable-content@1441175031(4557682/56971025 serialized/live bytes, 124 ops)
> INFO [FlushWriter:9] 2012-07-04 19:56:04,814 Memtable.java (line 307) 
> Completed flushing 
> /var/lib/cassandra/data/largeobjects/content/largeobjects-content-h
> d-1610-Data.db (2287280 bytes) for commitlog position 
> ReplayPosition(segmentId=24245436475633, position=86604648)
> INFO [CompactionExecutor:39] 2012-07-04 19:56:04,815 CompactionTask.java 
> (line 109) Compacting 
> [SSTableReader(path='/var/lib/cassandra/data/largeobjects/con
> tent/largeobjects-content-hd-1610-Data.db'), 
> SSTableReader(path='/var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1608-Data.db'),
>  SSTable
> Reader(path='/var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1609-Data.db'),
>  SSTableReader(path='/var/lib/cassandra/data/largeobjects/co
> ntent/largeobjects-content-hd-1607-Data.db')]
> INFO [OptionalTasks:1] 2012-07-04 19:56:05,786 MeteredFlusher.java (line 62) 
> flushing high-traffic column family CFS(Keyspace='largeobjects', 
> ColumnFamily='
> content') (estimated 28300225 bytes)
> INFO [OptionalTasks:1] 2012-07-04 19:56:05,786 ColumnFamilyStore.java (line 
> 633) Enqueuing flush of Memtable-content@1828084851(2264018/28300225 
> serialized/
> live bytes, 38 ops)
> INFO [FlushWriter:9] 2012-07-04 19:56:05,787 Memtable.java (line 266) Writing 
> Memtable-content@1828084851(2264018/28300225 serialized/live bytes, 38 ops)
> INFO [FlushWriter:9] 2012-07-04 19:56:05,823 Memtable.java (line 307) 
> Completed flushing 
> /var/lib/cassandr

Composite key in thrift java api

2012-07-05 Thread Leonid Ilyevsky
I need to create a ByteBuffer instance containing the proper composite key, 
based on the values of the components of the key. I am going to use it for 
update operation.
I tried to simply concatenate the buffers corresponding to the components, but 
I am not sure this is correct, because I am getting an exception that comes from 
the server:

InvalidRequestException(why:Not enough bytes to read value of component 0)

In the server log I see this:

org.apache.thrift.transport.TTransportException: Cannot read. Remote side has 
closed. Tried to read 4 bytes, but only got 0 bytes. (This is often indicative 
of an internal error on the server side. Please check your server logs.)

(I believe here when it says "server side" it actually means client, because it 
is the server's log).

Seems like the buffer that my client sends is too short.  I suspect there is a 
way in thrift to do it properly, but I don't know how.
Looks like Hector has a Composite class that maybe can help, but at this point 
I would really prefer to do it in Cassandra itself, without Hector.

Thanks!

Leonid





Re: steps to add node in cassandra 1.1.2

2012-07-05 Thread aaron morton
1.1 docs for the same:
http://www.datastax.com/docs/1.1/operations/cluster_management

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/07/2012, at 9:17 PM, prasenjit mukherjee wrote:

> I am using cassandar version 1.1.2. I got the document to add node for
> version 0.7 : http://www.datastax.com/docs/0.7/getting_started/configuring
> 
> Is it still valid ? Is there a documentation on this topic from
> cassandra twiki/docs ?
> 
> -Prasenjit



Re: GC freeze just after repair session

2012-07-05 Thread aaron morton
> Ideally we would like to collect maximum garbage from ParNew itself, during 
> compactions. What are the steps to take towards to achieving this?
I'm not sure what you are asking. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/07/2012, at 6:56 PM, Ravikumar Govindarajan wrote:

> We have modified maxTenuringThreshold from 1 to 5. Maybe it is causing 
> problems. Will change it back to 1 and see how the system is.
> 
> concurrent_compactors=8. We will reduce this, as anyway our system won't be 
> able to handle this number of compactions at the same time. Think it will 
> ease GC also to some extent.
> 
> Ideally we would like to collect maximum garbage from ParNew itself, during 
> compactions. What are the steps to take towards to achieving this?
> 
> On Wed, Jul 4, 2012 at 4:07 PM, aaron morton  wrote:
> It *may* have been compaction from the repair, but it's not a big CF.
> 
> I would look at the logs to see how much data was transferred to the node. 
> Was there a compaction going on while the GC storm was happening? Do you 
> have a lot of secondary indexes ? 
> 
> If you think it correlated to compaction you can try reducing the 
> concurrent_compactors 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 3/07/2012, at 6:33 PM, Ravikumar Govindarajan wrote:
> 
>> Recently, we faced a severe freeze [around 30-40 mins] on one of our 
>> servers. There were many mutations/reads dropped. The issue happened just 
>> after a routine nodetool repair for the below CF completed [1.0.7, NTS, 
>> DC1:3,DC2:2]
>> 
>>  Column Family: MsgIrtConv
>>  SSTable count: 12
>>  Space used (live): 17426379140
>>  Space used (total): 17426379140
>>  Number of Keys (estimate): 122624
>>  Memtable Columns Count: 31180
>>  Memtable Data Size: 81950175
>>  Memtable Switch Count: 31
>>  Read Count: 8074156
>>  Read Latency: 15.743 ms.
>>  Write Count: 2172404
>>  Write Latency: 0.037 ms.
>>  Pending Tasks: 0
>>  Bloom Filter False Postives: 1258
>>  Bloom Filter False Ratio: 0.03598
>>  Bloom Filter Space Used: 498672
>>  Key cache capacity: 20
>>  Key cache size: 20
>>  Key cache hit rate: 0.9965579513062582
>>  Row cache: disabled
>>  Compacted row minimum size: 51
>>  Compacted row maximum size: 89970660
>>  Compacted row mean size: 226626
>> 
>> 
>> Our heap config is as follows
>> 
>> -Xms8G -Xmx8G -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC 
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
>> -XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=75 
>> -XX:+UseCMSInitiatingOccupancyOnly
>> 
>> from yaml
>> in_memory_compaction_limit=64
>> compaction_throughput_mb_sec=8
>> multi_threaded_compaction=false
>> 
>>  INFO [AntiEntropyStage:1] 2012-06-29 09:21:26,085 AntiEntropyService.java 
>> (line 762) [repair #2b6fcbf0-c1f9-11e1--2ea8811bfbff] MsgIrtConv is 
>> fully synced
>>  INFO [AntiEntropySessions:8] 2012-06-29 09:21:26,085 
>> AntiEntropyService.java (line 698) [repair 
>> #2b6fcbf0-c1f9-11e1--2ea8811bfbff] session completed successfully
>>  INFO [CompactionExecutor:857] 2012-06-29 09:21:31,219 CompactionTask.java 
>> (line 221) Compacted to 
>> [/home/sas/system/data/ZMail/MsgIrtConv-hc-858-Data.db,].  47,907,012 to 
>> 40,554,059 (~84% of original) bytes for 4,564 keys at 6.252080MB/s.  Time: 
>> 6,186ms.
>> 
>> After this, the logs were completely filled with GC [ParNew/CMS]. ParNew ran 
>> every 3 seconds, while CMS ran approximately every 30 seconds, continuously for 
>> 40 minutes.
>> 
>>  INFO [ScheduledTasks:1] 2012-06-29 09:23:39,921 GCInspector.java (line 122) 
>> GC for ParNew: 776 ms for 2 collections, 2901990208 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 09:23:42,265 GCInspector.java (line 122) 
>> GC for ParNew: 2028 ms for 2 collections, 3831282056 used; max is 8506048512
>> 
>> .
>> 
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:53,884 GCInspector.java (line 122) 
>> GC for ParNew: 817 ms for 2 collections, 2808685768 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:55,632 GCInspector.java (line 122) 
>> GC for ParNew: 1165 ms for 3 collections, 3264696776 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:57,773 GCInspector.java (line 122) 
>> GC for ParNew: 1444 ms for 3 collections, 4234372296 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:07:59,387 GCInspector.java (line 122) 
>> GC for ParNew: 1153 ms for 2 collections, 4910279080 used; max is 8506048512
>>  INFO [ScheduledTasks:1] 2012-06-29 10:08:00,389 GCInspector

Re: Thrift version and OOM errors

2012-07-05 Thread aaron morton
agree. 

It's a good idea to remove as many variables and possible and get to a 
stable/known state. Use a clean install and a well known client and see if the 
problems persist. 

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/07/2012, at 4:58 PM, Tristan Seligmann wrote:

> On Jul 4, 2012 2:02 PM, "Vasileios Vlachos"  
> wrote:
> >
> > Any ideas what could be causing strange message lengths?
> 
> One cause of this that I've seen is a client using unframed Thrift transport 
> while the server expects framed, or vice versa. I suppose a similar cause 
> could be something that is not a Thrift client at all mistakenly connecting 
> to Cassandra's Thrift port. 
> 



Re: Enable CQL3 from Astyanax

2012-07-05 Thread aaron morton
Can you provide an example ? 

select * should return all the columns from the CF. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/07/2012, at 4:31 AM, Thierry Templier wrote:

> Thanks Aaron.
> 
> I wonder if it's possible to obtain columns from a CQL 3 select query (with a 
> select *) that aren't defined in the create table. These fields are present 
> when all attributes are loaded but not when using CQL3. Is it the normal 
> behavior? Thanks very much!
> 
> Thierry
> 
>> Thanks for contributing. 
>> 
>> I'm behind the curve on CQL 3, but here is a post about some of the changes 
>> http://www.datastax.com/dev/blog/whats-new-in-cql-3-0
>> 
>> Cheers
>> 
>> 
>> -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 28/06/2012, at 2:30 AM, Thierry Templier wrote:
>> 
>>> Hello Aaron,
>>> 
>>> I created an issue on the Astyanax github for this problem. I added a fix 
>>> to support CQL3 in the tool.
>>> See the link https://github.com/Netflix/astyanax/issues/75.
>>> 
>>> Thierry
>>> 
 Had a quick look, the current master does not appear to support it. 
 
 Cheers
>> 
> 
> 



CQL 3 with a right API

2012-07-05 Thread Shahryar Sedghi
Hi

I am new to Cassandra. We started with 1.1 and modeled everything
with composite columns and wide rows, and chose CQL 3 even though it is beta.
Since I could not find a way in Hector to set CQL 3, I started with Thrift
and prototyped all my scenarios with Thrift including retrieving all row
keys (without CQL). Recently I saw a JDBC driver for 1.1.1 and it is so
promising (slightly slower than thrift in most of my scenarios).
Apparently JDBC "will be" the ultimate Java API for Cassandra, so the
question is:

Since there is no distinct clause in CQL 3, is there a way to retrieve all
row keys "with JDBC" without browsing all columns of the CF (and making
them distinct yourself)?

Thanks

Shahryar Sedghi

-- 
"Life is what happens while you are making other plans." ~ John Lennon


Composite Slice Query returning non-sliced data

2012-07-05 Thread Sunit Randhawa
Hello,

I have 2 Columns for a 'RowKey' as below:

 #1 : set CF['RowKey']['1000']='A=1,B=2';
 #2: set CF['RowKey']['1000:C1']='A=2,B=3';

#2 has the Composite Column and #1 does not.

Now when I execute the Composite Slice query by 1000 and C1, I do get
both the columns above.

I am hoping get #2 only since I am specifically providing "C1" as
Start and Finish Composite Range with
Composite.ComponentEquality.EQUAL.


I am not sure if this is by design.

Thanks,
Sunit.


JNA on Windows

2012-07-05 Thread Fredrik Stigbäck
Hello.
I have a question regarding JNA and Windows.
I read about the problem that taking snapshots might require 2x the
process space due to how hard links are created.
Is JNA for Windows supported?
Looking at jira issue
https://issues.apache.org/jira/browse/CASSANDRA-1371 looks like it but
checking in the Cassandra code base
org.apache.cassandra.utils.CLibrary, the only thing I see is
Native.register("c"), which tries to load the C library. I think that
doesn't exist on Windows, which will result in creating links with cmd
or fsutil and which might then trigger these extensive memory
requirements.
I'd be happy if someone could shed some light on this issue.
Regards
/Fredrik


Re: Wide rows and reads

2012-07-05 Thread Philip Shon
From what I understand, wide rows have quite a bit of overhead, especially
if you are picking columns that are far apart from each other for a given
row.

This post by Aaron Morton was quite good at explaining this issue
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

-Phil

On Thu, Jul 5, 2012 at 12:17 PM, Oleg Dulin  wrote:

> Here is my flow:
>
> One process writes a really wide row (250K+ supercolumns, each one with 5
> subcolumns, for the total of 1K or so per supercolumn)
>
> Second process comes in literally 2-3 seconds later and starts reading
> from it.
>
> My observation is that nothing good happens. It is ridiculously slow to
> read. It seems that if I wait long enough, the reads from that row will be
> much faster.
>
> Could someone enlighten me as to what exactly happens when I do this ?
>
> Regards,
> Oleg
>
>
>


Wide rows and reads

2012-07-05 Thread Oleg Dulin

Here is my flow:

One process writes a really wide row (250K+ supercolumns, each one with 
5 subcolumns, for the total of 1K or so per supercolumn)


Second process comes in literally 2-3 seconds later and starts reading from it.

My observation is that nothing good happens. It is ridiculously slow to 
read. It seems that if I wait long enough, the reads from that row will 
be much faster.


Could someone enlighten me as to what exactly happens when I do this ?

Regards,
Oleg




RE: batch_mutate

2012-07-05 Thread Leonid Ilyevsky
I actually found an answer to my first question at 
http://wiki.apache.org/cassandra/API. So I got it wrong: actually the outer key 
is the key in the table, and the inner key is the table name (this was somewhat 
counter-intuitive). Does it mean that the popular use case is when we need to 
update multiple column families using the same key? Shouldn't we design our 
space in such a way that those columns live in the same column family?

From: Leonid Ilyevsky [mailto:lilyev...@mooncapital.com]
Sent: Thursday, July 05, 2012 10:39 AM
To: 'user@cassandra.apache.org'
Subject: batch_mutate

My current way of inserting rows one by one is too slow (I use cql3 prepared 
statements), so I want to try batch_mutate.

Could anybody give me more details about the interface? In the javadoc it says:

public void batch_mutate(java.util.Map<ByteBuffer, Map<String, List<Mutation>>> mutation_map,
                         ConsistencyLevel consistency_level)
    throws InvalidRequestException, UnavailableException, TimedOutException,
           org.apache.thrift.TException

Description copied from interface: Cassandra.Iface
Mutate many columns or super columns for many row keys. See also: Mutation.
mutation_map maps key to column family to a list of Mutation objects to take
place at that scope.


I need to understand the meaning of the elements of mutation_map parameter.
My guess is, the key in the outer map is columnfamily name, is this correct?
> The key in the inner map is, probably, a key to the columnfamily (it is 
> somewhat confusing that it is String while the outer key is ByteBuffer; I 
> wonder what the rationale is). If this is correct, how should I do it if my key 
> is a composite one? Does anybody have an example?

Thanks,

Leonid




Re: Upgrade for Cassandra 0.8.4 to 1.+

2012-07-05 Thread rohit bhatia
http://cassandra.apache.org/ says 1.1.2

On Thu, Jul 5, 2012 at 7:46 PM, Raj N  wrote:
> Hi experts,
>  I am planning to upgrade from 0.8.4 to 1.+. Whats the latest stable
> version?
>
> Thanks
> -Rajesh


batch_mutate

2012-07-05 Thread Leonid Ilyevsky
My current way of inserting rows one by one is too slow (I use cql3 prepared 
statements), so I want to try batch_mutate.

Could anybody give me more details about the interface? In the javadoc it says:

public void batch_mutate(java.util.Map<ByteBuffer, Map<String, List<Mutation>>> mutation_map,
                         ConsistencyLevel consistency_level)
    throws InvalidRequestException, UnavailableException, TimedOutException,
           org.apache.thrift.TException

Description copied from interface: Cassandra.Iface
Mutate many columns or super columns for many row keys. See also: Mutation.
mutation_map maps key to column family to a list of Mutation objects to take
place at that scope.


I need to understand the meaning of the elements of mutation_map parameter.
My guess is, the key in the outer map is columnfamily name, is this correct?
The key in the inner map is, probably, a key to the columnfamily (it is 
somewhat confusing that it is String while the outer key is ByteBuffer; I 
wonder what the rationale is). If this is correct, how should I do it if my key 
is a composite one? Does anybody have an example?

Thanks,

Leonid




Upgrade for Cassandra 0.8.4 to 1.+

2012-07-05 Thread Raj N
Hi experts,
 I am planning to upgrade from 0.8.4 to 1.+. What's the latest stable
version?

Thanks
-Rajesh


Re: cassandra on re-Start

2012-07-05 Thread puneet loya
-- Forwarded message --
From: Rob Coli 
Date: Mon, Jul 2, 2012 at 11:19 PM
Subject: Re: cassandra on re-Start
To: user@cassandra.apache.org


On Mon, Jul 2, 2012 at 5:43 AM, puneet loya  wrote:
> When I restarted the system, it showed that the keyspace does not exist.
>
> It is not even letting me create the keyspace with the same name again.

Paste the error you get.

=Rob

--
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb

The name of the keyspace is DA.
On trying to create the keyspace it is giving an exception.
I am getting a TTransportException when creating the keyspace.

The previous keyspace 'DA' that was created still exists: when I
checked the folders, the folder with name 'DA' still exists, but I cannot
access it.


Cheers,
Puneet


Re: Finding bottleneck of a cluster

2012-07-05 Thread rohit bhatia
Also,


Looking at gc log. I see messages like this across different servers
before they start dropping messages

"2012-07-04T10:48:20.336+: 96771.117: [GC 96771.118: [ParNew:
1367297K->57371K(1474560K), 0.0617350 secs]
6641571K->5340088K(12419072K), 0.0634460 secs] [Times: user=0.56
sys=0.01, real=0.06 secs]
Total time for which application threads were stopped: 0.0850010 seconds
Total time for which application threads were stopped: 16.7663710 seconds"

The 16-second pause doesn't seem to be caused by the minor/major GC
cycles, which are quite fast and are also logged. The "Total time for which ..."
messages come from the PrintGCApplicationStoppedTime parameter, which
logs whenever threads reach a safepoint. Is there
any way I can figure out what caused the Java threads to pause?

Thanks
Rohit
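If it helps to see which VM operation is forcing those pauses: HotSpot can log a statistics line per safepoint, including the operation that triggered it (GC, RevokeBias, ThreadDump, ...). A sketch of what could be added to cassandra-env.sh — these are standard HotSpot diagnostic flags, not Cassandra settings, and their availability on a given JDK build is an assumption worth checking:

```
# Log one statistics line per safepoint, naming the triggering VM operation:
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"
# Report threads that are slow to reach a safepoint (delay in ms):
JVM_OPTS="$JVM_OPTS -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=2000"
```

Correlating those lines with the "Total time for which application threads were stopped" entries should show whether the 16-second pause was a GC safepoint or some other VM operation.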

On Thu, Jul 5, 2012 at 12:19 PM, rohit bhatia  wrote:
> Our Cassandra cluster consists of 8 nodes(16 core, 32G ram, 12G Heap,
> 1600Mb Young gen, cassandra1.0.5, JDK 1.7, 128 Concurrent writer
> threads). The replication factor is 2 with 10 column families and we
> service Counter incrementing write intensive tasks(CL=ONE).
>
> I am trying to figure out the bottleneck,
>
> 1) Is using JDK 1.7 any way detrimental to cassandra?
>
> 2) What is the max write operation qps that should be expected. Is the
> netflix benchmark also applicable for counter incrementing tasks?
> 
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>
> 3) At around 50,000qps for the cluster (~12500 qps per node), the cpu
> idle time is around 30%, cassandra is not disk bound(insignificant
> read operations and cpu's iowait is around 0.05%) and is not swapping
> its memory(around 15 gb RAM is free or inactive). The average gc pause
> time for ParNew is 100ms, occurring every second. So cassandra spends
> 10% of its time stuck in the "Stop the world" collector.
> The os load is around 16-20 and the average write latency is 3ms.
> tpstats do not show any significant pending tasks.
>
> At this point, suddenly, several nodes start dropping several
> "Mutation" messages. There are also lots of pending
> MutationStage,replicateOnWriteStage tasks in tpstats.
> The number of threads in the java process increase to around 25,000
> from the usual 300-400. Almost all the new threads seem to be named
> "pool-2-thread-*".
> The OS load jumps to around 30-40, the "write request latency" starts
> spiking to more than 500ms (even to several tens of seconds sometime).
> Even the "Local write latency" increases fourfolds to 200 microseconds
> from 50 microseconds. This happens across all the nodes and in around
> 2-3 minutes.
> My guess is that this might be due to the 128 Writer threads not being
> able to perform more writes (though with an average local write latency
> of 100-150 micro seconds, each thread should be able to serve 10,000
> qps and with 128 writer threads, should be able to serve 1,280,000 qps
> per node)
> Could there be any other reason for this? What else should I monitor
> since system.log do not seem to say anything conclusive before
> dropping messages.
>
>
>
> Thanks
> Rohit
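The throughput ceiling estimated in the message above can be sanity-checked in a few lines (a sketch only: the thread count and latency figures are the ones quoted in the mail, and the ceiling ignores queueing, GC pauses, and coordination overhead):

```python
def max_qps(writer_threads: int, avg_latency_us: float) -> float:
    """Theoretical ceiling: each thread completes one write per latency period."""
    per_thread_qps = 1_000_000 / avg_latency_us  # microseconds -> writes/sec
    return writer_threads * per_thread_qps

# 128 writer threads at 100 microseconds per local write
print(max_qps(128, 100))  # 1280000.0 qps per node, matching the estimate above

# GC overhead quoted earlier: a 100ms ParNew pause every second
print(0.100 / 1.0)        # 0.1 -> ~10% of wall time in stop-the-world pauses
```

The gap between this ceiling and the observed ~12,500 qps per node is one reason to suspect the threads are blocked on something other than the write itself.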


CorruptedBlockException

2012-07-05 Thread Nury Redjepow
Hello to all,

 I have a Cassandra instance I'm trying to use to store millions of files of 
size ~3MB each. The data structure is simple: 1 row per file, with the row key 
being the id of the file.
I loaded 1GB of data, and the total available space is 10GB. After a few 
hours, all the available space was taken. The log says that during compaction 
a CorruptedBlockException occurred. But I don't understand how all the 
available space was taken away.
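One way to see where the space actually went is to inspect the column family's data directory directly (the helper name here is made up, and the commands are generic; the path is the one that appears in the log below). Failed compactions on this version may leave partially written temporary SSTables behind, so files matching "tmp" are worth checking for:

```shell
# Hypothetical helper: summarize on-disk usage for one column family directory.
check_cf_disk() {
  local dir="$1"
  du -sh "$dir"                     # total on-disk size for the column family
  ls -lS "$dir" | head -20          # largest files first
  ls "$dir" | grep -i tmp || true   # leftover temporary compaction files, if any
}

# e.g., with the path from the log below:
# check_cf_disk /var/lib/cassandra/data/largeobjects/content
```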

Data structure
CREATE KEYSPACE largeobjects
  WITH placement_strategy = 'SimpleStrategy'
  AND strategy_options={replication_factor:1};

create column family content
  with column_type = 'Standard'  
  and comparator = 'UTF8Type'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'TimeUUIDType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'SizeTieredCompactionStrategy'
  and caching = 'keys_only';


Log messages

INFO [FlushWriter:9] 2012-07-04 19:56:00,783 Memtable.java (line 266) Writing Memtable-content@240294142(3955135/49439187 serialized/live bytes, 91 ops)
INFO [FlushWriter:9] 2012-07-04 19:56:00,814 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1608-Data.db (1991862 bytes) for commitlog position ReplayPosition(segmentId=24245436475633, position=78253718)
INFO [OptionalTasks:1] 2012-07-04 19:56:02,784 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='largeobjects', ColumnFamily='content') (estimated 46971537 bytes)
INFO [OptionalTasks:1] 2012-07-04 19:56:02,784 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-content@1755783901(3757723/46971537 serialized/live bytes, 121 ops)
INFO [FlushWriter:9] 2012-07-04 19:56:02,785 Memtable.java (line 266) Writing Memtable-content@1755783901(3757723/46971537 serialized/live bytes, 121 ops)
INFO [FlushWriter:9] 2012-07-04 19:56:02,835 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1609-Data.db (1894897 bytes) for commitlog position ReplayPosition(segmentId=24245436475633, position=82028986)
INFO [OptionalTasks:1] 2012-07-04 19:56:04,785 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='largeobjects', ColumnFamily='content') (estimated 56971025 bytes)
INFO [OptionalTasks:1] 2012-07-04 19:56:04,785 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-content@1441175031(4557682/56971025 serialized/live bytes, 124 ops)
INFO [FlushWriter:9] 2012-07-04 19:56:04,786 Memtable.java (line 266) Writing Memtable-content@1441175031(4557682/56971025 serialized/live bytes, 124 ops)
INFO [FlushWriter:9] 2012-07-04 19:56:04,814 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1610-Data.db (2287280 bytes) for commitlog position ReplayPosition(segmentId=24245436475633, position=86604648)
INFO [CompactionExecutor:39] 2012-07-04 19:56:04,815 CompactionTask.java (line 109) Compacting [SSTableReader(path='/var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1610-Data.db'), SSTableReader(path='/var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1608-Data.db'), SSTableReader(path='/var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1609-Data.db'), SSTableReader(path='/var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1607-Data.db')]
INFO [OptionalTasks:1] 2012-07-04 19:56:05,786 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='largeobjects', ColumnFamily='content') (estimated 28300225 bytes)
INFO [OptionalTasks:1] 2012-07-04 19:56:05,786 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-content@1828084851(2264018/28300225 serialized/live bytes, 38 ops)
INFO [FlushWriter:9] 2012-07-04 19:56:05,787 Memtable.java (line 266) Writing Memtable-content@1828084851(2264018/28300225 serialized/live bytes, 38 ops)
INFO [FlushWriter:9] 2012-07-04 19:56:05,823 Memtable.java (line 307) Completed flushing /var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1612-Data.db (1134604 bytes) for commitlog position ReplayPosition(segmentId=24245436475633, position=88874176)
ERROR [CompactionExecutor:39] 2012-07-04 19:56:06,667 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:39,1,main]
java.io.IOError: org.apache.cassandra.io.compress.CorruptedBlockException: (/var/lib/cassandra/data/largeobjects/content/largeobjects-content-hd-1610-Data.db): corruption detected, chunk at 1573104 of length 65545.
 at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
 at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
 at org.apache.cass

London meetup - 16th July

2012-07-05 Thread Dave Gardner
The next London meetup is coming up on 16th July.

We've got two speakers - Richard Churchill talking about his
experiences rolling out Cassandra at ServiceTick and Tom Wilkie
talking about real time analytics on top of Cassandra.

http://www.meetup.com/Cassandra-London/events/69791362/

Dave


steps to add node in cassandra 1.1.2

2012-07-05 Thread prasenjit mukherjee
I am using Cassandra version 1.1.2. I found the documentation for adding a
node for version 0.7: http://www.datastax.com/docs/0.7/getting_started/configuring

Is it still valid? Is there documentation on this topic in the Cassandra
wiki/docs?

-Prasenjit


Re: datastax aws ami

2012-07-05 Thread Deno Vichas

i did, with no luck.  i got my fire put out.

for some reason one of my nodes upgraded itself after rebooting to fix 
the leap second bug.  i used apt-get, which put on 1.0.8.  seeing that my 
cluster was running 1.0.7, i had to upgrade the rest of the nodes.  
upgrading was very simple: stop, apt-get install, start, one node at a time.
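the one-node-at-a-time procedure above can be sketched as a script (illustrative only: the host names are made up, the package version is the 1.0.8 mentioned above, and the `nodetool drain` step is an addition the mail doesn't mention, though it's commonly run before stopping a node so memtables are flushed first). the commands are echoed rather than executed, so the plan can be reviewed as a dry run:

```shell
upgrade_node() {
  local node="$1"
  # echo rather than execute; drop the echo to actually run each step over ssh
  echo "ssh $node nodetool drain"                        # flush memtables, stop accepting writes
  echo "ssh $node sudo service cassandra stop"
  echo "ssh $node sudo apt-get install -y cassandra=1.0.8"
  echo "ssh $node sudo service cassandra start"
}

for node in cass-node1 cass-node2 cass-node3; do  # hypothetical host names
  upgrade_node "$node"
done
```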


thanks,
deno


On 7/4/2012 3:18 AM, aaron morton wrote:

Try the data stax forums http://www.datastax.com/support-forums/

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/07/2012, at 7:28 AM, Deno Vichas wrote:


is the 2.1 image still around?

On 7/2/2012 11:24 AM, Deno Vichas wrote:

all,

i've got a datastax 2.1 ami instance that's screwed up.  for some 
reason it won't read the config file.  what's the recommended way to 
replace this node with a new one?  it doesn't seem like you can use 
the ami to bring up single nodes, as it wants to do whole clusters.



thanks,
deno