tuning for read performance

2012-10-22 Thread feedly team
Hi,
I have a small 2 node cassandra cluster that seems to be constrained by
read throughput. There are about 100 writes/s and 60 reads/s mostly against
a skinny column family. Here's the cfstats for that family:

 SSTable count: 13
 Space used (live): 231920026568
 Space used (total): 231920026568
 Number of Keys (estimate): 356899200
 Memtable Columns Count: 1385568
 Memtable Data Size: 359155691
 Memtable Switch Count: 26
 Read Count: 40705879
 Read Latency: 25.010 ms.
 Write Count: 9680958
 Write Latency: 0.036 ms.
 Pending Tasks: 0
 Bloom Filter False Postives: 28380
 Bloom Filter False Ratio: 0.00360
 Bloom Filter Space Used: 874173664
 Compacted row minimum size: 61
 Compacted row maximum size: 152321
 Compacted row mean size: 1445

iostat shows almost no write activity, here's a typical line:

Device:         rrqm/s   wrqm/s      r/s     w/s    rMB/s    wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sdb               0.00     0.00   312.87    0.00     6.61     0.00     43.27     23.35  105.06   2.28  71.19

and nodetool tpstats always shows pending tasks in the ReadStage. The data
set has grown beyond physical memory (250GB/node w/64GB of RAM) so I know
disk access is required, but are there particular settings I should
experiment with that could help relieve some read i/o pressure? I already
put memcached in front of cassandra so the row cache probably won't help
much.
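
For context on the kind of settings in question, a minimal sketch of knobs commonly experimented with for read-heavy, disk-bound nodes, assuming a 1.0/1.1-era cassandra.yaml (the setting names are real; the values are purely illustrative, not recommendations for this cluster):

concurrent_reads: 32                  # often sized at roughly 16 per data spindle
compaction_throughput_mb_per_sec: 8   # throttle compaction I/O while reads are struggling

The same compaction throttle can also be changed at runtime with nodetool setcompactionthroughput.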

Also this column family stores smallish documents (usually 1-100K) along
with metadata. The document is only occasionally accessed, usually only the
metadata is read/written. Would splitting out the document into a separate
column family help?
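
To make the split concrete, a sketch in cassandra-cli with hypothetical column family names (DocMeta for the small, frequently accessed metadata and DocBody for the 1-100K blobs):

create column family DocMeta with comparator = UTF8Type and key_validation_class = UTF8Type;
create column family DocBody with comparator = UTF8Type and key_validation_class = UTF8Type;

Both would be keyed by the same document id; the hot metadata rows stay small and cache-friendly, while the large, rarely read bodies live in their own SSTables and no longer inflate the data read and cached alongside the metadata.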

Thanks
Kireet


frequent node up/downs

2012-07-02 Thread feedly team
Hello,
   I recently set up a 2 node cassandra cluster on dedicated hardware. In
the logs there have been a lot of "InetAddress xxx is now dead" or UP
messages. Comparing the log messages between the 2 nodes, they seem to
coincide with extremely long ParNew collections. I have seen some of up to
50 seconds. The installation is pretty vanilla: I didn't change any
settings and the machines don't seem particularly busy - cassandra is the
only thing running on the machine with an 8GB heap. The machine has 64GB of
RAM and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx
full. You may need to reduce memtable and/or cache sizes' messages. Would
reducing memtable and cache sizes help with the long ParNew collections? That
message seems to be triggered on a full collection.
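
For reference, the heap sizing for a vanilla install lives in conf/cassandra-env.sh; the variable names below are the real ones, while the values are only a guess at what the auto-sizing picks on this hardware (the 8GB max heap matches what is described above):

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"

Pinning these explicitly, rather than letting the script compute them, is usually the first step before experimenting with GC settings.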


Re: frequent node up/downs

2012-07-02 Thread feedly team
Yeah I noticed the leap second problem and ran the suggested fix, but I
had been seeing these problems before Saturday and still see the
occasional failure after running the fix.
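
(For readers of the archive: the fix referred to is presumably the clock reset that circulated at the time, something along these lines, with the service name varying by distro.)

sudo /etc/init.d/ntp stop
sudo date -s "$(date)"
sudo /etc/init.d/ntp start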

Thanks.

On Mon, Jul 2, 2012 at 11:17 AM, Marcus Both  wrote:

> Yeah! Look that.
>
> http://arstechnica.com/business/2012/07/one-day-later-the-leap-second-v-the-internet-scorecard/
> I had the same problem. The solution was rebooting.
>
> On Mon, 2 Jul 2012 11:08:57 -0400
> feedly team  wrote:
>
> > Hello,
> >I recently set up a 2 node cassandra cluster on dedicated hardware. In
> > the logs there have been a lot of "InetAddress xxx is now dead' or UP
> > messages. Comparing the log messages between the 2 nodes, they seem to
> > coincide with extremely long ParNew collections. I have seen some of up
> to
> > 50 seconds. The installation is pretty vanilla, I didn't change any
> > settings and the machines don't seem particularly busy - cassandra is the
> > only thing running on the machine with an 8GB heap. The machine has 64GB
> of
> > RAM and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx
> > full. You may need to reduce memtable and/or cache sizes' messages. Would
> > this help with the long ParNew collections? That message seems to be
> > triggered on a full collection.
>
> --
> Marcus Both
>
>


Re: frequent node up/downs

2012-07-02 Thread feedly team
A couple more details: I confirmed that swap space is not being used (free -m
shows 0 swap) and cassandra.log has a message like "JNA mlockall
successful". top shows the process having 9g in resident memory but 21.6g
in virtual... What accounts for the much larger virtual number? Some kind of
off-heap memory?

I'm a little puzzled as to why I would get such long pauses without
swapping. I uncommented all the gc logging options in cassandra-env.sh to
try to see what is going on when the node freezes.
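
The options in question are the block that ships commented out in cassandra-env.sh; uncommented, they look roughly like this (the exact set varies a little between versions):

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

PrintGCApplicationStoppedTime and PrintPromotionFailure are the useful ones here: they show whether the freeze is a stop-the-world pause and whether the old gen was too full or fragmented to absorb a young-gen promotion.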

Thanks
Kireet

On Mon, Jul 2, 2012 at 9:51 PM, feedly team  wrote:

> Yeah I noticed the leap second problem and ran the suggested fix, but I
> have been facing these problems before Saturday and still see the
> occasional failures after running the fix.
>
> Thanks.
>
>
> On Mon, Jul 2, 2012 at 11:17 AM, Marcus Both  wrote:
>
>> Yeah! Look that.
>>
>> http://arstechnica.com/business/2012/07/one-day-later-the-leap-second-v-the-internet-scorecard/
>> I had the same problem. The solution was rebooting.
>>
>> On Mon, 2 Jul 2012 11:08:57 -0400
>> feedly team  wrote:
>>
>> > Hello,
>> >I recently set up a 2 node cassandra cluster on dedicated hardware.
>> In
>> > the logs there have been a lot of "InetAddress xxx is now dead' or UP
>> > messages. Comparing the log messages between the 2 nodes, they seem to
>> > coincide with extremely long ParNew collections. I have seen some of up
>> to
>> > 50 seconds. The installation is pretty vanilla, I didn't change any
>> > settings and the machines don't seem particularly busy - cassandra is
>> the
>> > only thing running on the machine with an 8GB heap. The machine has
>> 64GB of
>> > RAM and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx
>> > full. You may need to reduce memtable and/or cache sizes' messages.
>> Would
>> > this help with the long ParNew collections? That message seems to be
>> > triggered on a full collection.
>>
>> --
>> Marcus Both
>>
>>
>


Re: frequent node up/downs

2012-07-06 Thread feedly team
I reduced the load and the problem hasn't been happening as much. After
enabling GC logging, I see messages mentioning "promotion failed" when the
pauses happen, so the long pauses seem to coincide with promotion failures.
From reading on the web it looks like I could try reducing the
CMSInitiatingOccupancyFraction value and/or decreasing the young gen size
to try to avoid this scenario.
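
Concretely, those knobs live in cassandra-env.sh; the option names are the real ones, while the values below only illustrate the direction of the change, not tested recommendations:

HEAP_NEWSIZE="400M"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

Lowering CMSInitiatingOccupancyFraction (the shipped default is 75) starts the concurrent sweep earlier, and a smaller young gen means less data has to be promoted at once; both make a "promotion failed" full GC less likely.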

Also is it normal to see the "Heap is xx full.  You may need to reduce
memtable and/or cache sizes" message quite often? I haven't turned on row
caches or changed any default memtable size settings so I am wondering why
the old gen fills up.


On Wed, Jul 4, 2012 at 6:28 AM, aaron morton wrote:

> What accounts for the much larger virtual number? some kind of off-heap
> memory?
>
> http://wiki.apache.org/cassandra/FAQ#mmap
>
> I'm a little puzzled as to why I would get such long pauses without
> swapping.
>
> The two are not related. On startup the JVM memory is locked so it will
> not swap; from then on, memory management is pretty much up to the JVM.
>
> Getting a lot of ParNew activity does not mean the JVM is low on memory,
> it means there is a lot of activity in the new heap.
>
> If you have a lot of insert activity (typically in a load test) you can
> generate a lot of GC activity. Try reducing the load to a point where it
> does not hit GC and then increase to find the cause. Also if you can connect
> JConsole to the JVM you may get a better view of the heap usage.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/07/2012, at 3:41 PM, feedly team wrote:
>
> Couple more details. I confirmed that swap space is not being used (free
> -m shows 0 swap) and cassandra.log has a message like "JNA mlockall
> successful". top shows the process having 9g in resident memory but 21.6g
> in virtual...What accounts for the much larger virtual number? some kind of
> off-heap memory?
>
> I'm a little puzzled as to why I would get such long pauses without
> swapping. I uncommented all the gc logging options in cassandra-env.sh to
> try to see what is going on when the node freezes.
>
> Thanks
> Kireet
>
> On Mon, Jul 2, 2012 at 9:51 PM, feedly team  wrote:
>
>> Yeah I noticed the leap second problem and ran the suggested fix, but I
>> have been facing these problems before Saturday and still see the
>> occasional failures after running the fix.
>>
>> Thanks.
>>
>>
>> On Mon, Jul 2, 2012 at 11:17 AM, Marcus Both  wrote:
>>
>>> Yeah! Look that.
>>>
>>> http://arstechnica.com/business/2012/07/one-day-later-the-leap-second-v-the-internet-scorecard/
>>> I had the same problem. The solution was rebooting.
>>>
>>> On Mon, 2 Jul 2012 11:08:57 -0400
>>> feedly team  wrote:
>>>
>>> > Hello,
>>> >I recently set up a 2 node cassandra cluster on dedicated hardware.
>>> In
>>> > the logs there have been a lot of "InetAddress xxx is now dead' or UP
>>> > messages. Comparing the log messages between the 2 nodes, they seem to
>>> > coincide with extremely long ParNew collections. I have seen some of
>>> up to
>>> > 50 seconds. The installation is pretty vanilla, I didn't change any
>>> > settings and the machines don't seem particularly busy - cassandra is
>>> the
>>> > only thing running on the machine with an 8GB heap. The machine has
>>> 64GB of
>>> > RAM and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx
>>> > full. You may need to reduce memtable and/or cache sizes' messages.
>>> Would
>>> > this help with the long ParNew collections? That message seems to be
>>> > triggered on a full collection.
>>>
>>> --
>>> Marcus Both
>>>
>>>
>>
>
>


Re: frequent node up/downs

2012-07-06 Thread feedly team
Responses below. Thanks!

On Fri, Jul 6, 2012 at 3:09 PM, aaron morton wrote:

> It looks like this happens when there is a promotion failure.
>
>
> Java Heap is full.
> Memory is fragmented.
> Use C for web scale.
>
Unfortunately I became too dumb to use C around 2004. Camping accident.

>
> Also is it normal to see the "Heap is xx full.  You may need to reduce
> memtable and/or cache sizes" message quite often? I haven't turned on row
> caches or changed any default memtable size settings so I am wondering why
> the old gen fills up.
>
>
> It's odd to get that out of the box with an 8GB heap on a 1.1.X install.
>
> What sort of workload? Is it under heavy inserts?
>
OpsCenter shows between 60-120 writes/sec and between 80-150 reads/sec
total for both machines. I am not sure if that is considered heavy or not.
The machines don't seem particularly busy, and load seems pretty even
across both.

> Do you have a lot of CF's? A lot of secondary indexes?
>
I have 15 column families, with maybe 4 that are larger and active. There
are a couple of secondary indexes. OpsCenter uses 8 CFs and the system
keyspace 7. Total data is ~100GB.

> After the messages is it able to reduce heap usage?
>
Seems like it; they occur every few minutes for a while and then stop.

> Does it seem to correlate to compactions?
>
No.


> Is the node able to get back to a healthy state ?
>
Yes. After the GC finishes it rejoins the cluster.


> If this is testing are you able to pull back to a workload where the
> issues do not appear?
>

I am guessing so. I am running a data-heavy background processing job. When
I reduced the thread count from 20 to 15, the problem happened only once in
the past 2 days vs 2-3 times a day. We are just starting to use Cassandra,
so I am more worried about when more critical web traffic hits.


>
> Cheers
>
> -----
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/07/2012, at 4:33 AM, feedly team wrote:
>
> I reduced the load and the problem hasn't been happening as much. After
> enabling gc logging, I see messages mentioning promotion failed when the
> pauses happen. It looks like this happens when there is a promotion
> failure. From reading on the web it looks like I could try reducing the
> CMSInitiatingOccupancyFraction value and/or decreasing the young gen size
> to try to avoid this scenario.
>
> Also is it normal to see the "Heap is xx full.  You may need to reduce
> memtable and/or cache sizes" message quite often? I haven't turned on row
> caches or changed any default memtable size settings so I am wondering why
> the old gen fills up.
>
>
> On Wed, Jul 4, 2012 at 6:28 AM, aaron morton wrote:
>
>> What accounts for the much larger virtual number? some kind of off-heap
>> memory?
>>
>> http://wiki.apache.org/cassandra/FAQ#mmap
>>
>> I'm a little puzzled as to why I would get such long pauses without
>> swapping.
>>
>> The two are not related. On startup the JVM memory is locked so it will
>> not swap; from then on, memory management is pretty much up to the JVM.
>>
>> Getting a lot of ParNew activity does not mean the JVM is low on memory,
>> it means there is a lot of activity in the new heap.
>>
>> If you have a lot of insert activity (typically in a load test) you can
>> generate a lot of GC activity. Try reducing the load to a point where it
>> does not hit GC and then increase to find the cause. Also if you can connect
>> JConsole to the JVM you may get a better view of the heap usage.
>>
>> Hope that helps.
>>
>>   -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 3/07/2012, at 3:41 PM, feedly team wrote:
>>
>> Couple more details. I confirmed that swap space is not being used (free
>> -m shows 0 swap) and cassandra.log has a message like "JNA mlockall
>> successful". top shows the process having 9g in resident memory but 21.6g
>> in virtual...What accounts for the much larger virtual number? some kind of
>> off-heap memory?
>>
>> I'm a little puzzled as to why I would get such long pauses without
>> swapping. I uncommented all the gc logging options in cassandra-env.sh to
>> try to see what is going on when the node freezes.
>>
>> Thanks
>> Kireet
>>

high i/o usage on one node

2012-07-16 Thread feedly team
I am having an issue where one node of a 2 node cluster seems to be using
much more I/O than the other node. The cassandra read/write requests seem
to be balanced, but iostat shows the data disk to be maxed at 100%
utilization on one machine and <50% on the other, and r/s is about 3x
greater on the high I/O node. I am using a RF of 2 and a consistency level
of ALL for reads and ONE for writes (current requests are very read heavy).
User CPU seems to be fairly low and the same on both machines, but the high
I/O machine shows an OS load of 34 (!) while the other machine reports 7. I
ran nodetool compactionstats and there are no tasks pending, which I assume
means there is no compaction going on, and the logs seem to be ok as well.
The only difference is that on the high I/O node I am doing full GC
logging, but that's on a separate disk from the data.

Another oddity is that the high I/O node shows a data size of 86GB while
the other shows 71GB. I understand there could be differences, but with a
RF of 2 I would think they would be roughly equal?

I am using version 1.0.10.
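
A few stock nodetool commands that might help narrow down an imbalance like this on 1.0.x (nothing here is specific to this cluster):

nodetool -h <node> ring              # compare Load and token ownership between the two nodes
nodetool -h <node> compactionstats   # confirm nothing is quietly compacting on the busy node
nodetool -h <node> cfstats           # compare per-CF SSTable counts and read latencies
nodetool -h <node> cfhistograms <keyspace> <cf>   # how many SSTables each read touches

If the Load column really is lopsided despite RF=2 and balanced tokens, one node simply having compacted less, and therefore touching more overlapping SSTables per read, is a common explanation; cfhistograms shows that directly.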


get_slice on wide rows

2012-08-20 Thread feedly team
I have a column family that I am using for consistency purposes. Basically
a marker column is written to a row in this family before some actions take
place and is deleted only after all the actions complete. The idea is that
if something goes horribly wrong this table can be read to see what needs
to be fixed.

In my dev environment things worked as planned, but in a larger scale/high
traffic environment, the slice query times out and then cassandra quickly
runs out of memory. The main difference here is that there is a very large
number of writes (and deleted columns) in the row my code is attempting to
read. Is the problem that cassandra is attempting to load all the deleted
columns into memory? I did an sstable2json dump and saw that the "d"
deletion marker seemed to be present for the columns, though I didn't write
any code to check all values. Is the solution here partitioning the wide
row into multiple narrower rows?
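
To make that last idea concrete, a sketch of bucketing the marker row, using an invented column family name and an hour-granularity bucket in the row key (cassandra-cli syntax):

set ActionMarkers['markers:2012082014']['action-123'] = '';
set ActionMarkers['markers:2012082015']['action-456'] = '';

Recovery then does a get_slice per recent bucket row instead of one slice over a single ever-growing row, so a scan never has to walk months of tombstones, and old bucket rows disappear at compaction once gc_grace has passed.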


Re: How to set LeveledCompactionStrategy for an existing table

2012-08-30 Thread feedly team
In cassandra-cli, I did something like:

update column family xyz with
compaction_strategy='LeveledCompactionStrategy';
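
If the size option needs to go with it, the cli form would be along these lines (1.1-era cassandra-cli syntax, using the table from the question below):

update column family pns_credentials
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb: 10};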

On Thu, Aug 30, 2012 at 5:20 AM, Jean-Armel Luce  wrote:

>
> Hello,
>
> I am using Cassandra 1.1.1 and CQL3.
> I have a cluster with 1 node (test environment)
> Could you tell me how to set the compaction strategy to Leveled Strategy for
> an existing table ?
>
> I have a table pns_credentials
>
> jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
> Connected to Test Cluster at localhost:9160.
> [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
> Use HELP for help.
> cqlsh> use test1;
> cqlsh:test1> describe table pns_credentials;
>
> CREATE TABLE pns_credentials (
>   ise text PRIMARY KEY,
>   isnew int,
>   ts timestamp,
>   mergestatus int,
>   infranetaccount text,
>   user_level int,
>   msisdn bigint,
>   mergeusertype int
> ) WITH
>   comment='' AND
>   comparator=text AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   default_validation=text AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write='true' AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   compression_parameters:sstable_compression='SnappyCompressor';
>
> I want to set the LeveledCompaction strategy for this table, so I execute
> the following ALTER TABLE :
>
> cqlsh:test1> alter table pns_credentials
>  ... WITH compaction_strategy_class='LeveledCompactionStrategy'
>  ... AND compaction_strategy_options:sstable_size_in_mb=10;
>
> In the Cassandra logs, I see some information:
>  INFO 10:23:52,532 Enqueuing flush of
> Memtable-schema_columnfamilies@965212657(1391/1738 serialized/live bytes,
> 20 ops)
>  INFO 10:23:52,533 Writing Memtable-schema_columnfamilies@965212657(1391/1738
> serialized/live bytes, 20 ops)
>  INFO 10:23:52,629 Completed flushing
> /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-94-Data.db
> (1442 bytes) for commitlog position ReplayPosition(segmentId=3556583843054,
> position=1987)
>
>
> However, when I look at the description of the table, the table is still
> with the SizeTieredCompactionStrategy
> cqlsh:test1> describe table pns_credentials ;
>
> CREATE TABLE pns_credentials (
>   ise text PRIMARY KEY,
>   isnew int,
>   ts timestamp,
>   mergestatus int,
>   infranetaccount text,
>   user_level int,
>   msisdn bigint,
>   mergeusertype int
> ) WITH
>   comment='' AND
>   comparator=text AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   default_validation=text AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write='true' AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   compression_parameters:sstable_compression='SnappyCompressor';
>
> In the schema_columnfamilies table (in system keyspace), the table
> pns_credentials is still using the SizeTieredCompactionStrategy
> cqlsh:test1> use system;
> cqlsh:system> select * from schema_columnfamilies ;
> ...
>  test1 |   pns_credentials |   null | KEYS_ONLY
> |[] | |
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
> |  {}
> |
> org.apache.cassandra.db.marshal.UTF8Type |
> {"sstable_compression":"org.apache.cassandra.io.compress.SnappyCompressor"}
> |  org.apache.cassandra.db.marshal.UTF8Type |   864000 |
> 1029 |   ise | org.apache.cassandra.db.marshal.UTF8Type
> |0 |   32
> |4 |0.1 |   True
> |  null | Standard |null
> ...
>
>
> I stopped/started the Cassandra node, but the table is still with
> SizeTieredCompactionStrategy
>
> I tried using cassandra-cli, but the alter is still unsuccessful.
>
> Is there anything I am missing ?
>
>
> Thanks.
>
> Jean-Armel
>