Re: massive spikes in read latency

2014-01-06 Thread Jason Wee
Hi, could it be due to a noisy neighbour? Do you have graphed ping
statistics between the nodes?
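If not, a minimal sketch for collecting them; the peer addresses and the sample ping summary line below are hypothetical, so substitute your cluster's values:

```shell
#!/bin/sh
# Hypothetical peer list -- substitute your cluster's addresses.
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
for n in $NODES; do
  :  # ping -c 5 "$n" | tail -n 1   # uncomment on a live host
done
# ping's summary line (sample below) carries min/avg/max/mdev rtt;
# the avg field is what you would feed into a graph.
sample='rtt min/avg/max/mdev = 0.412/0.509/0.686/0.103 ms'
avg=$(printf '%s\n' "$sample" | awk -F'[/ ]' '{print $8}')
echo "avg rtt: ${avg} ms"
```

Run from cron on each node and you get a crude inter-node latency time series to line up against the spikes.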

Jason


On Mon, Jan 6, 2014 at 7:28 AM, Blake Eggleston  wrote:

> Hi,
>
> I’ve been having a problem with 3 neighboring nodes in our cluster having
> their read latencies jump up to 9000ms - 18000ms for a few minutes (as
> reported by opscenter), then come back down.
>
> We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with
> cassandra reading and writing to 2 raided ssds.
>
> I’ve added 2 nodes to the struggling part of the cluster, and aside from
> the latency spikes shifting onto the new nodes, it has had no effect. I
> suspect that a single key that lives on the first stressed node may be
> being read from heavily.
>
> The spikes in latency don’t seem to be correlated to an increase in reads.
> The cluster’s workload is usually handling a maximum workload of 4200
> reads/sec per node, with writes being significantly less, at ~200/sec per
> node. Usually it will be fine with this, with read latencies at around
> 3.5-10 ms/read, but once or twice an hour the latencies on the 3 nodes will
> shoot through the roof.
>
> The disks aren’t showing serious use, with read and write rates on the ssd
> volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra
> process is maintaining 1000-1100 open connections. GC logs aren’t showing
> any serious gc pauses.
>
> Any ideas on what might be causing this?
>
> Thanks,
>
> Blake


Re: nodetool status owns % calculation after upgrade to 2.0.2

2014-01-06 Thread Mullen, Robert
Oh man, you know what my problem was: I was not specifying the keyspace
after nodetool status. After specifying the keyspace I get the 100%
ownership like I would expect.

ubuntu@prd-usw2b-pr-01-dscsapi-cadb-0002:~$ nodetool status discussions
Datacenter: us-east-1
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.198.4.80   2.02 MB    256     100.0%            e31aecd5-1eb1-4ddb-85ac-7a4135618b66  use1d
UN  10.198.2.20   132.34 MB  256     100.0%            3253080f-09b6-47a6-9b66-da3d174d1101  use1c
UN  10.198.0.249  1.77 MB    256     100.0%            22b30bea-5643-43b5-8d98-6e0eafe4af75  use1b
Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load     Tokens  Owns (effective)  Host ID                               Rack
UN  10.198.20.51   1.2 MB   256     100.0%            6a40b500-cff4-4513-b26b-ea33048c1590  usw2c
UN  10.198.16.92   1.46 MB  256     100.0%            01989d0b-0f81-411b-a70e-f22f01189542  usw2a
UN  10.198.18.125  2.14 MB  256     100.0%            aa746ed1-288c-414f-8d97-65fc867a5bdd  usw2b


As for the counts being off: running "nodetool repair discussions", which
you're supposed to do after changing the replication factor, fixed it.
After doing that on the 6 nodes in my cluster, that one column family is
returning a count of 60 on each node.
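To double-check that agreement after a repair, a rough sketch; the cqlsh invocation in the comment and the per-node values below are placeholders standing in for real output:

```shell
#!/bin/sh
# Per-node counts; on a live cluster each value would come from something like:
#   cqlsh "$node" -k discussions -e 'select count(*) from topics;'
counts="60
60
60
60
60
60"
# If every node reports the same count, sort -u collapses them to one line.
distinct=$(printf '%s\n' "$counts" | sort -u | wc -l | tr -d ' ')
if [ "$distinct" -eq 1 ]; then echo "counts agree"; else echo "counts diverge"; fi
```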

Thanks for all the help here, I've only been working with cassandra for a
couple of months now and there is a lot to learn.

Thanks,
Rob


On Sun, Jan 5, 2014 at 11:55 PM, Or Sher  wrote:

> RandomPartitioner was the default before 1.2.
> It looks like since 1.2 the default is Murmur3.
> Not sure that's your problem, though, if you say you've upgraded from 1.2.*.
>
>
> On Mon, Jan 6, 2014 at 3:42 AM, Rob Mullen wrote:
>
>> Do you know if the default changed? I'm pretty sure I never changed
>> that setting in the config file.
>>
>> Sent from my iPhone
>>
>> On Jan 4, 2014, at 11:22 PM, Or Sher  wrote:
>>
>> Robert, is it possible you've changed the partitioner during the upgrade?
>> (e.g. from RandomPartitioner to Murmur3Partitioner ?)
>>
>>
>> On Sat, Jan 4, 2014 at 9:32 PM, Mullen, Robert wrote:
>>
>>> The nodetool repair command (which took about 8 hours) seems to have
>>> sync'd the data in us-east, all 3 nodes returning 59 for the count now.
>>>  I'm wondering if this has more to do with changing the replication factor
>>> from 2 to 3 and how 2.0.2 reports the % owned rather than the upgrade
>>> itself.  I still don't understand why it's reporting 16% for each node when
>>> 100% seems to reflect the state of the cluster better.  I didn't find any
>>> info in those issues you posted that would relate to the % changing from
>>> 100% ->16%.
>>>
>>>
>>> On Sat, Jan 4, 2014 at 12:26 PM, Mullen, Robert <
>>> robert.mul...@pearson.com> wrote:
>>>
 from cql
 cqlsh>select count(*) from topics;



 On Sat, Jan 4, 2014 at 12:18 PM, Robert Coli wrote:

> On Sat, Jan 4, 2014 at 11:10 AM, Mullen, Robert <
> robert.mul...@pearson.com> wrote:
>
>> I have a column family called "topics" which has a count of 47 on one
>> node, 59 on another and 49 on another node. It was my understanding with 
>> a
>> replication factor of 3 and 3 nodes in each ring that the nodes should be
>> equal so I could lose a node in the ring and have no loss of data.  Based
>> upon that I would expect the counts across the nodes to all be 59 in this
>> case.
>>
>
> In what specific way are you counting rows?
>
> =Rob
>


>>>
>>
>>
>> --
>> Or Sher
>>
>>
>
>
> --
> Or Sher
>


Re: vnode in production

2014-01-06 Thread Chris Burroughs

On 01/02/2014 01:51 PM, Arindam Barua wrote:

1.   the stability of vnodes in production


I'm happily using vnodes in production now, but I would have trouble 
calling them stable for more than small clusters until very recently 
(1.2.13). CASSANDRA-6127 served as a master ticket for most of the 
issues if you are interested in the details.



2.   upgrading to vnodes in production


I am not aware of anyone who has succeeded with shuffle in production, 
but the 'add a new DC' procedure works.
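For reference, a rough outline of that 'add a new DC' procedure; the keyspace and datacenter names here are hypothetical:

```shell
#!/bin/sh
# 1. Stand up the new, vnode-enabled nodes in a separate DC (e.g. DC2).
# 2. Widen replication to cover the new DC:
cat <<'EOF' > alter_rf.cql
ALTER KEYSPACE myks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
EOF
# cqlsh -f alter_rf.cql        # run against the cluster
# 3. On each new node, stream the data over from the old DC:
# nodetool rebuild DC1
# 4. Repoint clients at DC2, then decommission the old nodes.
echo "wrote $(wc -l < alter_rf.cql | tr -d ' ') lines"
```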


Re: sstableloader prints nothing

2014-01-06 Thread Andrey Razumovsky
Hi Tyler,

Sorry for late response - I create table using
create table ip_lookup (ip varchar PRIMARY KEY, domains varchar);
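In case it helps anyone hitting the same directory issue from earlier in the thread: sstableloader infers the keyspace and table from the last two path components, so the files need to sit under a keyspace/table directory pair. A sketch, where the keyspace name and seed address are made up:

```shell
#!/bin/sh
# sstableloader derives keyspace/table from the path, so stage files accordingly.
mkdir -p staging/mykeyspace/ip_lookup
# cp /path/to/generated/*.db staging/mykeyspace/ip_lookup/
# sstableloader -d 10.0.0.1 staging/mykeyspace/ip_lookup   # against a live cluster
ls -d staging/mykeyspace/ip_lookup
```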

Thanks,
Andrey


2013/12/26 Tyler Hobbs 

>
> On Wed, Dec 25, 2013 at 11:29 AM, Andrey Razumovsky <
> razumovsky.and...@gmail.com> wrote:
>
>> OK, I  figured that out - turns out that my sstables were in directory
>>  but not in /. Would be great to
>> have a proper error message here..
>
>
> I've opened a ticket to fix this:
> https://issues.apache.org/jira/browse/CASSANDRA-6529
>
>
>
> However, I still can't import the data. The exception I get on server now
> looks like this:
>  WARN [STREAM-IN-/127.0.1.1] 2013-12-25 18:20:09,686 StreamSession.java
> (line 519) [Stream #4ec06a70-6d6e-11e3-85ae-9b0764b01181] Retrying for
> following error
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Buffer.java:267)
> at
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
>  at
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
> at
> org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:130)
> at
> org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
> at
> org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:255)
> at
> org.apache.cassandra.streaming.StreamReader.writeRow(StreamReader.java:134)
> at
> org.apache.cassandra.streaming.StreamReader.read(StreamReader.java:88)
> at
> org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:55)
> at
> org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:45)
> at
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
> at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
> at java.lang.Thread.run(Thread.java:724)
>
> What is your schema for this table?
>
> --
> Tyler Hobbs
> DataStax 
>


Re: massive spikes in read latency

2014-01-06 Thread Blake Eggleston
That’s a good point. CPU steal time is very low, but I haven’t observed 
internode ping times during one of the peaks, I’ll have to check that out. 
Another thing I’ve noticed is that cassandra starts dropping read messages 
during the spikes, as reported by tpstats. This suggests there are too many 
queries for cassandra to handle. However, as I mentioned earlier, the spikes 
aren’t correlated to an increase in reads.
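For anyone following along, the dropped counts live in the "Message type / Dropped" section at the end of nodetool tpstats; a small sketch of pulling them out, where the sample output below is made up:

```shell
#!/bin/sh
# Sample tail of `nodetool tpstats`; on a live node, pipe the real output in.
sample='Message type           Dropped
RANGE_SLICE                  0
READ                       312
MUTATION                     0'
dropped=$(printf '%s\n' "$sample" | awk '$1 == "READ" {print $2}')
echo "dropped reads: ${dropped}"
```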

On Jan 5, 2014, at 3:28 PM, Blake Eggleston  wrote:

> Hi,
> 
> I’ve been having a problem with 3 neighboring nodes in our cluster having 
> their read latencies jump up to 9000ms - 18000ms for a few minutes (as 
> reported by opscenter), then come back down.
> 
> We’re running a 6 node cluster, on AWS hi1.4xlarge instances, with cassandra 
> reading and writing to 2 raided ssds.
> 
> I’ve added 2 nodes to the struggling part of the cluster, and aside from the 
> latency spikes shifting onto the new nodes, it has had no effect. I suspect 
> that a single key that lives on the first stressed node may be being read 
> from heavily.
> 
> The spikes in latency don’t seem to be correlated to an increase in reads. 
> The cluster’s workload is usually handling a maximum workload of 4200 
> reads/sec per node, with writes being significantly less, at ~200/sec per 
> node. Usually it will be fine with this, with read latencies at around 3.5-10 
> ms/read, but once or twice an hour the latencies on the 3 nodes will shoot 
> through the roof. 
> 
> The disks aren’t showing serious use, with read and write rates on the ssd 
> volume at around 1350 kBps and 3218 kBps, respectively. Each cassandra 
> process is maintaining 1000-1100 open connections. GC logs aren’t showing any 
> serious gc pauses.
> 
> Any ideas on what might be causing this?
> 
> Thanks,
> 
> Blake



RE: vnode in production

2014-01-06 Thread Arindam Barua

Thanks for your responses. We are on 1.2.12 currently. 
The fixes in 1.2.13 seem to help for clusters in the 500+ node range (like 
CASSANDRA-6409). Ours is below 50 now, so we plan to go ahead and enable vnodes 
with the 'add a new DC' procedure. We will try to upgrade to 1.2.13 or 1.2.14 
subsequently. 

-Original Message-
From: Chris Burroughs [mailto:chris.burrou...@gmail.com] 
Sent: Monday, January 06, 2014 10:00 AM
To: user@cassandra.apache.org
Subject: Re: vnode in production

On 01/02/2014 01:51 PM, Arindam Barua wrote:
> 1.   the stability of vnodes in production

I'm happily using vnodes in production now, but I would have trouble calling 
them stable for more than small clusters until very recently (1.2.13). 
CASSANDRA-6127 served as a master ticket for most of the issues if you are 
interested in the details.

> 2.   upgrading to vnodes in production

I am not aware of anyone who has succeeded with shuffle in production, but the 
'add a new DC' procedure works.


Re: vnode in production

2014-01-06 Thread Tupshin Harper
This is a generally good interpretation of the state of vnodes with respect
to Cassandra versions 1.2.12 and 1.2.13.

Adding a new datacenter to a 1.2.12 cluster at your scale should be fine. I
consider vnodes fit for production at almost any scale after 1.2.13, or 50
nodes or less (ballpark) for 1.2.12. For reference, I filed the main
tracking issue (CASSANDRA-6127).

-Tupshin


On Mon, Jan 6, 2014 at 1:56 PM, Arindam Barua  wrote:

>
> Thanks for your responses. We are on 1.2.12 currently.
> The fixes in 1.2.13 seem to help for clusters in the 500+ node range (like
> CASSANDRA-6409). Ours is below 50 now, so we plan to go ahead and enable
> vnodes with the 'add a new DC' procedure. We will try to upgrade to 1.2.13
> or 1.2.14 subsequently.
>
> -Original Message-
> From: Chris Burroughs [mailto:chris.burrou...@gmail.com]
> Sent: Monday, January 06, 2014 10:00 AM
> To: user@cassandra.apache.org
> Subject: Re: vnode in production
>
> On 01/02/2014 01:51 PM, Arindam Barua wrote:
> > 1.   the stability of vnodes in production
>
> I'm happily using vnodes in production now, but I would have trouble
> calling them stable for more than small clusters until very recently
> (1.2.13). CASSANDRA-6127 served as a master ticket for most of the issues
> if you are interested in the details.
>
> > 2.   upgrading to vnodes in production
>
> I am not aware of anyone who has succeeded with shuffle in production, but
> the 'add a new DC' procedure works.
>


Re: Cassandra consuming too much memory in ubuntu as compared to within windows, same machine.

2014-01-06 Thread Erik Forkalsud

On 01/04/2014 08:04 AM, Ertio Lew wrote:

...  my dual boot 4GB(RAM) machine.

...  -Xms4G -Xmx4G -



You are allocating all your RAM to the Java heap. Are you using the 
same JVM parameters on the Windows side? You can try lowering the heap 
size or adding RAM to your machine.
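As a sketch, explicit heap settings in conf/cassandra-env.sh for a 4 GB host might look like this; the exact numbers are only illustrative, the point being to leave a good share of RAM to the OS page cache and off-heap structures:

```shell
# Fragment for conf/cassandra-env.sh -- values are illustrative, not a recommendation.
MAX_HEAP_SIZE="1G"        # instead of -Xms4G -Xmx4G on a 4 GB machine
HEAP_NEWSIZE="256M"       # young gen; commonly around 1/4 of MAX_HEAP_SIZE
echo "heap=${MAX_HEAP_SIZE} newgen=${HEAP_NEWSIZE}"
```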




- Erik -





Re: vnode in production

2014-01-06 Thread Chris Burroughs

On 01/06/2014 01:56 PM, Arindam Barua wrote:

Thanks for your responses. We are on 1.2.12 currently.
The fixes in 1.2.13 seem to help for clusters in the 500+ node range (like 
CASSANDRA-6409). Ours is below 50 now, so we plan to go ahead and enable vnodes 
with the 'add a new DC' procedure. We will try to upgrade to 1.2.13 or 1.2.14 
subsequently.


Your plan seems reasonable, but in the interest of full disclosure, 
CASSANDRA-6345 has been observed as a significant issue for clusters in 
the 50-75 node range.