Re: Cassandra Maintenance Best practices

2014-12-09 Thread Jonathan Haddad
I did a presentation on diagnosing performance problems in production at
the US & Euro summits, in which I covered quite a few tools & preventative
measures you should know when running a production cluster.  You may find
it useful:
http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/

On OpsCenter - I recommend it.  It gives you a nice dashboard.  I don't
think it's completely comprehensive (no tool really is), but it gets you
90% of the way there.

It's a good idea to run repairs, especially if you're doing deletes or
querying at CL=ONE.  I assume you're not using quorum, because on RF=2
that's the same as CL=ALL.

I recommend at least RF=3 because if you lose 1 server, you're on the edge
of data loss.
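
For a concrete starting point, here's a minimal sketch of a weekly primary-range
repair driven from cron. The keyspace name, log path, and schedule are
placeholders, and you'd stagger the day/hour per node so only one node repairs
at a time:

# Sketch only: keyspace, log path and schedule are placeholders.
# Note: with RF=2, QUORUM is floor(2/2)+1 = 2 replicas, i.e. the same as ALL.
0 2 * * 0  nodetool repair -pr my_keyspace >> /var/log/cassandra/repair.log 2>&1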


On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi 
wrote:

> Hi,
> We have a two-node cluster configuration in production with RF=2.
>
> This means the data is written to both nodes. The cluster has been running
> for about a month now and has a good amount of data.
>
> Questions:
> 1. What are the best practices for maintenance?
> 2. Is OpsCenter required, or can I manage with the nodetool utility?
> 3. Is it necessary to run repair weekly?
>
> thanks
> regards
> Neha
>


Re: upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread Jonathan Haddad
Yes.  It is, in general, a best practice to upgrade to the latest bug fix
release before doing an upgrade to the next point release.
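
Roughly, a per-node pass looks something like the sketch below. The package
name, version and service commands are assumptions, so adjust them for your
install; upgradesstables matters for the later 2.0 -> 2.1 hop, not for the
2.0.6 -> 2.0.x step:

nodetool drain                         # flush memtables, stop accepting writes
sudo service cassandra stop
sudo apt-get install cassandra=2.0.11  # hypothetical: latest 2.0.x bug-fix release
sudo service cassandra start           # verify the node rejoins before moving on
# After the later 2.0.x -> 2.1.x step, rewrite sstables to the new format:
nodetool upgradesstables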

On Tue Dec 09 2014 at 6:58:24 PM wyang  wrote:

> I looked at some upgrade documentation and am a little puzzled.
>
>
> According to
> https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt, “Rolling
> upgrades from anything pre-2.0.7 is not supported”. Does that mean we should
> upgrade to 2.0.7 or later first? Can we do a rolling upgrade to 2.0.7? Do we
> need to run upgradesstables after that? There seems to be nothing specific to
> note about upgrading between 2.0.6 and 2.0.7 in NEWS.txt.
>
>
> Any advice will be kindly appreciated
>
>
>


[Cassandra][SStableLoader Out of Heap Memory]

2014-12-09 Thread 严超
Hi, Everyone:
I'm importing a CSV file into Cassandra using sstableloader, and
I'm following the example here:
https://github.com/yukim/cassandra-bulkload-example/
When I try to run sstableloader, it fails with an OOM. I also
changed the sstableloader.sh script (the one that runs java -cp ... BulkLoader)
to give it more memory using the -Xms and -Xmx args, but I still keep hitting
the same issue.
Any hints/directions would be really helpful.

*Stack Trace : *
/usr/bin/sstableloader -v -d  /tmp/nitin_test/nitin_test_load/

Established connection to initial hosts
Opening sstables and calculating sections to stream
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:144)
at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
at org.apache.cassandra.io.sstable.SSTableReader.buildSummary(SSTableReader.java:457)
at org.apache.cassandra.io.sstable.SSTableReader.openForBatch(SSTableReader.java:170)
at org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:112)
at java.io.File.list(File.java:1155)
at org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:73)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:155)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:66)
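
For reference, one way to raise the loader's heap without editing the script,
assuming your copy of the wrapper honors MAX_HEAP_SIZE / JVM_OPTS (check it;
some versions hard-code -Xmx and need the flag edited in place):

# Sketch only: <host> is a placeholder for one of your nodes.
MAX_HEAP_SIZE=4G JVM_OPTS="-Xms4G -Xmx4G" \
  /usr/bin/sstableloader -v -d <host> /tmp/nitin_test/nitin_test_load/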


Best Regards!

Chao Yan
--
My twitter: Andy Yan @yanchao727
My Weibo: http://weibo.com/herewearenow
--


Cassandra Maintenance Best practices

2014-12-09 Thread Neha Trivedi
Hi,
We have a two-node cluster configuration in production with RF=2.

This means the data is written to both nodes. The cluster has been running
for about a month now and has a good amount of data.

Questions:
1. What are the best practices for maintenance?
2. Is OpsCenter required, or can I manage with the nodetool utility?
3. Is it necessary to run repair weekly?

thanks
regards
Neha


upgrade cassandra from 2.0.6 to 2.1.2

2014-12-09 Thread wyang
I looked at some upgrade documentation and am a little puzzled.


According to https://github.com/apache/cassandra/blob/cassandra-2.1/NEWS.txt,
“Rolling upgrades from anything pre-2.0.7 is not supported”. Does that mean we
should upgrade to 2.0.7 or later first? Can we do a rolling upgrade to 2.0.7?
Do we need to run upgradesstables after that? There seems to be nothing
specific to note about upgrading between 2.0.6 and 2.0.7 in NEWS.txt.


Any advice will be kindly appreciated

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks Rob.  Definitely good advice that I wish I had come across a couple
of months ago...  That said, it still definitely points me in the right
direction as to what to do now.

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 12:21 PM, Robert Coli  wrote:

> On Mon, Dec 8, 2014 at 5:12 PM, Nate Yoder  wrote:
>
>> I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
>> C3.2XLarge nodes which overall is working very well for us.  However, after
>> letting it run for a while I seem to get into a situation where the amount
>> of disk space used far exceeds the total amount of data on each node and I
>> haven't been able to get the size to go back down except by stopping and
>> restarting the node.
>>
>
>
>> [... link to rather serious bug in 2.1.1 version in JIRA ...]
>>
>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
> =Rob
>


Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Robert Coli
On Mon, Dec 8, 2014 at 5:12 PM, Nate Yoder  wrote:

> I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
> C3.2XLarge nodes which overall is working very well for us.  However, after
> letting it run for a while I seem to get into a situation where the amount
> of disk space used far exceeds the total amount of data on each node and I
> haven't been able to get the size to go back down except by stopping and
> restarting the node.
>


> [... link to rather serious bug in 2.1.1 version in JIRA ...]
>

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

=Rob


Best practice for emulating a Cassandra timeout during unit tests?

2014-12-09 Thread Clint Kelly
Hi all,

I'd like to write some tests for my code that uses the Cassandra Java
driver to see how it behaves if there is a read timeout while accessing
Cassandra.  Is there a best-practice for getting this done?  I was thinking
about adjusting the settings in the cluster builder to adjust the timeout
settings to be something impossibly low (like 1ms), but I'd rather do
something to my test Cassandra instance (using the
EmbeddedCassandraService) to temporarily slow it down.
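
One blunt way to slow the node itself rather than the driver, on Linux and with
root, is to add artificial latency on loopback with tc/netem. This is a
swapped-in alternative to the builder-settings approach, and note that it
delays all localhost traffic for the duration of the test:

# Sketch only (Linux, requires root); remove the rule when the test finishes.
sudo tc qdisc add dev lo root netem delay 500ms   # every loopback packet waits 500ms
# ... run the test against the embedded Cassandra on localhost ...
sudo tc qdisc del dev lo root                     # restore normal loopback behaviour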

Any suggestions?

Best regards,
Clint


Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi All,

Thanks for the help but after yet another day of investigation I think I
might be running into this
https://issues.apache.org/jira/browse/CASSANDRA-8061 issue where tmplink
files aren't removed until Cassandra is restarted.
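
If it helps anyone else hitting this, a quick way to see how much space the
leftover files are holding, assuming the default data directory:

# Sketch only; adjust the path to your data_file_directories setting.
find /var/lib/cassandra/data -name '*tmplink*' -exec du -sh {} + | sort -h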

Thanks again for all the suggestions!

Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 10:18 AM, Nate Yoder  wrote:

> Hi Reynald,
>
> Good idea but I have incremental backups turned off and other than *.db
> files nothing else appears to be in the data directory for that table.
>
> Is there any other output that would be helpful in helping you all help me?
>
> Thanks,
> Nate
>
> --
> *Nathanael Yoder*
> Principal Engineer & Data Scientist, Whistle
> 415-944-7344 // n...@whistle.com
>
> On Tue, Dec 9, 2014 at 9:27 AM, Reynald Bourtembourg <
> reynald.bourtembo...@esrf.fr> wrote:
>
>>  Hi Nate,
>>
>> Are you using incremental backups?
>>
>> Extract from the documentation (
>> http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_incremental_t.html
>> ):
>>
>> *When incremental backups are enabled (disabled by default), Cassandra
>> hard-links each flushed SSTable to a backups directory under the keyspace
>> data directory. This allows storing backups offsite without transferring
>> entire snapshots. Also, incremental backups combine with snapshots to
>> provide a dependable, up-to-date backup mechanism.*
>>
>> *As with snapshots, Cassandra does not automatically clear incremental
>> backup files. DataStax recommends setting up a process to clear incremental
>> backup hard-links each time a new snapshot is created.*
>>  These backups are stored in directories named "backups" at the same
>> level as the "snapshots" directories.
>>
>> Reynald
>>
>>
>> On 09/12/2014 18:13, Nate Yoder wrote:
>>
>> Thanks for the advice.  Totally makes sense.  Once I figure out how to
>> make my data stop taking up more than 2x more space without being useful
>> I'll definitely make the change :)
>>
>>  Nate
>>
>>
>>
>>   --
>> *Nathanael Yoder*
>> Principal Engineer & Data Scientist, Whistle
>> 415-944-7344 // n...@whistle.com
>>
>> On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad 
>> wrote:
>>
>>> Well, I personally don't like RF=2.  It means if you're using CL=QUORUM
>>> and a node goes down, you're going to have a bad time. (downtime) If you're
>>> using CL=ONE then you'd be ok.  However, I am not wild about losing a node
>>> and having only 1 copy of my data available in prod.
>>>
>>>
>>> On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder  wrote:
>>>
 Thanks Jonathan.  So there is nothing too idiotic about my current
 set-up with 6 boxes each with 256 vnodes each and a RF of 2?

  I appreciate the help,
 Nate



   --
 *Nathanael Yoder*
 Principal Engineer & Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

  On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad 
 wrote:

> You don't need a prime number of nodes in your ring, but it's not a
> bad idea to have it be a multiple of your RF when your cluster is small.
>
>
> On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder  wrote:
>
>> Hi Ian,
>>
>>  Thanks for the suggestion but I had actually already done that
>> prior to the scenario I described (to get myself some free space) and 
>> when
>> I ran nodetool cfstats it listed 0 snapshots as expected, so 
>> unfortunately
>> I don't think that is where my space went.
>>
>>  One additional piece of information I forgot to point out is that
>> when I ran nodetool status on the node it included all 6 nodes.
>>
>>  I have also heard it mentioned that I may want to have a prime
>> number of nodes which may help protect against split-brain.  Is this 
>> true?
>> If so does it still apply when I am using vnodes?
>>
>>  Thanks again,
>> Nate
>>
>>   --
>> *Nathanael Yoder*
>> Principal Engineer & Data Scientist, Whistle
>> 415-944-7344 // n...@whistle.com
>>
>> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose 
>> wrote:
>>
>>> Try `nodetool clearsnapshot` which will delete any snapshots you
>>> have.  I have never taken a snapshot with nodetool yet I found several
>>> snapshots on my disk recently (which can take a lot of space).  So 
>>> perhaps
>>> they are automatically generated by some operation?  No idea.  
>>> Regardless,
>>> nuking those freed up a ton of space for me.
>>>
>>>  - Ian
>>>
>>>
>>> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:
>>>
 Hi All,

  I am new to Cassandra so I apologise in advance if I have missed
 anything obvious but this one currently has me stumped.

  I am currently running a 6 node Cassandra 2.1.1 cluster on EC2
 using C3.2XLarge nodes which overall is working very well for us.

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi Reynald,

Good idea but I have incremental backups turned off and other than *.db
files nothing else appears to be in the data directory for that table.

Is there any other output that would be helpful in helping you all help me?

Thanks,
Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 9:27 AM, Reynald Bourtembourg <
reynald.bourtembo...@esrf.fr> wrote:

>  Hi Nate,
>
> Are you using incremental backups?
>
> Extract from the documentation (
> http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_incremental_t.html
> ):
>
> *When incremental backups are enabled (disabled by default), Cassandra
> hard-links each flushed SSTable to a backups directory under the keyspace
> data directory. This allows storing backups offsite without transferring
> entire snapshots. Also, incremental backups combine with snapshots to
> provide a dependable, up-to-date backup mechanism.*
>
> *As with snapshots, Cassandra does not automatically clear incremental
> backup files. DataStax recommends setting up a process to clear incremental
> backup hard-links each time a new snapshot is created.*
>  These backups are stored in directories named "backups" at the same level
> as the "snapshots" directories.
>
> Reynald
>
>
> On 09/12/2014 18:13, Nate Yoder wrote:
>
> Thanks for the advice.  Totally makes sense.  Once I figure out how to
> make my data stop taking up more than 2x more space without being useful
> I'll definitely make the change :)
>
>  Nate
>
>
>
>   --
> *Nathanael Yoder*
> Principal Engineer & Data Scientist, Whistle
> 415-944-7344 // n...@whistle.com
>
> On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad  wrote:
>
>> Well, I personally don't like RF=2.  It means if you're using CL=QUORUM
>> and a node goes down, you're going to have a bad time. (downtime) If you're
>> using CL=ONE then you'd be ok.  However, I am not wild about losing a node
>> and having only 1 copy of my data available in prod.
>>
>>
>> On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder  wrote:
>>
>>> Thanks Jonathan.  So there is nothing too idiotic about my current
>>> set-up with 6 boxes each with 256 vnodes each and a RF of 2?
>>>
>>>  I appreciate the help,
>>> Nate
>>>
>>>
>>>
>>>   --
>>> *Nathanael Yoder*
>>> Principal Engineer & Data Scientist, Whistle
>>> 415-944-7344 // n...@whistle.com
>>>
>>>  On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad 
>>> wrote:
>>>
 You don't need a prime number of nodes in your ring, but it's not a bad
 idea to have it be a multiple of your RF when your cluster is small.


 On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder  wrote:

> Hi Ian,
>
>  Thanks for the suggestion but I had actually already done that prior
> to the scenario I described (to get myself some free space) and when I ran
> nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
> don't think that is where my space went.
>
>  One additional piece of information I forgot to point out is that
> when I ran nodetool status on the node it included all 6 nodes.
>
>  I have also heard it mentioned that I may want to have a prime
> number of nodes which may help protect against split-brain.  Is this true?
> If so does it still apply when I am using vnodes?
>
>  Thanks again,
> Nate
>
>   --
> *Nathanael Yoder*
> Principal Engineer & Data Scientist, Whistle
> 415-944-7344 // n...@whistle.com
>
> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose 
> wrote:
>
>> Try `nodetool clearsnapshot` which will delete any snapshots you
>> have.  I have never taken a snapshot with nodetool yet I found several
>> snapshots on my disk recently (which can take a lot of space).  So 
>> perhaps
>> they are automatically generated by some operation?  No idea.  
>> Regardless,
>> nuking those freed up a ton of space for me.
>>
>>  - Ian
>>
>>
>> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:
>>
>>> Hi All,
>>>
>>>  I am new to Cassandra so I apologise in advance if I have missed
>>> anything obvious but this one currently has me stumped.
>>>
>>>  I am currently running a 6 node Cassandra 2.1.1 cluster on EC2
>>> using C3.2XLarge nodes which overall is working very well for us.  
>>> However,
>>> after letting it run for a while I seem to get into a situation where 
>>> the
>>> amount of disk space used far exceeds the total amount of data on each 
>>> node
>>> and I haven't been able to get the size to go back down except by 
>>> stopping
>>> and restarting the node.
>>>
>>>  For example, in my data I have almost all of my data in one
>>> table.  On one of my nodes right now the total space used (as reported 
>>> by
>>> nodetool cfstats) is 57.2 GB and there are no snapshots. However, when I
>>> look at the size of the data files (using du), the data file for that
>>> table is 107GB.

Observations/concerns with repair and hinted handoff

2014-12-09 Thread Robert Wille
I have spent a lot of time working with single-node, RF=1 clusters in my 
development. Before I deploy a cluster to our live environment, I have spent 
some time learning how to work with a multi-node cluster with RF=3. There were 
some surprises. I’m wondering if people here can enlighten me. I don’t exactly 
have that warm, fuzzy feeling.

I created a three-node cluster with RF=3. I then wrote to the cluster pretty 
heavily to cause some dropped mutation messages. The dropped messages didn’t 
trickle in, but came in a burst. I suspect full GC is the culprit, but I don’t 
really know. Anyway, I ended up with 17197 dropped mutation messages on node 1, 
6422 on node 2, and none on node 3. In order to learn about repair, I waited 
for compaction to finish doing its thing, recorded the size and estimated 
number of keys for each table, started up repair (nodetool repair ) 
on all three nodes, and waited for it to complete before doing anything else 
(even reads). When repair and compaction were done, I checked the size and 
estimated number of keys for each table. All tables on all nodes grew in size 
and estimated number of keys. The estimated number of keys for each node grew 
by 65k, 272k and 247k (.2%, .7% and .6%) for nodes 1, 2 and 3 respectively. I 
expected some growth, but that’s significantly more new keys than I had dropped 
mutation messages. I also expected the most new data on node 1, and none on 
node 3, which didn’t come close to what actually happened. Perhaps a mutation 
message contains more than one record? Perhaps the dropped mutation message 
counter is incremented on the coordinator, not the node that was overloaded?

I repeated repair, and the second time around the tables remained unchanged, as 
expected. I would hope that repair wouldn’t do anything to the tables if they 
were in sync. 

Just to be clear, I’m not overly concerned about the unexpected increase in 
number of keys. I’m pretty sure that repair did the needful thing and did bring 
the nodes in sync. The unexpected results more likely indicates that I’m 
ignorant, and it really bothers me when I don’t understand something. If you 
have any insights, I’d appreciate them.

One of the dismaying things about repair was that the first time around it took 
about 4 hours, with a completely idle cluster (except for repairs, of course), 
and only 6 GB of data on each node. I can bootstrap a node with 6 GB of data in 
a couple of minutes. That makes repair something like 50 to 100 times more 
expensive than bootstrapping. I know I should run repair on one node at a time, 
but even if you divide by three, that’s still a horrifically long time for such 
a small amount of data. The second time around, repair only took 30 minutes. 
That’s much better, but best-case is still about 10x longer than bootstrapping. 
Should repair really be taking this long? When I have 300 GB of data, is a 
best-case repair going to take 25 hours, and a repair with a modest amount of 
work more than 100 hours? My records are quite small. Those 6 GB contain almost 
40 million partitions. 

Following my repair experiment, I added a fourth node, and then tried killing a 
node and importing a bunch of data while the node was down. As far as repair is 
concerned, this seems to work fine (although again, glacially). However, I 
noticed that hinted handoff doesn’t seem to be working. I added several million 
records (with consistency=one), and nothing appeared in system.hints (du -hs 
showed a few dozen K bytes), nor did I get any pending Hinted Handoff tasks in 
the Thread Pool Stats. When I started up the down node (less than 3 hours 
later), the missed data didn’t appear to get sent to it. The tables did not 
grow, compaction events didn’t schedule, and there wasn’t any appreciable CPU 
utilization by the cluster. With millions of records that were missed while it 
was down, I should have noticed something if it actually was replaying the 
hints. Is there some magic setting to turn on hinted handoffs? Were there too 
many hints and so it just deleted them? My assumption is that if hinted handoff 
is working, then my need for repair should be much less, which given my 
experience so far, would be a really good thing.
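
For reference, a rough sketch of the checks involved while the replica is still
down. The config path is an assumption, and system.hints is the pre-3.0 hint
store:

# Sketch only; paths and keyspace names are assumptions.
grep hinted_handoff_enabled /etc/cassandra/cassandra.yaml    # should be "true"
nodetool tpstats | grep -i hint                              # pending/blocked hint tasks
echo "SELECT count(*) FROM system.hints;" | cqlsh            # hints waiting for replay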

Given the horrifically long time it takes to repair a node, and hinted handoff 
apparently not working, if a node goes down, is it better to bootstrap a new 
one than to repair the node that went down? I would expect that even if I chose 
to bootstrap a new node, it would need to be repaired anyway, since it would 
probably miss writes while bootstrapping.

Thanks in advance

Robert



Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Reynald Bourtembourg

Hi Nate,

Are you using incremental backups?

Extract from the documentation ( 
http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_backup_incremental_t.html 
):


*When incremental backups are enabled (disabled by default), Cassandra
hard-links each flushed SSTable to a backups directory under the
keyspace data directory. This allows storing backups offsite without
transferring entire snapshots. Also, incremental backups combine with
snapshots to provide a dependable, up-to-date backup mechanism.*

*As with snapshots, Cassandra does not automatically clear incremental
backup files. DataStax recommends setting up a process to clear
incremental backup hard-links each time a new snapshot is created.*


These backups are stored in directories named "backups" at the same
level as the "snapshots" directories.
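
A minimal sketch of such a cleanup process, assuming the default data directory
and a hypothetical keyspace name (dry-run with -print before trusting -delete):

nodetool snapshot my_keyspace                 # take the fresh snapshot first
find /var/lib/cassandra/data/my_keyspace -path '*/backups/*' -type f -delete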


Reynald

On 09/12/2014 18:13, Nate Yoder wrote:
Thanks for the advice.  Totally makes sense.  Once I figure out how to 
make my data stop taking up more than 2x more space without being 
useful I'll definitely make the change :)


Nate



--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com 

On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad  wrote:


Well, I personally don't like RF=2.  It means if you're using
CL=QUORUM and a node goes down, you're going to have a bad time.
(downtime) If you're using CL=ONE then you'd be ok. However, I am
not wild about losing a node and having only 1 copy of my data
available in prod.


On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder  wrote:

Thanks Jonathan.  So there is nothing too idiotic about my
current set-up with 6 boxes each with 256 vnodes each and a RF
of 2?

I appreciate the help,
Nate



--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com 

On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad  wrote:

You don't need a prime number of nodes in your ring, but
it's not a bad idea to have it be a multiple of your RF when
your cluster is small.


On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder  wrote:

Hi Ian,

Thanks for the suggestion but I had actually already
done that prior to the scenario I described (to get
myself some free space) and when I ran nodetool
cfstats it listed 0 snapshots as expected, so
unfortunately I don't think that is where my space went.

One additional piece of information I forgot to point
out is that when I ran nodetool status on the node it
included all 6 nodes.

I have also heard it mentioned that I may want to have
a prime number of nodes which may help protect against
split-brain.  Is this true?  If so does it still apply
when I am using vnodes?

Thanks again,
Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com 

On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose  wrote:

Try `nodetool clearsnapshot` which will delete any
snapshots you have.  I have never taken a snapshot
with nodetool yet I found several snapshots on my
disk recently (which can take a lot of space).  So
perhaps they are automatically generated by some
operation? No idea.  Regardless, nuking those
freed up a ton of space for me.

- Ian


On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:

Hi All,

I am new to Cassandra so I apologise in
advance if I have missed anything obvious but
this one currently has me stumped.

I am currently running a 6 node Cassandra
2.1.1 cluster on EC2 using C3.2XLarge nodes
which overall is working very well for us.
However, after letting it run for a while I
seem to get into a situation where the amount
of disk space used far exceeds the total
amount of data on each node and I haven't been
able to get the size to go back down except by
stopping and restarting the node.

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks for the advice.  Totally makes sense.  Once I figure out how to make
my data stop taking up more than 2x more space without being useful I'll
definitely make the change :)

Nate



--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 9:02 AM, Jonathan Haddad  wrote:

> Well, I personally don't like RF=2.  It means if you're using CL=QUORUM
> and a node goes down, you're going to have a bad time. (downtime) If you're
> using CL=ONE then you'd be ok.  However, I am not wild about losing a node
> and having only 1 copy of my data available in prod.
>
>
> On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder  wrote:
>
>> Thanks Jonathan.  So there is nothing too idiotic about my current set-up
>> with 6 boxes each with 256 vnodes each and a RF of 2?
>>
>> I appreciate the help,
>> Nate
>>
>>
>>
>> --
>> *Nathanael Yoder*
>> Principal Engineer & Data Scientist, Whistle
>> 415-944-7344 // n...@whistle.com
>>
>> On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad 
>> wrote:
>>
>>> You don't need a prime number of nodes in your ring, but it's not a bad
>>> idea to have it be a multiple of your RF when your cluster is small.
>>>
>>>
>>> On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder  wrote:
>>>
 Hi Ian,

 Thanks for the suggestion but I had actually already done that prior to
 the scenario I described (to get myself some free space) and when I ran
 nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
 don't think that is where my space went.

 One additional piece of information I forgot to point out is that when
 I ran nodetool status on the node it included all 6 nodes.

 I have also heard it mentioned that I may want to have a prime number
 of nodes which may help protect against split-brain.  Is this true?  If so
 does it still apply when I am using vnodes?

 Thanks again,
 Nate

 --
 *Nathanael Yoder*
 Principal Engineer & Data Scientist, Whistle
 415-944-7344 // n...@whistle.com

 On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose  wrote:

> Try `nodetool clearsnapshot` which will delete any snapshots you
> have.  I have never taken a snapshot with nodetool yet I found several
> snapshots on my disk recently (which can take a lot of space).  So perhaps
> they are automatically generated by some operation?  No idea.  Regardless,
> nuking those freed up a ton of space for me.
>
> - Ian
>
>
> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:
>
>> Hi All,
>>
>> I am new to Cassandra so I apologise in advance if I have missed
>> anything obvious but this one currently has me stumped.
>>
>> I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
>> C3.2XLarge nodes which overall is working very well for us.  However, 
>> after
>> letting it run for a while I seem to get into a situation where the 
>> amount
>> of disk space used far exceeds the total amount of data on each node and 
>> I
>> haven't been able to get the size to go back down except by stopping and
>> restarting the node.
>>
>> For example, in my data I have almost all of my data in one table.
>> On one of my nodes right now the total space used (as reported by 
>> nodetool
>> cfstats) is 57.2 GB and there are no snapshots. However, when I look at 
>> the
>> size of the data files (using du) the data file for that table is 107GB.
>> Because the C3.2XLarge only have 160 GB of SSD you can see why this 
>> quickly
>> becomes a problem.
>>
>> Running nodetool compact didn't reduce the size and neither does
>> running nodetool repair -pr on the node.  I also tried nodetool flush and
>> nodetool cleanup (even though I have not added or removed any nodes
>> recently) but it didn't change anything either.  In order to keep my
>> cluster up I then stopped and started that node and the size of the data
>> file dropped to 54GB while the total column family size (as reported by
>> nodetool) stayed about the same.
>>
>> Any suggestions as to what I could be doing wrong?
>>
>> Thanks,
>> Nate
>>
>
>

>>


Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
Well, I personally don't like RF=2.  It means if you're using CL=QUORUM and
a node goes down, you're going to have a bad time. (downtime) If you're
using CL=ONE then you'd be ok.  However, I am not wild about losing a node
and having only 1 copy of my data available in prod.

On Tue Dec 09 2014 at 8:40:37 AM Nate Yoder  wrote:

> Thanks Jonathan.  So there is nothing too idiotic about my current set-up
> with 6 boxes, each with 256 vnodes, and an RF of 2?
>
> I appreciate the help,
> Nate
>
>
>
> --
> *Nathanael Yoder*
> Principal Engineer & Data Scientist, Whistle
> 415-944-7344 // n...@whistle.com
>
> On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad  wrote:
>
>> You don't need a prime number of nodes in your ring, but it's not a bad
>> idea to have it be a multiple of your RF when your cluster is small.
>>
>>
>> On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder  wrote:
>>
>>> Hi Ian,
>>>
>>> Thanks for the suggestion but I had actually already done that prior to
>>> the scenario I described (to get myself some free space) and when I ran
>>> nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
>>> don't think that is where my space went.
>>>
>>> One additional piece of information I forgot to point out is that when I
>>> ran nodetool status on the node it included all 6 nodes.
>>>
>>> I have also heard it mentioned that I may want to have a prime number of
>>> nodes which may help protect against split-brain.  Is this true?  If so
>>> does it still apply when I am using vnodes?
>>>
>>> Thanks again,
>>> Nate
>>>
>>> --
>>> *Nathanael Yoder*
>>> Principal Engineer & Data Scientist, Whistle
>>> 415-944-7344 // n...@whistle.com
>>>
>>> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose  wrote:
>>>
 Try `nodetool clearsnapshot` which will delete any snapshots you have.
 I have never taken a snapshot with nodetool yet I found several snapshots
 on my disk recently (which can take a lot of space).  So perhaps they are
 automatically generated by some operation?  No idea.  Regardless, nuking
 those freed up a ton of space for me.

 - Ian


 On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:

> Hi All,
>
> I am new to Cassandra so I apologise in advance if I have missed
> anything obvious but this one currently has me stumped.
>
> I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
> C3.2XLarge nodes which overall is working very well for us.  However, 
> after
> letting it run for a while I seem to get into a situation where the amount
> of disk space used far exceeds the total amount of data on each node and I
> haven't been able to get the size to go back down except by stopping and
> restarting the node.
>
> For example, in my data I have almost all of my data in one table.  On
> one of my nodes right now the total space used (as reported by nodetool
> cfstats) is 57.2 GB and there are no snapshots. However, when I look at 
> the
> size of the data files (using du) the data file for that table is 107GB.
> Because the C3.2XLarge only have 160 GB of SSD you can see why this 
> quickly
> becomes a problem.
>
> Running nodetool compact didn't reduce the size and neither does
> running nodetool repair -pr on the node.  I also tried nodetool flush and
> nodetool cleanup (even though I have not added or removed any nodes
> recently) but it didn't change anything either.  In order to keep my
> cluster up I then stopped and started that node and the size of the data
> file dropped to 54GB while the total column family size (as reported by
> nodetool) stayed about the same.
>
> Any suggestions as to what I could be doing wrong?
>
> Thanks,
> Nate
>


>>>
>


Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Thanks Jonathan.  So there is nothing too idiotic about my current set-up
with 6 boxes, each with 256 vnodes, and an RF of 2?

I appreciate the help,
Nate



--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 8:31 AM, Jonathan Haddad  wrote:

> You don't need a prime number of nodes in your ring, but it's not a bad
> idea to have it be a multiple of your RF when your cluster is small.
>
>
> On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder  wrote:
>
>> Hi Ian,
>>
>> Thanks for the suggestion but I had actually already done that prior to
>> the scenario I described (to get myself some free space) and when I ran
>> nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
>> don't think that is where my space went.
>>
>> One additional piece of information I forgot to point out is that when I
>> ran nodetool status on the node it included all 6 nodes.
>>
>> I have also heard it mentioned that I may want to have a prime number of
>> nodes which may help protect against split-brain.  Is this true?  If so
>> does it still apply when I am using vnodes?
>>
>> Thanks again,
>> Nate
>>
>> --
>> *Nathanael Yoder*
>> Principal Engineer & Data Scientist, Whistle
>> 415-944-7344 // n...@whistle.com
>>
>> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose  wrote:
>>
>>> Try `nodetool clearsnapshot` which will delete any snapshots you have.
>>> I have never taken a snapshot with nodetool yet I found several snapshots
>>> on my disk recently (which can take a lot of space).  So perhaps they are
>>> automatically generated by some operation?  No idea.  Regardless, nuking
>>> those freed up a ton of space for me.
>>>
>>> - Ian
>>>
>>>
>>> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:
>>>
 Hi All,

 I am new to Cassandra so I apologise in advance if I have missed
 anything obvious but this one currently has me stumped.

 I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
 C3.2XLarge nodes which overall is working very well for us.  However, after
 letting it run for a while I seem to get into a situation where the amount
 of disk space used far exceeds the total amount of data on each node and I
 haven't been able to get the size to go back down except by stopping and
 restarting the node.

 For example, in my data I have almost all of my data in one table.  On
 one of my nodes right now the total space used (as reported by nodetool
 cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
 size of the data files (using du) the data file for that table is 107GB.
 Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly
 becomes a problem.

 Running nodetool compact didn't reduce the size and neither does
 running nodetool repair -pr on the node.  I also tried nodetool flush and
 nodetool cleanup (even though I have not added or removed any nodes
 recently) but it didn't change anything either.  In order to keep my
 cluster up I then stopped and started that node and the size of the data
 file dropped to 54GB while the total column family size (as reported by
 nodetool) stayed about the same.

 Any suggestions as to what I could be doing wrong?

 Thanks,
 Nate

>>>
>>>
>>


Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Jonathan Haddad
You don't need a prime number of nodes in your ring, but it's not a bad
idea to have it be a multiple of your RF when your cluster is small.


On Tue Dec 09 2014 at 8:29:35 AM Nate Yoder  wrote:

> Hi Ian,
>
> Thanks for the suggestion but I had actually already done that prior to
> the scenario I described (to get myself some free space) and when I ran
> nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
> don't think that is where my space went.
>
> One additional piece of information I forgot to point out is that when I
> ran nodetool status on the node it included all 6 nodes.
>
> I have also heard it mentioned that I may want to have a prime number of
> nodes which may help protect against split-brain.  Is this true?  If so
> does it still apply when I am using vnodes?
>
> Thanks again,
> Nate
>
> --
> *Nathanael Yoder*
> Principal Engineer & Data Scientist, Whistle
> 415-944-7344 // n...@whistle.com
>
> On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose  wrote:
>
>> Try `nodetool clearsnapshot` which will delete any snapshots you have.  I
>> have never taken a snapshot with nodetool yet I found several snapshots on
>> my disk recently (which can take a lot of space).  So perhaps they are
>> automatically generated by some operation?  No idea.  Regardless, nuking
>> those freed up a ton of space for me.
>>
>> - Ian
>>
>>
>> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:
>>
>>> Hi All,
>>>
>>> I am new to Cassandra so I apologise in advance if I have missed
>>> anything obvious but this one currently has me stumped.
>>>
>>> I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
>>> C3.2XLarge nodes which overall is working very well for us.  However, after
>>> letting it run for a while I seem to get into a situation where the amount
>>> of disk space used far exceeds the total amount of data on each node and I
>>> haven't been able to get the size to go back down except by stopping and
>>> restarting the node.
>>>
>>> For example, in my data I have almost all of my data in one table.  On
>>> one of my nodes right now the total space used (as reported by nodetool
>>> cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
>>> size of the data files (using du) the data file for that table is 107GB.
>>> Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly
>>> becomes a problem.
>>>
>>> Running nodetool compact didn't reduce the size and neither does running
>>> nodetool repair -pr on the node.  I also tried nodetool flush and nodetool
>>> cleanup (even though I have not added or removed any nodes recently) but it
>>> didn't change anything either.  In order to keep my cluster up I then
>>> stopped and started that node and the size of the data file dropped to 54GB
>>> while the total column family size (as reported by nodetool) stayed about
>>> the same.
>>>
>>> Any suggestions as to what I could be doing wrong?
>>>
>>> Thanks,
>>> Nate
>>>
>>
>>
>


Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Nate Yoder
Hi Ian,

Thanks for the suggestion but I had actually already done that prior to the
scenario I described (to get myself some free space) and when I ran
nodetool cfstats it listed 0 snapshots as expected, so unfortunately I
don't think that is where my space went.

One additional piece of information I forgot to point out is that when I
ran nodetool status on the node it included all 6 nodes.

I have also heard it mentioned that I may want to have a prime number of
nodes which may help protect against split-brain.  Is this true?  If so
does it still apply when I am using vnodes?

Thanks again,
Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose  wrote:

> Try `nodetool clearsnapshot` which will delete any snapshots you have.  I
> have never taken a snapshot with nodetool yet I found several snapshots on
> my disk recently (which can take a lot of space).  So perhaps they are
> automatically generated by some operation?  No idea.  Regardless, nuking
> those freed up a ton of space for me.
>
> - Ian
>
>
> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:
>
>> Hi All,
>>
>> I am new to Cassandra so I apologise in advance if I have missed anything
>> obvious but this one currently has me stumped.
>>
>> I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
>> C3.2XLarge nodes which overall is working very well for us.  However, after
>> letting it run for a while I seem to get into a situation where the amount
>> of disk space used far exceeds the total amount of data on each node and I
>> haven't been able to get the size to go back down except by stopping and
>> restarting the node.
>>
>> For example, in my data I have almost all of my data in one table.  On
>> one of my nodes right now the total space used (as reported by nodetool
>> cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
>> size of the data files (using du) the data file for that table is 107GB.
>> Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly
>> becomes a problem.
>>
>> Running nodetool compact didn't reduce the size and neither does running
>> nodetool repair -pr on the node.  I also tried nodetool flush and nodetool
>> cleanup (even though I have not added or removed any nodes recently) but it
>> didn't change anything either.  In order to keep my cluster up I then
>> stopped and started that node and the size of the data file dropped to 54GB
>> while the total column family size (as reported by nodetool) stayed about
>> the same.
>>
>> Any suggestions as to what I could be doing wrong?
>>
>> Thanks,
>> Nate
>>
>
>


Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Ian Rose
Try `nodetool clearsnapshot` which will delete any snapshots you have.  I
have never taken a snapshot with nodetool yet I found several snapshots on
my disk recently (which can take a lot of space).  So perhaps they are
automatically generated by some operation?  No idea.  Regardless, nuking
those freed up a ton of space for me.
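
If it's useful, a quick way to confirm where the space actually sits, assuming
the default data directory layout (keyspace/table names are placeholders):

nodetool clearsnapshot                                      # drop all snapshots
du -sh /var/lib/cassandra/data/*/*/snapshots 2>/dev/null    # anything left behind?
du -sh /var/lib/cassandra/data/<keyspace>/<table>*          # live + obsolete sstables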

- Ian


On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder  wrote:

> Hi All,
>
> I am new to Cassandra so I apologise in advance if I have missed anything
> obvious but this one currently has me stumped.
>
> I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
> C3.2XLarge nodes which overall is working very well for us.  However, after
> letting it run for a while I seem to get into a situation where the amount
> of disk space used far exceeds the total amount of data on each node and I
> haven't been able to get the size to go back down except by stopping and
> restarting the node.
>
> For example, in my data I have almost all of my data in one table.  On one
> of my nodes right now the total space used (as reported by nodetool
> cfstats) is 57.2 GB and there are no snapshots. However, when I look at the
> size of the data files (using du) the data file for that table is 107GB.
> Because the C3.2XLarge only have 160 GB of SSD you can see why this quickly
> becomes a problem.
>
> Running nodetool compact didn't reduce the size and neither does running
> nodetool repair -pr on the node.  I also tried nodetool flush and nodetool
> cleanup (even though I have not added or removed any nodes recently) but it
> didn't change anything either.  In order to keep my cluster up I then
> stopped and started that node and the size of the data file dropped to 54GB
> while the total column family size (as reported by nodetool) stayed about
> the same.
>
> Any suggestions as to what I could be doing wrong?
>
> Thanks,
> Nate
>


Re: How to model data to achieve specific data locality

2014-12-09 Thread Kai Wang
Some of the sequences grow so fast that sub-partitioning is inevitable. I may
need to try different bucket sizes to get the optimal throughput. Thank you
all for the advice.

On Mon, Dec 8, 2014 at 9:55 AM, Eric Stevens  wrote:

> The upper bound for the data size of a single column is 2GB, and the upper
> bound for the number of columns in a row (partition) is 2 billion.  So if
> you wanted to create the largest possible row, you probably can't afford
> enough disks to hold it.
> http://wiki.apache.org/cassandra/CassandraLimitations
>
> Practically speaking you start running into troubles *way* before you
> reach those thresholds though.  Large columns and large numbers of columns
> create GC pressure in your cluster, and since all data for a given row
> reside on the same primary and replicas, this tends to lead to hot
> spotting.  Repair happens for entire rows, so large rows increase the cost
> of repairs, including GC pressure during the repair.  And rows of this size
> are often arrived at by appending to the same row repeatedly, which will
> cause the data for that row to be scattered across a large number of
> SSTables which will hurt read performance. Also depending on your
> interface, you'll find you start hitting limits that you have to increase,
> each with their own implications (eg, maximum thrift message sizes and so
> forth).  The right maximum practical size for a row definitely depends on
> your read and write patterns, as well as your hardware and network.  More
> memory, SSD's, larger SSTables, and faster networks will all raise the
> ceiling for where large rows start to become painful.
>
> @Kai, if you're familiar with the Thrift paradigm, the partition key
> equates to a Thrift row key, and the clustering key equates to the first
> part of a composite column name.  CQL PRIMARY KEY ((a,b), c, d) equates to
> Thrift where row key is ['a:b'] and all columns begin with ['c:d:'].
> Recommended reading: http://www.datastax.com/dev/blog/thrift-to-cql3
>
> Whatever your partition key, if you need to sub-partition to maintain
> reasonable row sizes, then the only way to preserve data locality for
> related records is probably to switch to byte ordered partitioner, and
> compute blob or long column as part of your partition key that is meant to
> cause the PK to to map to the same token.  Just be aware that byte ordered
> partitioner comes with a number of caveats, and you'll become responsible
> for maintaining good data load distributions in your cluster. But the
> benefits from being able to tune locality may be worth it.
>
>
> On Sun Dec 07 2014 at 3:12:11 PM Jonathan Haddad 
> wrote:
>
>> I think he mentioned 100MB as the max size - planning for 1 MB might make
>> your data model difficult to work with.
>>
>> On Sun Dec 07 2014 at 12:07:47 PM Kai Wang  wrote:
>>
>>> Thanks for the help. I wasn't clear on how clustering columns work. Coming
>>> from a Thrift background, it took me a while to understand how a clustering
>>> column impacts partition storage on disk. Now I believe using seq_type as
>>> the first clustering column solves my problem. As for partition size, I will
>>> start with some bucket assumption. If the partition size exceeds the
>>> threshold I may need to re-bucket using a smaller bucket size.
>>>
>>> On another thread Eric mentions the optimal partition size should be around
>>> 100 KB to 1 MB. I will use that as the starting point to design my bucket
>>> strategy.
>>>
>>>
>>> On Sun, Dec 7, 2014 at 10:32 AM, Jack Krupansky >> > wrote:
>>>
   It would be helpful to look at some specific examples of sequences,
 showing how they grow. I suspect that the term “sequence” is being
 overloaded in some subtly misleading way here.

 Besides, we’ve already answered the headline question – data locality
 is achieved by having a common partition key. So, we need some clarity as
 to what question we are really focusing on.

 And, of course, we should be asking the “Cassandra Data Modeling 101”
 question of what do your queries want to look like, how exactly do you want
 to access your data. Only after we have a handle on how you need to read
 your data can we decide how it should be stored.

 My immediate question to get things back on track: When you say “The
 typical read is to load a subset of sequences with the same seq_id”,
 what type of “subset” are you talking about? Again, a few explicit and
 concise example queries (in some concise, easy to read pseudo language or
 even plain English, but not belabored with full CQL syntax.) would be very
 helpful. I mean, Cassandra has no “subset” concept, nor a “load subset”
 command, so what are we really talking about?

 Also, I presume we are talking CQL, but some of the references seem
 more Thrift/slice oriented.

 -- Jack Krupansky

  *From:* Eric Stevens 
 *Sent:* Sunday, December 7, 2014 10:12 AM
 *To:* user@cassandra.apache.org
 *Subject