Re: Counter column impossible to delete and re-insert

2014-11-06 Thread DuyHai Doan
Hello Clément

This is a known anti-pattern. You should never re-use a deleted counter
column; otherwise the counter value will be unpredictable.
On 6 Nov 2014 at 08:45, Clément Fumey clement@gmail.com wrote:

 Hi,

 I have a table with a counter column. When I insert (update) a row, delete
 it, and try to re-insert it, the re-insert fails. Here are the commands
 I use:

 CREATE TABLE test (
     testId int,
     year int,
     testCounter counter,
     PRIMARY KEY (testId, year)
 ) WITH CLUSTERING ORDER BY (year DESC);

 UPDATE test SET testcounter = testcounter + 5 WHERE testid = 2 AND year = 2014;
 DELETE FROM test WHERE testid = 2 AND year = 2014;
 UPDATE test SET testcounter = testcounter + 5 WHERE testid = 2 AND year = 2014;

 The last command fails silently: there is no error message, but the table
 is still empty afterwards.
 Is that normal? Am I doing something wrong?

 Regards

 Clément



Re: Storing files in Cassandra with Spring Data / Astyanax

2014-11-06 Thread DuyHai Doan
You'd be better off asking on the Spring Data Cassandra mailing list.

I think very few people, if any, have tried integrating Astyanax
with Spring Data Cassandra...
On 6 Nov 2014 at 08:17, Wim Deblauwe wim.debla...@gmail.com wrote:

 Hi,

 We are building an application that we install on-premise, where there is
 usually no internet connection at all. As I am using Cassandra for
 storing everything else in the application, it would be very convenient to
 also use Cassandra for those files so I don't have to set up 2 distributed
 systems for each installation we do.

 Is there documentation somewhere on how to get started with integrating
 Astyanax and Spring Data Cassandra?

 regards,

 Wim

 2014-11-05 23:40 GMT+01:00 Redmumba redmu...@gmail.com:

 Astyanax isn't deprecated; that user is wrong and is downvoted--and has a
 comment mentioning the same.

 What you're describing doesn't sound like you need a data store at all;
 it /sounds/ like you need a file store.  Why not use S3 or similar to store
 your images?  What benefits are you expecting to receive from Cassandra?
 It sounds like you're incurring an awful lot of overhead for what amounts
 to a file lookup.

 On Wed, Nov 5, 2014 at 8:19 AM, Wim Deblauwe wim.debla...@gmail.com
 wrote:

 Hi,

 I am currently testing with Cassandra and Spring Data Cassandra. I now
 need to store files (images and AVI files, normally up to 50 MB each).

 I did find the Chunked Object Store
 https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store from
 Astyanax, which looks promising. However, I have no idea how to combine
 Astyanax with Spring Data Cassandra.

 Also, this answer on SO http://stackoverflow.com/a/25926062/40064
 states that Netflix is no longer working on Astyanax, so maybe this is not
 a good option to base my application on?

 Are there any other options (where I can keep using Spring Data
 Cassandra)?

 I also read
 http://www.datastax.com/docs/datastax_enterprise3.0/solutions/hadoop_multiple_cfs
 but it is unclear to me if I would need to install Hadoop as well if I want
 to use this?

 regards,

 Wim






read after write inconsistent even on a one node cluster

2014-11-06 Thread Brian Tarbox
We're doing development on a single node cluster (and yes of course we're
not really deploying that way), and we're getting inconsistent behavior on
reads after writes.

We write values to our keyspaces and then immediately read the values back
(in our Cucumber tests).  About 20% of the time we get the old value. If
we wait 1 second and redo the query (within the same Java method) we get
the new value.

This is all happening on a single node...how is this possible?

We're using 2.0.9 and the java client.   Though it shouldn't matter given a
single node cluster I set the consistency level to ALL with no effect.

I've read CASSANDRA-876 which seems spot-on but it was closed as
won't-fix...and I don't see what the solution is.

Thanks in advance for any help.

Brian Tarbox

-- 
http://about.me/BrianTarbox


Re: read after write inconsistent even on a one node cluster

2014-11-06 Thread Eric Stevens
If this is just for doing tests to make sure you get back the data you
expect, I would recommend looking into some sort of "eventually" construct
in your testing.  We use Specs2 as our testing framework, and our
write-then-read tests look something like this:

someDAO.write(someObject)

eventually {
someDAO.read(someObject.id) mustEqual someObject
}

This will retry the read repeatedly over a short duration.
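
For anyone not on Specs2, the same retry idea can be sketched in a few lines of Python. This is only a minimal sketch: `eventually`, `FlakyStore`, and `check_read` are hypothetical names made up for illustration, not part of Specs2 or of any Cassandra driver, and the timeout and interval values are arbitrary.

```python
import time


def eventually(assertion, timeout=2.0, interval=0.05):
    """Re-run `assertion` until it stops raising AssertionError.

    Gives up and re-raises the last failure once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return assertion()
        except AssertionError:
            if time.monotonic() >= deadline:
                raise  # out of time: surface the last failure to the test
            time.sleep(interval)


class FlakyStore:
    """Hypothetical stand-in for a DAO whose first few reads are stale."""

    def __init__(self, stale_reads):
        self.stale_reads = stale_reads
        self.value = None

    def write(self, value):
        self.value = value

    def read(self):
        if self.stale_reads > 0:
            self.stale_reads -= 1
            return None  # pretend the write is not visible yet
        return self.value


def check_read(store, expected):
    """Assertion to retry: the store must return the expected value."""
    assert store.read() == expected
    return True
```

A test would then call something like `eventually(lambda: check_read(store, "v2"))` right after the write. Note that this only hides the delay in tests; it does not explain why a single node returns stale data.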

Just in case you are trying to do write-then-read outside of tests, you
should be aware that it's a Bad Idea™, but your email reads like you
already know that =)



Re: read after write inconsistent even on a one node cluster

2014-11-06 Thread Brian Tarbox
Thanks.  Right now it's just for testing, but in general we can't guard
against multiple users ending up with one writing and then another reading.

It would be one thing if the read just got old data but we're seeing it
return wrong data...i.e. data that doesn't correspond to any particular
version of the object.

Brian



authentication with cassandra-stress 2.1

2014-11-06 Thread James Derieg
Is there a way to authenticate to cassandra using the new 
cassandra-stress tool released with cassandra 2.1? It appears as if the 
'-un' (username) and '-pw' (password) switches have been removed from 
the tool.


In the 2.0 version, this is the command I would run: 'cassandra-stress 
-D nodesfile -un -pw '


The 2.1 version has been totally reworked, and that command fails
completely. Looking through the documentation reveals nothing about
authentication to Cassandra. Google searches have turned up nothing since
it's so new.


I did try one suggestion about putting authentication information into 
~/.cassandra/cqlshrc


This seems to work for the cqlsh tool, but not for cassandra-stress. Any 
suggestions would be greatly appreciated. Thanks!






Re: read after write inconsistent even on a one node cluster

2014-11-06 Thread Robert Coli
On Thu, Nov 6, 2014 at 6:14 AM, Brian Tarbox briantar...@gmail.com wrote:

 We write values to our keyspaces and then immediately read the values back
 (in our Cucumber tests).  About 20% of the time we get the old value. If
 we wait 1 second and redo the query (within the same Java method) we get
 the new value.

 This is all happening on a single node...how is this possible?


It sounds unreasonable/unexpected to me. If you have a trivial repro case,
I would file a JIRA.

=Rob


Re: read after write inconsistent even on a one node cluster

2014-11-06 Thread Jonathan Haddad
For cqlengine we do quite a bit of write then read to ensure data was
written correctly, across 1.2, 2.0, and 2.1.  For what it's worth,
I've never seen this issue come up.  On a single node, Cassandra only
acks the write after it's been written into the memtable.  So, you'd
expect to see the most recent data.

A possibility - if you're running in a VM, it's possible the clock
isn't incrementing in real time?  I've seen this happen with uuid1
generation - I was getting duplicates if I generated them fast enough.
Perhaps you're writing 2 values one right after the other and they're
getting the same millisecond precision timestamp.
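
To make the timestamp-collision idea concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Cassandra's code: it assumes that conflicts are resolved per column by write timestamp, and that on an exact timestamp tie the greater value wins. The function names and the sample rows are hypothetical.

```python
def reconcile(cell_a, cell_b):
    """Pick the winner between two (timestamp, value) cells.

    Assumption: the higher timestamp wins; on a tie, the greater value wins.
    Tuple comparison does exactly that: timestamp first, then value.
    """
    return max(cell_a, cell_b)


def merge_rows(row_a, row_b):
    """Merge two versions of a row column by column."""
    merged = {}
    for column in sorted(set(row_a) | set(row_b)):
        if column in row_a and column in row_b:
            merged[column] = reconcile(row_a[column], row_b[column])
        else:
            merged[column] = row_a[column] if column in row_a else row_b[column]
    return merged


# Two writes of the same object that landed on the same timestamp.
v1 = {"name": (1000, "alice"), "state": (1000, "pending")}
v2 = {"name": (1000, "bob"), "state": (1000, "done")}
mixed = merge_rows(v1, v2)

# The same second write, but one tick later: it wins every column.
v2_later = {"name": (1001, "bob"), "state": (1001, "done")}
clean = merge_rows(v1, v2_later)
```

Under these assumptions, `mixed` takes "bob" from the second write and "pending" from the first, so the row matches neither version of the object, while `clean` is entirely the second write.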





-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


C* on Fusion IO

2014-11-06 Thread Kevin Burton
We’re looking at switching data centers and they’re offering pretty
aggressive pricing on boxes with fusion IO cards.

2x 1.2TB Fusion IO
128GB RAM
20 cores.

Now, this isn’t the typical Cassandra box.  Most people run
multiple nodes to scale out rather than scale vertically.  But these boxes are
priced aggressively, and honestly I think Cassandra would be able to
saturate the gigabit Ethernet port on these machines.

So it *might* be that these are TOO powerful in a way.

Curious if others are running in this config and what tuning options were
required to get it to work.

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: C* on Fusion IO

2014-11-06 Thread Russ Bradberry
I've heard of people running dense nodes (8+ TB) using Fusion IO, but with
10GbE connections. I mean, why buy a Ferrari and never leave first gear?

As far as saturating the network goes, I guess that all depends on your 
workload, and how often you need to repair.
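
A back-of-the-envelope calculation shows why the link speed matters for repair or re-streaming a node. The sketch below is only arithmetic under idealized assumptions: decimal units (1 TB = 10^12 bytes), the link running at full line rate with no protocol overhead, and both 1.2TB cards completely full. Real streaming will be slower.

```python
def transfer_hours(data_bytes, link_bits_per_second):
    """Hours needed to move data_bytes over a link at full line rate."""
    bytes_per_second = link_bits_per_second / 8
    return data_bytes / bytes_per_second / 3600


TB = 10**12    # decimal terabyte, in bytes
GBIT = 10**9   # one gigabit per second, in bits per second

node_data = 2 * 1.2 * TB  # both Fusion IO cards full (worst case)

hours_1g = transfer_hours(node_data, 1 * GBIT)
hours_10g = transfer_hours(node_data, 10 * GBIT)

print(f"1GbE:  {hours_1g:.1f} hours")
print(f"10GbE: {hours_10g:.1f} hours")
```

At line rate that works out to a bit over five hours on gigabit versus roughly half an hour on 10GbE.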



Re: C* on Fusion IO

2014-11-06 Thread Christopher Brodt
You should get pretty great performance with those FusionIO cards. One
thing I watch out for whenever scaling Cassandra vertically is compaction
times, which probably won't matter here. However, you have to take into
account that you lose some resiliency to failures with fewer nodes.




Re: C* on Fusion IO

2014-11-06 Thread Kevin Burton
This was one of my biggest issues too.  We were expecting to be at 5-10
nodes to start with and then 20-40 nodes in 60-90 days.

But this means we can run all of our database on 1 box :-P … but
realistically two.

Which means if one box goes offline then I’m at 50% capacity.  That and I
don’t even have the option for three replicas.

I think the ideal would be something like a 600GB-1.2TB drive,
32-64GB of RAM, and 3x more physical boxes.  But they don’t have that config,
unfortunately.



Re: C* on Fusion IO

2014-11-06 Thread jeeyoung kim
I've been running with FIOs and we've been CPU bound most of the time. But
I'm not using native transport yet, and am hoping that it will make things
faster.






-- 
Jeeyoung Kim
http://kimjeeyoung.com/


Re: C* on Fusion IO

2014-11-06 Thread Christopher Brodt
Yep. The trouble with FIOs is that they almost completely remove your
disk throughput problems, so then you're constrained by CPU. Concurrent
compactors and concurrent writes are two params that come to mind but there
are likely others.

@kevin. I hear you. 5TB is sort of a maximum that DataStax has been
recommending per instance with SSDs (there are lots of factors at play
though; YMMV and I encourage you to test it out). I would also note that
the more RAM you can get for Cassandra, the better. Newer versions of C*
store caches, indexes, and memtables in native memory, and many of those are
proportional to the size of the datasets on the box. 64GB would probably
work for around 1TB without row caching, but if you have a lot of indexes
or column families, 128GB is not unreasonable.  Otherwise, your proposed
ideal config would be...ideal.

Chris





Re: C* on Fusion IO

2014-11-06 Thread Kevin Burton
This is definitely a first world problem.. having databases that are CPU
bound :-P



Re: C* on Fusion IO

2014-11-06 Thread Kevin Burton
On Thu, Nov 6, 2014 at 2:10 PM, Christopher Brodt ch...@uberbrodt.net
wrote:

 Yep. The trouble with FIOs is that they almost completely remove your
 disk throughput problems, so then you're constrained by CPU. Concurrent
 compactors and concurrent writes are two params that come to mind but there
 are likely others.


Agreed.  I think the ideal scenario for C* is about 800GB of SSD, say 64GB
of RAM, and >= 5 nodes.  This way you have a fairly big install to
justify going with C*, but you can also run things like ZooKeeper, and if one
of your nodes goes offline you only lose 20% of your capacity.
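
The 20% figure is just 1/N for N equal nodes, and the same arithmetic shows why two big boxes are awkward. A small Python sketch, assuming equally sized nodes with evenly spread load and at most one replica per node; the function names are hypothetical:

```python
def capacity_lost(nodes, failed=1):
    """Fraction of cluster capacity lost when `failed` of `nodes` equal nodes go down."""
    if nodes < 1 or not 0 <= failed <= nodes:
        raise ValueError("need nodes >= 1 and 0 <= failed <= nodes")
    return failed / nodes


def can_hold_replicas(nodes, replication_factor):
    """Each replica needs its own node, so RF cannot exceed the node count."""
    return replication_factor <= nodes


for n in (2, 5, 20):
    print(f"{n:>2} nodes: lose {capacity_lost(n):.0%} per failed node, "
          f"RF=3 possible: {can_hold_replicas(n, 3)}")
```

With 2 nodes a single failure costs half the cluster and three replicas are not possible; with 5 nodes the loss drops to 20% and RF=3 fits.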

Unfortunately, these boxes are rather large which is why I posted.



Multiple SSD disks per server? Ideal config?

2014-11-06 Thread Kevin Burton
I’m curious what people are doing with multiple SSDs per server.

I think there are two main paths:

- RAID 0 them… the problem here is that RAID0 is not a panacea and the
drives may or may not see better IO throughput.

- use N cassandra instances per box (or containers) and have one C* node
accessing each SSD.  The upside here is that Cassandra sees the drive
directly.  The downside is that you would probably have to cheat and tell
C* that all the containers on that box are on the same “rack” so C* doesn’t
schedule two replicas on the same box.

Thoughts?

Kevin



Re: Multiple SSD disks per server? Ideal config?

2014-11-06 Thread Chris Lohfink
If optimizing for IO, use Cassandra's JBOD configuration (list each disk
under data directories in cassandra.yaml).  It will put sstables on the
disk that's least used.  If you want to optimize for disk space, I'd go with
RAID0.  You will probably want to tune concurrent readers/writers, stream
throughput (if you have the network for it), and compaction throughput if you
end up with IO to spare.  I generally would not recommend putting multiple C*
instances on a single box.
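
As a rough illustration of the "least used disk" behaviour described above, here is a toy placement model in Python. It is an assumption-laden sketch and not Cassandra's actual allocation code, which may differ between versions; the directory paths and sstable sizes are made up.

```python
def pick_data_directory(used_bytes_by_dir):
    """Return the directory with the fewest bytes used.

    Ties are broken by directory name so the result is deterministic.
    """
    if not used_bytes_by_dir:
        raise ValueError("no data directories configured")
    return min(sorted(used_bytes_by_dir), key=lambda d: used_bytes_by_dir[d])


def place_sstables(sstable_sizes, directories):
    """Assign each new sstable to whichever directory is least used at that moment."""
    used = {d: 0 for d in directories}
    placement = []
    for size in sstable_sizes:
        target = pick_data_directory(used)
        used[target] += size
        placement.append((size, target))
    return placement, used


# Hypothetical mount points, one per SSD; sizes are in arbitrary units.
placement, used = place_sstables([40, 10, 10, 30], ["/mnt/ssd1", "/mnt/ssd2"])
```

In this model each sstable lives entirely on one disk, so a single large read is served by one drive; the parallelism comes from different sstables sitting on different drives.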

---
Chris Lohfink





Re: tuning concurrent_reads param

2014-11-06 Thread Bryan Talbot
On Wed, Nov 5, 2014 at 11:00 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 Sorry, I have a late follow-up question.

 In the cassandra.yaml file, the concurrent_reads section has the following
 comment:

 What does it mean by "the operations to enqueue low enough in the stack
 that the OS and drives can reorder them"? How does it help keep the
 system healthy?


The operating system, disk controllers, and disks themselves can merge and
reorder requests to optimize performance.

Here's a relevant page with some details if you're interested in more
http://www.makelinux.net/books/lkd2/ch13lev1sec5
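
To give a feel for what "merge and reorder" means, here is a deliberately simplified Python sketch. It is not the Linux I/O scheduler, only an illustration of the idea: requests are sorted by starting sector, and requests that touch or overlap are coalesced into one larger request. The function name and the sample queue are hypothetical.

```python
def merge_requests(requests):
    """Sort (start_sector, sector_count) requests and coalesce neighbours.

    Two requests are merged when the second starts at or before the end
    of the first, i.e. they are adjacent or overlapping on disk.
    """
    merged = []
    for start, count in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_count = merged[-1]
            end = max(prev_start + prev_count, start + count)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, count))
    return merged


# Five requests arrive in arbitrary order; the device ends up seeing three.
queued = [(100, 8), (0, 8), (108, 8), (8, 8), (500, 8)]
dispatched = merge_requests(queued)
```

The layers below can only do this if several requests are queued at the same time, which is what having multiple reads in flight provides.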



 What really happens if we increase it to too high a value? (Maybe it affects
 other read or write operations, as it eats up all the disk IO resources?)



Yes

-Bryan


Re: tuning concurrent_reads param

2014-11-06 Thread Jimmy Lin
I see, thanks for explaining what that means.

If we are using SSDs, then reordering/merging has less impact than with
traditional mechanical hard disks, so SSDs can probably deal with an
increased concurrent_reads better. (?)


Re: Multiple SSD disks per server? Ideal config?

2014-11-06 Thread Kevin Burton
I have seen people do this but I can’t find documentation for it, and
specifically for how well it optimizes IO.  Does it write blocks to both
disks?  How is IO parallelized?

Too many questions to list them all.
