Re: Issue with Cassandra consistency in results

2017-03-16 Thread Ryan Svihla
Depends, actually; restore just restores what's there, so if only one node
had a copy of the data before, then only one node has a copy after the restore,
meaning quorum will still be wrong sometimes.

On Thu, Mar 16, 2017 at 1:53 PM, Arvydas Jonusonis <
arvydas.jonuso...@gmail.com> wrote:

> If the data was written at ONE, consistency is not guaranteed. ..but
> considering you just restored the cluster, there's a good chance something
> else is off.
>
> On Thu, Mar 16, 2017 at 18:19 srinivasarao daruna 
> wrote:
>
>> Want to make read and write QUORUM as well.
>>
>>
>> On Mar 16, 2017 1:09 PM, "Ryan Svihla"  wrote:
>>
>> Replication factor is 3, and write consistency is ONE and read
>> consistency is QUORUM.
>>
>> That combination is not gonna work well:
>>
>> *Write succeeds to NODE A but fails on node B,C*
>>
>> *Read goes to NODE B, C*
>>
>> If you can tolerate some temporary inaccuracy you can use QUORUM but may
>> still have the situation where
>>
>> Write succeeds on node A at timestamp 1, B succeeds at timestamp 2
>> Read succeeds on node B and C at timestamp 1
>>
>> If you need fully race-condition-free counts I'm afraid you need to use
>> SERIAL or LOCAL_SERIAL (for in-DC-only accuracy)
>>
>> On Thu, Mar 16, 2017 at 1:04 PM, srinivasarao daruna <
>> sree.srin...@gmail.com> wrote:
>>
>> Replication strategy is SimpleReplicationStrategy.
>>
>> Snitch is: EC2 snitch, as we deployed the cluster on EC2 instances.
>>
>> I was worried that CL=ALL has more read latency and read failures, but I
>> won't rule out trying it.
>>
>> Should I switch select count(*) to selecting the partition_key column? Would
>> that be of any help?
>>
>>
>> Thank you
>> Regards
>> Srini
>>
>> On Mar 16, 2017 12:46 PM, "Arvydas Jonusonis" <
>> arvydas.jonuso...@gmail.com> wrote:
>>
>> What are your replication strategy and snitch settings?
>>
>> Have you tried doing a read at CL=ALL? If it's an actual inconsistency
>> issue (missing data), this should cause the correct results to be returned.
>> You'll need to run a repair to fix the inconsistencies.
>>
>> If all the data is actually there, you might have one or several nodes
>> that aren't identifying the correct replicas.
>>
>> Arvydas
>>
>>
>>
>> On Thu, Mar 16, 2017 at 5:31 PM, srinivasarao daruna <
>> sree.srin...@gmail.com> wrote:
>>
>> Hi Team,
>>
>> We are struggling with a problem related to cassandra counts after
>> backup and restore of the cluster. Aaron Morton has suggested sending this
>> to the user list, so someone on the list will be able to help me.
>>
>> We have a rest api to talk to cassandra, and one of our queries, which
>> fetches a count, is creating problems for us.
>>
>> We have done backup and restore and copied all the data to new cluster.
>> We have done nodetool refresh on the tables, and did the nodetool repair as
>> well.
>>
>> However, one of our key API calls is returning inconsistent results. The
>> result count is 0 in the first call and the actual values are returned for later
>> calls. The query frequency is a bit high and the failure rate has also risen
>> considerably.
>>
>> 1) The count query has partition keys in it. Didn't see any read timeouts
>> or any errors in the api logs.
>>
>> 2) This is how our code of creating session looks.
>>
>> val poolingOptions = new PoolingOptions
>> poolingOptions
>>   .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
>>   .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
>>   .setCoreConnectionsPerHost(HostDistance.REMOTE, 4)
>>   .setMaxConnectionsPerHost( HostDistance.REMOTE, 10)
>>
>> val builtCluster = clusterBuilder.withCredentials(username, password)
>>   .withPoolingOptions(poolingOptions)
>>   .build()
>> val cassandraSession = builtCluster.get.connect()
>>
>> val preparedStatement = cassandraSession.prepare(statement).
>> setConsistencyLevel(ConsistencyLevel.QUORUM)
>> cassandraSession.execute(preparedStatement.bind(args :_*))
>>
>> Query: SELECT count(*) FROM table_name WHERE parition_column=? AND
>> text_column_of_clustering_key=? AND date_column_of_clustering_key<=? AND
>> date_column_of_clustering_key>=?
>>
>> 3) Cluster configuration:
>>
>> 6 machines, 3 seeds; we are using Apache Cassandra 3.9. Each
>> machine is equipped with 16 cores and 64 GB RAM.
>

Re: Issue with Cassandra consistency in results

2017-03-16 Thread Ryan Svihla
Replication factor is 3, and write consistency is ONE and read
consistency is QUORUM.

That combination is not gonna work well:

*Write succeeds to NODE A but fails on node B,C*

*Read goes to NODE B, C*

If you can tolerate some temporary inaccuracy you can use QUORUM but may
still have the situation where

Write succeeds on node A at timestamp 1, B succeeds at timestamp 2
Read succeeds on node B and C at timestamp 1

If you need fully race-condition-free counts I'm afraid you need to use
SERIAL or LOCAL_SERIAL (for in-DC-only accuracy)
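
To make the quorum arithmetic concrete: reads and writes only reliably overlap when write replicas + read replicas > RF, and ONE + QUORUM on RF 3 gives 1 + 2 = 3, which is not greater than 3. A minimal sketch, assuming the DataStax Java driver 3.x and placeholder contact point and keyspace, of defaulting both reads and writes to QUORUM (and SERIAL for LWT reads):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;

public class QuorumDefaults {
    public static void main(String[] args) {
        // Default every read and write to QUORUM so R + W > RF holds for RF = 3.
        QueryOptions queryOptions = new QueryOptions()
                .setConsistencyLevel(ConsistencyLevel.QUORUM)
                .setSerialConsistencyLevel(ConsistencyLevel.SERIAL); // used by LWT / SERIAL reads

        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")              // placeholder contact point
                .withQueryOptions(queryOptions)
                .build();
        Session session = cluster.connect("my_keyspace"); // placeholder keyspace
        session.execute("SELECT release_version FROM system.local");
        cluster.close();
    }
}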

On Thu, Mar 16, 2017 at 1:04 PM, srinivasarao daruna  wrote:

> Replication strategy is SimpleReplicationStrategy.
>
> Snitch is: EC2 snitch, as we deployed the cluster on EC2 instances.
>
> I was worried that CL=ALL has more read latency and read failures, but I
> won't rule out trying it.
>
> Should I switch select count(*) to selecting the partition_key column? Would
> that be of any help?
>
>
> Thank you
> Regards
> Srini
>
> On Mar 16, 2017 12:46 PM, "Arvydas Jonusonis" 
> wrote:
>
> What are your replication strategy and snitch settings?
>
> Have you tried doing a read at CL=ALL? If it's an actual inconsistency
> issue (missing data), this should cause the correct results to be returned.
> You'll need to run a repair to fix the inconsistencies.
>
> If all the data is actually there, you might have one or several nodes
> that aren't identifying the correct replicas.
>
> Arvydas
>
>
>
> On Thu, Mar 16, 2017 at 5:31 PM, srinivasarao daruna <
> sree.srin...@gmail.com> wrote:
>
>> Hi Team,
>>
>> We are struggling with a problem related to cassandra counts after
>> backup and restore of the cluster. Aaron Morton has suggested sending this
>> to the user list, so someone on the list will be able to help me.
>>
>> We have a rest api to talk to cassandra, and one of our queries, which
>> fetches a count, is creating problems for us.
>>
>> We have done backup and restore and copied all the data to new cluster.
>> We have done nodetool refresh on the tables, and did the nodetool repair as
>> well.
>>
>> However, one of our key API calls is returning inconsistent results. The
>> result count is 0 in the first call and the actual values are returned for later
>> calls. The query frequency is a bit high and the failure rate has also risen
>> considerably.
>>
>> 1) The count query has partition keys in it. Didn't see any read timeouts
>> or any errors in the api logs.
>>
>> 2) This is how our code of creating session looks.
>>
>> val poolingOptions = new PoolingOptions
>> poolingOptions
>>   .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
>>   .setMaxConnectionsPerHost(HostDistance.LOCAL, 10)
>>   .setCoreConnectionsPerHost(HostDistance.REMOTE, 4)
>>   .setMaxConnectionsPerHost( HostDistance.REMOTE, 10)
>>
>> val builtCluster = clusterBuilder.withCredentials(username, password)
>>   .withPoolingOptions(poolingOptions)
>>   .build()
>> val cassandraSession = builtCluster.get.connect()
>>
>> val preparedStatement = cassandraSession.prepare(statement)
>>   .setConsistencyLevel(ConsistencyLevel.QUORUM)
>> cassandraSession.execute(preparedStatement.bind(args :_*))
>>
>> Query: SELECT count(*) FROM table_name WHERE parition_column=? AND
>> text_column_of_clustering_key=? AND date_column_of_clustering_key<=? AND
>> date_column_of_clustering_key>=?
>>
>> 3) Cluster configuration:
>>
>> 6 machines, 3 seeds; we are using Apache Cassandra 3.9. Each
>> machine is equipped with 16 cores and 64 GB RAM.
>>
>> Replication factor is 3, and write consistency is ONE and read
>> consistency is QUORUM.
>>
>> 4) cassandra is never down on any machine
>>
>> 5) Using cassandra-driver-core artifact with 3.1.1 version in the api.
>>
>> 6) nodetool tpstats shows no read failures, and no other failures.
>>
>> 7) Do not see any other issues in Cassandra's system.log. We just see a
>> few warnings, as below.
>>
>> Maximum memory usage reached (512.000MiB), cannot allocate chunk of
>> 1.000MiB
>> WARN  [ScheduledTasks:1] 2017-03-14 14:58:37,141 QueryProcessor.java:103
>> - 88 prepared statements discarded in the last minute because cache limit
>> reached (32 MB)
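
The "prepared statements discarded ... cache limit reached" warning above usually means the same CQL string is re-prepared on every request instead of once per process. A hedged sketch of preparing once and binding per call (the DAO class and its names are invented for illustration; it is not the poster's actual code):

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

// Hypothetical DAO: prepare the count query once, reuse it for every request.
public class CountDao {
    private final Session session;
    private final PreparedStatement countStmt;

    public CountDao(Session session) {
        this.session = session;
        this.countStmt = session.prepare(
                "SELECT count(*) FROM table_name "
              + "WHERE partition_column = ? AND text_column_of_clustering_key = ? "
              + "AND date_column_of_clustering_key <= ? AND date_column_of_clustering_key >= ?")
              .setConsistencyLevel(ConsistencyLevel.QUORUM);
    }

    public long count(Object... args) {
        BoundStatement bound = countStmt.bind(args); // only binding happens per request
        ResultSet rs = session.execute(bound);
        return rs.one().getLong(0);
    }
}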
>> The first api call returns 0 and the later api calls give the right values.
>>
>> Please let me know, if any other details needed.
>> Could you please have a look at this issue and kindly give me your
>> inputs? This issue has literally broken our business team's confidence in
>> Cassandra.
>>
>> Your inputs will be really helpful.
>>
>> Thank You,
>> Regards,
>> Srini
>>
>
>
>


-- 

Thanks,
Ryan Svihla


Re: TransportException - Consistency LOCAL_ONE - EC2

2017-03-15 Thread Ryan Svihla
Give it a try and see how it behaves.

On Mar 15, 2017 10:09 AM, "Frank Hughes"  wrote:

> Thanks Ryan, appreciated again. getPolicy just had this:
>
> Policy policy = new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build());
>
> so i guess i need
>
> Policy policy = new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build(), false);
>
> Frank
>
> On 2017-03-15 13:45 (-), Ryan Svihla  wrote:
> > I don't see what getPolicy is retrieving but you want to use TokenAware
> > with the shuffle false option in the ctor, it defaults to shuffle true so
> > that load is spread when people have horribly fat partitions.
> >
> > On Wed, Mar 15, 2017 at 9:41 AM, Frank Hughes 
> > wrote:
> >
> > > Thanks for reply. Much appreciated.
> > >
> > > I should have included more detail. So I am using replication factor 2,
> > > and the code is using a token aware method of distributing the work so that
> > > only data that is primarily owned by the node is read on that local
> > > machine. So I guess this points to the logic I'm using to determine what is
> > > primarily owned by a node. I guess this is verging into something that
> > > should be posted to the java driver list, but I'll post here in case it's
> > > useful or there's an obvious problem:
> > >
> > > PoolingOptions poolingOpts = new PoolingOptions();
> > > poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, this.coreConn);
> > > poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, this.maxConn);
> > > poolingOpts.setMaxRequestsPerConnection(HostDistance.LOCAL, 32768);
> > > poolingOpts.setMaxRequestsPerConnection(HostDistance.REMOTE, 2000);
> > >
> > > SocketOptions socketOptions = new SocketOptions();
> > > socketOptions.setReadTimeoutMillis(15000);
> > >
> > > Cluster.Builder builder = Cluster.builder();
> > > for(String contactPoint: contactPoints){
> > > builder.addContactPoint(contactPoint.trim());
> > > builder.withPoolingOptions(poolingOpts);
> > > builder.withSocketOptions(socketOptions);
> > > }
> > >
> > > builder.withLoadBalancingPolicy(getPolicy())
> > > .withQueryOptions(new QueryOptions()
> > > .setPrepareOnAllHosts(true)
> > > .setMetadataEnabled(true)
> > > );
> > >
> > > Cluster cluster = builder.build();
> > > Metadata metadata = cluster.getMetadata();
> > > Session session = cluster.connect(keyspaceName);
> > > Set<Host> allHosts = metadata.getAllHosts();
> > > int numberOfHost = 4;
> > >
> > > Host localHost = null;
> > > for (Host host : allHosts) {
> > > if(host.getAddress().getHostAddress().equalsIgnoreCase(local))
> > > localHost = host;
> > > }
> > >
> > > Map<Host, List<TokenRange>> replicaCount = new HashMap<Host, List<TokenRange>>();
> > > TokenRange[] tokenRanges = unwrapTokenRanges(metadata.getTokenRanges()).toArray(new TokenRange[0]);
> > >
> > > List<TokenRange> tokenRangeList = Arrays.asList(tokenRanges);
> > > tokenRangeList.sort(new Comparator<TokenRange>() {
> > > @Override
> > > public int compare(TokenRange o1, TokenRange o2) {
> > > return o1.getStart().compareTo(o2.getStart());
> > > }
> > > });
> > >
> > > numberOfHost = metadata.getAllHosts().size();
> > > int rangesPerHost = tokenRanges.length / numberOfHost;
> > >
> > > for(TokenRange tokenRange : tokenRangeList){
> > >
> > > Set<Host> hosts = metadata.getReplicas(keyspaceName, tokenRange);
> > >
> > > String rangeHosts = "";
> > > Iterator<Host> iter = hosts.iterator();
> > > while(iter.hasNext()){
> > > Host host = iter.next();
> > >
> > > List<TokenRange> tokenRangesForHost = replicaCount.get(host);
> > > if(tokenRangesForHost == null){
> > > tokenRangesForHost = new ArrayList<TokenRange>();
> > > }
> > >
> > > if(tokenRangesForHost.size() < rangesPerHost || !iter.hasNext()){
> > > tokenRangesForHost.add(tokenRange);
> > > replicaCount.put(host, tokenRangesForHost);
> > > break;
> > > }
> > >
> > > rangeHosts += host.getAddress().toString();
> > > }
> > > }
> > >
> > > for(Host replica : replicaCount.keySet()){

Re: Change the IP of a live node

2017-03-15 Thread Ryan Svihla
I've actually changed the IP address quite a bit (gossip complains on
startup and happily picks up the new address). I think the question may be simpler
than it looks: can those IP addresses route to one another?

As in can the first node with 192.168.xx.xx hit the node with 10.179.xx.xx
on that interface?

On Wed, Mar 15, 2017 at 9:37 AM, kurt greaves  wrote:

> Cassandra uses the IP address for more or less everything. It's possible
> to change it through some hackery, however it's probably not a great idea. The
> nodes' system tables will still reference the old IP, which is likely your
> problem here.
>
> On 14 March 2017 at 18:58, George Sigletos  wrote:
>
>> To give a complete picture, my node has actually two network interfaces:
>> eth0 for 192.168.xx.xx and eth1 for 10.179.xx.xx
>>
>> On Tue, Mar 14, 2017 at 7:46 PM, George Sigletos 
>> wrote:
>>
>>> Hello,
>>>
>>> I am trying to change the IP of a live node (I am not replacing a dead
>>> one).
>>>
>>> So I stop the service on my node (not a seed node), I change the IP from
>>> 192.168.xx.xx to 10.179.xx.xx, and modify "listen_address" and
>>> "rpc_address" in the cassandra.yaml, while I also set auto_bootstrap:
>>> false. Then I restart but it fails to see the rest of the cluster:
>>>
>>> Datacenter: DC1
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address        Load     Tokens  Owns  Host ID                               Rack
>>> DN  192.168.xx.xx  ?        256     ?     241f3002-8f89-4433-a521-4fa4b070b704  r1
>>> UN  10.179.xx.xx   3.45 TB  256     ?     3b07df3b-683b-4e2d-b307-3c48190c8f1c  RAC1
>>> DN  192.168.xx.xx  ?        256     ?     19636f1e-9417-4354-8364-6617b8d3d20b  r1
>>> DN  192.168.xx.xx  ?        256     ?     9c65c71c-f5dd-4267-af9e-a20881cf3d48  r1
>>> DN  192.168.xx.xx  ?        256     ?     ee75219f-0f2c-4be0-bd6d-038315212728  r1
>>>
>>> Am I doing anything wrong? Thanks in advance
>>>
>>> Kind regards,
>>> George
>>>
>>
>>
>


-- 

Thanks,
Ryan Svihla


Re: TransportException - Consistency LOCAL_ONE - EC2

2017-03-15 Thread Ryan Svihla
I don't see what getPolicy is retrieving but you want to use TokenAware
with the shuffle false option in the ctor, it defaults to shuffle true so
that load is spread when people have horribly fat partitions.
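
For reference, a minimal sketch of that change, assuming the getPolicy() helper from the quoted code and driver 3.1, where the two-argument TokenAwarePolicy constructor takes a shuffleReplicas flag and false tries the primary replica first:

import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.LoadBalancingPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class PolicyFactory {
    // Token-aware wrapping of DC-aware round robin; shuffleReplicas = false
    // keeps the primary replica for each partition at the front of the query plan.
    public static LoadBalancingPolicy getPolicy() {
        return new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build(), false);
    }
}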

On Wed, Mar 15, 2017 at 9:41 AM, Frank Hughes 
wrote:

> Thanks for reply. Much appreciated.
>
> I should have included more detail. So I am using replication factor 2,
> and the code is using a token aware method of distributing the work so that
> only data that is primarily owned by the node is read on that local
> machine. So i guess this points to the logic im using to determine what is
> primarily owned by a node. I guess this is verging into something that
> should be posted to the java driver list, but i'll post here in case its
> useful or theres an obvious problem:
>
> PoolingOptions poolingOpts = new PoolingOptions();
> poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, this.coreConn);
> poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, this.maxConn);
> poolingOpts.setMaxRequestsPerConnection(HostDistance.LOCAL, 32768);
> poolingOpts.setMaxRequestsPerConnection(HostDistance.REMOTE, 2000);
>
> SocketOptions socketOptions = new SocketOptions();
> socketOptions.setReadTimeoutMillis(15000);
>
> Cluster.Builder builder = Cluster.builder();
> for(String contactPoint: contactPoints){
> builder.addContactPoint(contactPoint.trim());
> builder.withPoolingOptions(poolingOpts);
> builder.withSocketOptions(socketOptions);
> }
>
> builder.withLoadBalancingPolicy(getPolicy())
> .withQueryOptions(new QueryOptions()
> .setPrepareOnAllHosts(true)
> .setMetadataEnabled(true)
> );
>
> Cluster cluster = builder.build();
> Metadata metadata = cluster.getMetadata();
> Session session = cluster.connect(keyspaceName);
> Set<Host> allHosts = metadata.getAllHosts();
> int numberOfHost = 4;
>
> Host localHost = null;
> for (Host host : allHosts) {
> if(host.getAddress().getHostAddress().equalsIgnoreCase(local))
> localHost = host;
> }
>
> Map<Host, List<TokenRange>> replicaCount = new HashMap<Host, List<TokenRange>>();
> TokenRange[] tokenRanges = unwrapTokenRanges(metadata.getTokenRanges()).toArray(new TokenRange[0]);
>
> List<TokenRange> tokenRangeList = Arrays.asList(tokenRanges);
> tokenRangeList.sort(new Comparator<TokenRange>() {
> @Override
> public int compare(TokenRange o1, TokenRange o2) {
> return o1.getStart().compareTo(o2.getStart());
> }
> });
>
> numberOfHost = metadata.getAllHosts().size();
> int rangesPerHost = tokenRanges.length / numberOfHost;
>
> for(TokenRange tokenRange : tokenRangeList){
>
> Set<Host> hosts = metadata.getReplicas(keyspaceName, tokenRange);
>
> String rangeHosts = "";
> Iterator<Host> iter = hosts.iterator();
> while(iter.hasNext()){
> Host host = iter.next();
>
> List<TokenRange> tokenRangesForHost = replicaCount.get(host);
> if(tokenRangesForHost == null){
> tokenRangesForHost = new ArrayList<TokenRange>();
> }
>
> if(tokenRangesForHost.size() < rangesPerHost || !iter.hasNext()){
> tokenRangesForHost.add(tokenRange);
> replicaCount.put(host, tokenRangesForHost);
> break;
> }
>
> rangeHosts += host.getAddress().toString();
> }
> }
>
> for(Host replica : replicaCount.keySet()){
> List<TokenRange> allocatedRanges = replicaCount.get(replica);
> for(TokenRange tr : replicaCount.get(replica)){
>     System.out.println(tr.getStart() + " to " + tr.getEnd());
> }
> }
>
> //get a list of token ranges for this host
> List<TokenRange> tokenRangesForHost = replicaCount.get(localHost);
>
> Again, any thoughts are much appreciated.
>
> Thanks
>
> Frank
>
>
> On 2017-03-15 12:38 (-), Ryan Svihla  wrote:
> > LOCAL_ONE just means local to the datacenter by default the tokenaware
> > policy will go to a replica that owns that data (primary or any replica
> > depends on the driver) and that may or may not be the node the driver
> > process is running on.
> >
> > So to put this more concretely if you have RF 2 with that 4 node cluster so
> > 2 nodes will be responsible for that data and if your local process is not
> > running on one of those 2 nodes it will definitely HAVE to go to another
> > node.
> >
> > Therefore, if you wanted to pin behavior to a local replica you'd have to
> > send your work out in a token aware fashion where said work only goes to
> > the primary token owner of that data, and remove any shuffling of replicas
> > in the process (is only on by default in the java driver to my knowledge).
> >

Re: TransportException - Consistency LOCAL_ONE - EC2

2017-03-15 Thread Ryan Svihla
LOCAL_ONE just means local to the datacenter. By default the tokenaware
policy will go to a replica that owns that data (primary or any replica,
depending on the driver), and that may or may not be the node the driver
process is running on.

So to put this more concretely if you have RF 2 with that 4 node cluster so
2 nodes will be responsible for that data and if your local process is not
running on one of those 2 nodes it will definitely HAVE to go to another
node.

Therefore, if you wanted to pin behavior to a local replica you'd have to
send your work out in a token aware fashion where said work only goes to
the primary token owner of that data, and remove any shuffling of replicas
in the process (is only on by default in the java driver to my knowledge).
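
One hedged way to do that pinning from application code is to ask the driver metadata which hosts replicate a given partition key and only process keys whose first replica is the local node. A sketch under assumptions: a single text partition column (so the routing key is just the UTF-8 bytes), placeholder keyspace and local address, and the returned replica set assumed to iterate with the primary owner first:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Set;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;

public class ReplicaCheck {
    // True if this machine is the first replica for the given text partition key.
    static boolean isPrimaryHere(Cluster cluster, String keyspace, String key, String localAddress) {
        ByteBuffer routingKey = ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8));
        Metadata metadata = cluster.getMetadata();
        Set<Host> replicas = metadata.getReplicas(keyspace, routingKey);
        if (replicas.isEmpty()) {
            return false;
        }
        Host first = replicas.iterator().next(); // assumed to be the primary token owner
        return first.getAddress().getHostAddress().equals(localAddress);
    }
}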

On Wed, Mar 15, 2017 at 6:38 AM, Frank Hughes 
wrote:

> Hi there,
>
> I'm running a java process on a 4-node cassandra 3.9 cluster on EC2
> (instance type t2.2xlarge), the process running separately on each of the
> nodes (i.e. 4 running JVMs).
> The process is just doing reads from Cassandra and building a SOLR index
> and using the java driver with consistency level LOCAL_ONE.
> However, the following exception is thrown:
>
> com.datastax.driver.core.exceptions.TransportException: [/10.0.0.2:9042]
> Connection has been closed
> at com.datastax.driver.core.exceptions.TransportException.
> copy(TransportException.java:38)
> at com.datastax.driver.core.exceptions.TransportException.
> copy(TransportException.java:24)
> at com.datastax.driver.core.DriverThrowables.propagateCause(
> DriverThrowables.java:37)
> at com.datastax.driver.core.ArrayBackedResultSet$
> MultiPage.prepareNextRow(ArrayBackedResultSet.java:313)
> at com.datastax.driver.core.ArrayBackedResultSet$
> MultiPage.isExhausted(ArrayBackedResultSet.java:269)
> at com.datastax.driver.core.ArrayBackedResultSet$1.
> hasNext(ArrayBackedResultSet.java:143)
>
> where 10.0.0.2 is not the local machine. So my questions:
>
> - Should this happen when I'm using consistency level LOCAL_ONE and just
> doing reads?
> - Does this suggest non-local reads are happening?
>
> Many thanks for any help/ideas.
>
> Frank
>
>
>


-- 

Thanks,
Ryan Svihla


Re: HELP with bulk loading

2017-03-09 Thread Ryan Svihla
I suggest using cassandra loader

https://github.com/brianmhess/cassandra-loader

On Mar 9, 2017 5:30 PM, "Artur R"  wrote:

> Hello all!
>
> There are ~500gb of CSV files and I am trying to find the best way to
> upload them to a C* table (new empty C* cluster of 3 nodes, replication
> factor 2) within a reasonable time (say, 10 hours using 3-4 instances of
> c3.8xlarge EC2 nodes).
>
> My first impulse was to use CQLSSTableWriter, but it is too slow as a
> single instance and I can't efficiently parallelize it (just creating Java
> threads) because after some point it always "hangs" (looks like GC is
> overstressed) and eats all available memory.
>
> So the questions are:
> 1. What is the best way to bulk-load huge amount of data to new C* cluster?
>
> This comment here: https://issues.apache.org/jira/browse/CASSANDRA-9323:
>
> The preferred way to bulk load is now COPY; see CASSANDRA-11053
>>  and linked
>> tickets
>
>
> is confusing because I read that the CQLSSTableWriter + sstableloader is
> much faster than COPY. Who is right?
>
> 2. Is there any real examples of multi-threaded using of CQLSSTableWriter?
> Maybe ready to use libraries like: https://github.com/spotify/hdfs2cass?
>
> 3. sstableloader is slow too. Assuming that I have new empty C* cluster,
> how can I improve the upload speed? Maybe disable replication or some other
> settings while streaming and then turn it back?
>
> Thanks!
> Artur.
>
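
On the CQLSSTableWriter question from point 2, a rough sketch is below; the schema, paths and values are invented, the output directory must already exist, and since a writer instance is not meant to be shared the usual pattern is one writer (and one directory) per thread, with sstableloader streaming the results afterwards:

import java.io.File;
import java.util.Date;

import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class BulkWriteSketch {
    public static void main(String[] args) throws Exception {
        String schema = "CREATE TABLE ks.events (id text, ts timestamp, payload text, "
                      + "PRIMARY KEY (id, ts))";                      // hypothetical table
        String insert = "INSERT INTO ks.events (id, ts, payload) VALUES (?, ?, ?)";

        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory(new File("/tmp/sstables/ks/events"))     // must exist beforehand
                .forTable(schema)
                .using(insert)
                .build();

        // One addRow per CSV line; values follow the INSERT bind order and CQL types.
        writer.addRow("row-1", new Date(), "some payload");
        writer.close();                                               // flushes the final sstable
    }
}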


Re: A Single Dropped Node Fails Entire Read Queries

2017-03-09 Thread Ryan Svihla
What are your keyspace replication settings and what's your query?

On Thu, Mar 9, 2017 at 9:32 AM, Shalom Sagges 
wrote:

> Hi Cassandra Users,
>
> I hope someone could help me understand the following scenario:
>
> Version: 3.0.9
> 3 nodes per DC
> 3 DCs in the cluster.
> Consistency Local_Quorum.
>
> I did a small resiliency test and dropped a node to check the availability
> of the data.
> What I assumed would happen is nothing at all. If a node is down in a
> 3-node DC, Local_Quorum should still be satisfied.
> However, during the first ~10 seconds after stopping the service, I got
> timeout errors (tried it both from the client and from cqlsh).
>
> This is the error I get:
> *ServerError:
> com.google.common.util.concurrent.UncheckedExecutionException:
> com.google.common.util.concurrent.UncheckedExecutionException:
> java.lang.RuntimeException:
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
> received only 4 responses.*
>
>
> After ~10 seconds, the same query is successful with no timeout errors.
> The dropped node is still down.
>
> Any idea what could cause this and how to fix it?
>
> Thanks!
>
>
> This message may contain confidential and/or privileged information.
> If you are not the addressee or authorized to receive this on behalf of
> the addressee you must not use, copy, disclose or take action based on this
> message or any information herein.
> If you have received this message in error, please advise the sender
> immediately by reply email and delete this message. Thank you.
>



-- 

Thanks,
Ryan Svihla


Re: Disconnecting two data centers

2017-03-08 Thread Ryan Svihla
it's a bit tricky and I don't advise it, but the typical pattern is (say
you have DC1 and DC2):

1. partition the data centers from one another - kill the routing however
you can (firewall, etc)
2. while partitioned, log onto DC1 and alter the schema so that DC2 is not
replicating; repeat for the other (see the ALTER KEYSPACE sketch after the second list below)
2a. If using PropertyFileSnitch, remove DC2 from all the DC1 property
files and vice versa
2b. change the seeds setting in the cassandra.yaml accordingly (DC1 yamls
shouldn't have any seeds from DC2, etc)
3. rolling restart to account for this.
4. run repair (not even sure how necessary this step is, but after doing
RF changes I do this to prevent hiccups)

I've done this a couple of times but really failing all of that, the more
well supported and harder to mess up but more work approach is:

1. Set DC2 to RF 0
2. remove all nodes from DC2
3. change yamls for seed files (update property file if need be)
4. create a new cluster in DC2
5. use sstableloader to stream DC1 data to DC2.
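
For step 2 of the first approach (and step 1 of the second), the schema change is just an ALTER KEYSPACE that omits the other data center. A hedged sketch, assuming NetworkTopologyStrategy, data centers named DC1 and DC2, and a placeholder keyspace my_ks, run against DC1 while the two sides are partitioned:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class DropDc2Replication {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("dc1-node-1").build();
             Session session = cluster.connect()) {
            // Keep replicas only in DC1; DC2 simply disappears from the replication map.
            session.execute("ALTER KEYSPACE my_ks WITH replication = "
                          + "{'class': 'NetworkTopologyStrategy', 'DC1': 3}");
            // system_auth and any other replicated keyspaces need the same treatment.
        }
    }
}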

On Wed, Mar 8, 2017 at 8:13 AM, Chuck Reynolds 
wrote:

> I’m running C* 2.1.13 and I have two rings that are replicating data from
> our data center to one in AWS.
>
>
>
> We would like to keep both of them for a while but we have a need to
> disconnect them.  How can this be done?
>



-- 

Thanks,
Ryan Svihla


Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-12 Thread Ryan Svihla
It was just the first place google turned up, I made an answer late in the 
evening trying to help someone out on my own free time. 

Regards,

Ryan Svihla

> On Sep 12, 2016, at 6:34 AM, Mark Thomas  wrote:
> 
>> On 11/09/2016 23:07, Ryan Svihla wrote:
>> 1. A batch with updates to a single partition turns into a single
>> mutation so partial writes aren't possible (so may as well use
>> Unlogged batches)
>> 2. Yes, so use local_serial or serial reads and all updates you want to
>> honor LWT need to be LWT as well, this way everything is buying into the
>> same protocol and behaving accordingly. 
>> 3. LWT works with batch (has to be same partition).
>> https://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html if
>> condition doesn't fire none of the batch will (same partition will mean
>> it'll be the same mutation anyway so there really isn't any magic going on).
> 
> Is there a good reason for linking to the 3rd party docs rather than the
> official docs in this case? I can't see one at the moment.
> 
> The official docs appear to be:
> 
> http://cassandra.apache.org/doc/latest/cql/dml.html#batch
> 
> It might not matter in this particular instance but it looks as if there
> is a little more to the syntax than the 3rd party docs suggest (even if
> you switch to the latest version of those 3rd party docs).
> 
> Generally, if you are going to point to docs, please point to the
> official Apache Cassandra docs unless there is a very good reason not
> to. (And if the good reason is that there’s a deficiency in the Apache
> Cassandra docs, please make it known on the list or in a Jira so someone
> can write what’s missing)
> 
> Mark
> 
> 
>> Your biggest issue with such a design will be contention (as it would
>> with an rdbms with say row locking), as by intent you're making all
>> reads and writes block until any pending ones are complete. I'm sure
>> there are a couple things I forgot but this is the standard wisdom. 
>> 
>> Regards,
>> 
>> Ryan Svihla
>> 
>> On Sep 11, 2016, at 3:49 PM, Jens Rantil > <mailto:jens.ran...@tink.se>> wrote:
>> 
>>> Hi,
>>> 
>>> This might be off-topic, but you could always use Zookeeper locking
>>> and/or Apache Kafka topic keys for doing things like this.
>>> 
>>> Cheers,
>>> Jens
>>> 
>>> On Tuesday, September 6, 2016, Bhuvan Rawal >> <mailto:bhu1ra...@gmail.com>> wrote:
>>> 
>>>Hi,
>>> 
>>>We are working to solve on a multi threaded distributed design
>>>which in which a thread reads current state from Cassandra (Single
>>>partition ~ 20 Rows), does some computation and saves it back in.
>>>But it needs to be ensured that in between reading and writing by
>>>that thread any other thread should not have saved any operation
>>>on that partition.
>>> 
>>>We have thought of a solution for the same - *having a write_time
>>>column* in the schema and making it static. Every time the thread
>>>picks up a job read will be performed with LOCAL_QUORUM. While
>>>writing into Cassandra batch will contain a LWT (IF write_time is
>>>read time) otherwise read will be performed and computation will
>>>be done again and so on. This will ensure that while saving
>>>partition is in a state it was read from.
>>> 
>>>In order to avoid race condition we need to ensure couple of things:
>>> 
>>>1. While saving data in a batch with a single partition (*Rows may
>>>be Updates, Deletes, Inserts)* are they Isolated per replica node.
>>>(Not necessarily on a cluster as a whole). Is there a possibility
>>>of client reading partial rows?
>>> 
>>>2. If we do a LOCAL_QUORUM read and LOCAL_QUORUM writes in this
>>>case could there a chance of inconsistency in this case (When LWT
>>>is being used in batches).
>>> 
>>>3. Is it possible to use multiple LWT in a single Batch? In
>>>general how does LWT performs with Batch and is Paxos acted on
>>>before batch execution?
>>> 
>>>Can someone help us with this?
>>> 
>>>Thanks & Regards,
>>>Bhuvan
>>> 
>>> 
>>> 
>>> -- 
>>> Jens Rantil
>>> Backend engineer
>>> Tink AB
>>> 
>>> Email: jens.ran...@tink.se
>>> Phone: +46 708 84 18 32
>>> Web: www.tink.se
>>> 
>>> Facebook Linkedin Twitter
> 


Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-11 Thread Ryan Svihla
1. A batch with updates to a single partition turns into a single mutation so 
partial writes aren't possible (so may as well use Unlogged batches)
2. Yes, so use local_serial or serial reads and all updates you want to honor 
LWT need to be LWT as well, this way everything is buying into the same 
protocol and behaving accordingly. 
3. LWT works with batch (has to be same partition). 
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html if 
condition doesn't fire none of the batch will (same partition will mean it'll 
be the same mutation anyway so there really isn't any magic going on).

Your biggest issue with such a design will be contention (as it would with an 
rdbms with say row locking), as by intent you're making all reads and writes 
block until any pending ones are complete. I'm sure there are a couple things I 
forgot but this is the standard wisdom. 
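
A small sketch of point 3 with the Java driver: a single-partition BATCH carrying one LWT condition, with the outcome checked through ResultSet.wasApplied(). The table (jobs, with only a partition key) and the values are invented for illustration:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class LwtBatchSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("ks")) {
            // Every statement targets the same partition (pk = 'job-42'), so the batch is
            // a single mutation and the IF condition guards the whole thing via Paxos.
            ResultSet rs = session.execute(
                "BEGIN BATCH "
              + "UPDATE jobs SET write_time = 2 WHERE pk = 'job-42' IF write_time = 1; "
              + "UPDATE jobs SET state = 'done' WHERE pk = 'job-42'; "
              + "APPLY BATCH");
            if (!rs.wasApplied()) {
                // Another writer changed the partition since it was read: re-read and retry.
                System.out.println("condition failed, retrying");
            }
        }
    }
}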

Regards,

Ryan Svihla

> On Sep 11, 2016, at 3:49 PM, Jens Rantil  wrote:
> 
> Hi,
> 
> This might be off-topic, but you could always use Zookeeper locking and/or 
> Apache Kafka topic keys for doing things like this.
> 
> Cheers,
> Jens
> 
>> On Tuesday, September 6, 2016, Bhuvan Rawal  wrote:
>> Hi,
>> 
>> We are working to solve on a multi threaded distributed design which in 
>> which a thread reads current state from Cassandra (Single partition ~ 20 
>> Rows), does some computation and saves it back in. But it needs to be 
>> ensured that in between reading and writing by that thread any other thread 
>> should not have saved any operation on that partition.
>> 
>> We have thought of a solution for the same - having a write_time column in 
>> the schema and making it static. Every time the thread picks up a job read 
>> will be performed with LOCAL_QUORUM. While writing into Cassandra batch will 
>> contain a LWT (IF write_time is read time) otherwise read will be performed 
>> and computation will be done again and so on. This will ensure that while 
>> saving partition is in a state it was read from.
>> 
>> In order to avoid race condition we need to ensure couple of things:
>> 
>> 1. While saving data in a batch with a single partition (Rows may be 
>> Updates, Deletes, Inserts) are they Isolated per replica node. (Not 
>> necessarily on a cluster as a whole). Is there a possibility of client 
>> reading partial rows?
>> 
>> 2. If we do a LOCAL_QUORUM read and LOCAL_QUORUM writes in this case could 
>> there a chance of inconsistency in this case (When LWT is being used in 
>> batches).
>> 
>> 3. Is it possible to use multiple LWT in a single Batch? In general how does 
>> LWT performs with Batch and is Paxos acted on before batch execution?
>> 
>> Can someone help us with this?
>> 
>> Thanks & Regards,
>> Bhuvan
> 
> 
> -- 
> Jens Rantil
> Backend engineer
> Tink AB
> 
> Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
> 
> Facebook Linkedin Twitter
> 


Re: Read timeouts on primary key queries

2016-09-01 Thread Ryan Svihla
Have you looked at cfhistograms/tablehistograms? Your data may just be skewed
(the most likely explanation is probably the correct one here).

Regard,
Ryan Svihla

_
From: Joseph Tech 
Sent: Wednesday, August 31, 2016 11:16 PM
Subject: Re: Read timeouts on primary key queries
To:  


Patrick,
The desc table is below (only col names changed) : 
CREATE TABLE db.tbl (
    id1 text,
    id2 text,
    id3 text,
    id4 text,
    f1 text,
    f2 map,
    f3 map,
    created timestamp,
    updated timestamp,
    PRIMARY KEY (id1, id2, id3, id4)
) WITH CLUSTERING ORDER BY (id2 ASC, id3 ASC, id4 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'sstable_size_in_mb': '50', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.1
    AND speculative_retry = '99.0PERCENTILE';

and the query is: select * from tbl where id1=? and id2=? and id3=? and id4=?

The timeouts happen within ~2s to ~5s, while the successful calls have an avg of 8ms and a p99 of 15ms. These times are seen from the app side; the actual query times would be slightly lower.

Is there a way to capture traces only when queries take longer than a specified duration? We can't enable tracing in production given the volume of traffic. We see that the same query which timed out works fine later, so not sure if the trace of a successful run would help.

Thanks,
Joseph
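
On the question of capturing only the slow queries: recent versions of the DataStax Java driver ship a QueryLogger that logs any statement exceeding a configurable threshold (under the com.datastax.driver.core.QueryLogger.SLOW logger), which is often enough to spot offenders without server-side tracing. A sketch, with the threshold and contact point as placeholders and assuming a 3.x driver where QueryLogger.builder() takes no arguments:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryLogger;

public class SlowQueryLogging {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        // Log any statement that takes longer than 1000 ms.
        QueryLogger queryLogger = QueryLogger.builder()
                .withConstantThreshold(1000)
                .build();
        cluster.register(queryLogger);
        // ... run the application; slow statements are logged at DEBUG on the SLOW logger.
    }
}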

On Wed, Aug 31, 2016 at 8:05 PM, Patrick McFadin  wrote:
If you are getting a timeout on one table, then a mismatch of RF and node count 
doesn't seem as likely. 
Time to look at your query. You said it was a 'select * from table where key=?' 
type query. I would next use the trace facility in cqlsh to investigate 
further. That's a good way to find hard-to-find issues. You should be looking
for a clear ledge where you go from single-digit ms to 4 or 5 digit ms times.
The other place to look is your data model for that table if you want to post 
the output from a desc table.
Patrick


On Tue, Aug 30, 2016 at 11:07 AM, Joseph Tech  wrote:
On further analysis, this issue happens only on 1 table in the KS which has the 
max reads. 
@Atul, I will look at system health, but didn't see anything standing out from
the GC logs (using JDK 1.8_92 with G1GC).
@Patrick, could you please elaborate on the "mismatch on node count + RF" part.
On Tue, Aug 30, 2016 at 5:35 PM, Atul Saroha  wrote:
There could be many reasons for this if it is intermittent: CPU usage + I/O
wait status. As reads are I/O intensive, your IOPS requirement should be met
under that load. A heap issue if the CPU is busy with GC only. Network health could be
the reason. So it is better to look at system health at the time when it happens.

-
Atul Saroha
Lead Software Engineer
M: +91 8447784271 T: +91 124-415-6069 EXT: 12369
Plot # 362, ASF Centre - Tower A, Udyog Vihar,
 Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
On Tue, Aug 30, 2016 at 5:10 PM, Joseph Tech  wrote:
Hi Patrick,
The nodetool status shows all nodes up and normal now. From OpsCenter "Event 
Log" , there are some nodes reported as being down/up etc. during the timeframe 
of timeout, but these are Search workload nodes from the remote (non-local) DC. 
The RF is 3 and there are 9 nodes per DC.
Thanks,Joseph
On Mon, Aug 29, 2016 at 11:07 PM, Patrick McFadin  wrote:
You aren't achieving quorum on your reads, as the error explains. That means 
you either have some nodes down or your topology is not matching up. The fact 
you are using LOCAL_QUORUM might point to a datacenter mis-match on node count 
+ RF. 
What does your nodetool status look like?
Patrick
On Mon, Aug 29, 2016 at 10:14 AM, Joseph Tech  wrote:
Hi,
We recently started getting intermittent timeouts on primary key queries 
(select * from table where key=)
The error is : com.datastax.driver.core.exceptions.ReadTimeoutException: 
Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses 
were required but only 1 replica
a responded)
The same query would work fine when tried directly from cqlsh. There are no 
indications in system.log for the table in question, though there were 
compactions in progress for tables in another keyspace which is more frequently 
accessed. 
My understanding is that the chances of primary key queries timing out is very 
minimal. Please share the possible reasons / ways to debug this issue. 

We are using Cassandra 2.1 (DSE 4.8.7).
Thanks,Joseph













Re: Guidelines for configuring Thresholds for Cassandra metrics

2016-08-29 Thread Ryan Svihla
Benedict, I really don't want to turn this into a battle about whose opinion is 
more valid and I really respect all the good work you've done for Apache 
Cassandra.
I'll just reiterate that I'm comfortable saying 0.6 is a good starting point 
and it is often not the ideal once you go through more thorough testing, all of 
which I said initially and I still think is a reasonable statement.
-regards,
Ryan Svihla







On Sat, Aug 27, 2016 at 9:31 AM -0500, "Benedict Elliott Smith" 
 wrote:










I did not claim you had no evidence, only that your statement lacked 
justification.  Again, nuance is important.
I was suggesting that blanket statements without the necessary caveats, to the 
user mailing list, countermanding the defaults without 'justification' 
(explanation, reasoning) is liable to cause confusion on what best practice is. 
 I attempted to provide some of the missing context to minimise this confusion 
while still largely agreeing with you.

However you should also bear in mind that you work as a field engineer for 
DataStax, and as such your sample of installation behaviours will be biased - 
towards those where the defaults have not worked well.


On Saturday, 27 August 2016, Ryan Svihla  wrote:

 I have been trying to get the docs fixed for this for the past 
3 months, and there already is a ticket open for changing the defaults. I don't 
feel like I've had a small amount of evidence here. All observation in the 3 
years of work in the field suggests compaction keeps coming up as the 
bottleneck when you push Cassandra ingest.0.6 as an initial setting has fixed 
20+ broken clusters in practice and it improved overall performance in every 
case from defaults of 0.33 to defaults of 0.03 (yaml suggests per core flush 
writers, add in the prevelance of HT and you see a lot of 24+ flush writer 
systems in the wild)
No disrespect intended but that default hasn't worked out well at all in my 
exposure to it, and 0.6 has never been worse than the default yet. Obviously 
write patterns, heap configuration, memtable size limits and what not affect 
the exact optimal setting and I've rarely had it end up 0.6 after a tuning 
exercise. I never intended that as a blanket recommendation, just a starting 
one.

_
From: Benedict Elliott Smith 
Sent: Friday, August 26, 2016 9:40 AM
Subject: Re: Guidelines for configuring Thresholds for Cassandra metrics
To:  


The default when I wrote it was 0.4 but it was found this did not saturate 
flush writers in JBOD configurations. Iirc it now defaults to 1/(1+#disks) 
which is not a terrible default, but obviously comes out much lower if you have 
many disks.
This smaller value behaves better for peak performance, but in a live system 
where compaction is king not saturating flush in return for lower write 
amplification (from flushing larger memtables) will indeed often be a win.
0.6, however, is probably not the best default unless you have a lot of tables 
being actively written to, in which case even 0.8 would be fine. With a single 
main table receiving your writes at a given time, 0.4 is probably an optimal 
value, when making this trade off against peak performance.
Anyway, it's probably better to file a ticket to discuss defaults and 
documentation than making a statement like this without justification. I can 
see where you're coming from, but it's confusing for users to have such blanket 
guidance that counters the defaults.  If the defaults can be improved (which I 
agree they can) it's probably better to do that, along with better 
documentation, so the nuance is accounted for.

On Friday, 26 August 2016, Ryan Svihla  wrote:

Forgot the most important thing. Logs:

* ERROR - you should investigate.
* WARN - you should have a list of known ones. Use case dependent. Ideally you change configuration accordingly.
* PoolCleaner (slab or native) - a good indication the node is tuned badly if you see a ton of this. Set memtable_cleanup_threshold to 0.6 as an initial attempt to configure this correctly. This is a complex topic to dive into, so that may not be the best number, but it'll likely be better than the default; why it's not the default is a big conversation.

There are a bunch of other logs I look for that are escaping me at present, but that's a good start.
-regards,
Ryan Svihla



On Fri, Aug 26, 2016 at 7:21 AM -0500, "Ryan Svihla"  wrote:

Thomas,
Not all metrics are KPIs and are only useful when researching a specific issue 
or after a use case specific threshold has been set.
The main "canaries" I monitor are:* Pending compactions (dependent on the 
compaction strategy chosen but 1000 is a sign of severe issues in all cases)* 
dropped mutations (more than one I treat as a event to investigate, I believe 
in allowing operational overhead and any evidence of load shedding suggests I 
may not h

Re: Guidelines for configuring Thresholds for Cassandra metrics

2016-08-27 Thread Ryan Svihla
 I have been trying to get the docs fixed for this for the past 3 months, and 
there already is a ticket open for changing the defaults. I don't feel like 
I've had a small amount of evidence here. All observation in the 3 years of 
work in the field suggests compaction keeps coming up as the bottleneck when 
you push Cassandra ingest. 0.6 as an initial setting has fixed 20+ broken 
clusters in practice and it improved overall performance in every case from 
defaults of 0.33 to defaults of 0.03 (the yaml suggests per-core flush writers; add 
in the prevalence of HT and you see a lot of 24+ flush writer systems in the 
wild).
No disrespect intended but that default hasn't worked out well at all in my 
exposure to it, and 0.6 has never been worse than the default yet. Obviously 
write patterns, heap configuration, memtable size limits and what not affect 
the exact optimal setting and I've rarely had it end up 0.6 after a tuning 
exercise. I never intended that as a blanket recommendation, just a starting 
one.

_
From: Benedict Elliott Smith 
Sent: Friday, August 26, 2016 9:40 AM
Subject: Re: Guidelines for configuring Thresholds for Cassandra metrics
To:  


The default when I wrote it was 0.4 but it was found this did not saturate 
flush writers in JBOD configurations. Iirc it now defaults to 1/(1+#disks) 
which is not a terrible default, but obviously comes out much lower if you have 
many disks.
This smaller value behaves better for peak performance, but in a live system 
where compaction is king not saturating flush in return for lower write 
amplification (from flushing larger memtables) will indeed often be a win.
0.6, however, is probably not the best default unless you have a lot of tables 
being actively written to, in which case even 0.8 would be fine. With a single 
main table receiving your writes at a given time, 0.4 is probably an optimal 
value, when making this trade off against peak performance.
Anyway, it's probably better to file a ticket to discuss defaults and 
documentation than making a statement like this without justification. I can 
see where you're coming from, but it's confusing for users to have such blanket 
guidance that counters the defaults.  If the defaults can be improved (which I 
agree they can) it's probably better to do that, along with better 
documentation, so the nuance is accounted for.

On Friday, 26 August 2016, Ryan Svihla  wrote:

Forgot the most important thing. Logs:

* ERROR - you should investigate.
* WARN - you should have a list of known ones. Use case dependent. Ideally you change configuration accordingly.
* PoolCleaner (slab or native) - a good indication the node is tuned badly if you see a ton of this. Set memtable_cleanup_threshold to 0.6 as an initial attempt to configure this correctly. This is a complex topic to dive into, so that may not be the best number, but it'll likely be better than the default; why it's not the default is a big conversation.

There are a bunch of other logs I look for that are escaping me at present, but that's a good start.
-regards,
Ryan Svihla



On Fri, Aug 26, 2016 at 7:21 AM -0500, "Ryan Svihla"  wrote:

Thomas,
Not all metrics are KPIs and are only useful when researching a specific issue 
or after a use case specific threshold has been set.
The main "canaries" I monitor are:* Pending compactions (dependent on the 
compaction strategy chosen but 1000 is a sign of severe issues in all cases)* 
dropped mutations (more than one I treat as a event to investigate, I believe 
in allowing operational overhead and any evidence of load shedding suggests I 
may not have as much as I thought)* blocked anything (flush writers, etc..more 
than one I investigate)* system hints ( More than 1k I investigate)* heap usage 
and gc time vary a lot by use case and collector chosen, I aim for below 65% 
usage as an average with g1, but this again varies by use case a great deal. 
Sometimes I just looks the chart and query patterns and if they don't line up I 
have to do other deeper investigations* read and write latencies exceeding SLA 
is also use case dependent. Those that have none I tend to push towards p99 
with a middle end SSD based system having 100ms and a spindle based system 
having 600ms with CL one and assuming a "typical" query pattern (again query 
patterns and CL so vary here)* cell count and partition size vary greatly by 
hardware and gc tuning but I like to in the absence of all other relevant 
information like to keep cell count for a partition below 100k and size below 
100mb. I however have many successful use cases running more and I've had some 
fail well before that. Hardware and tuning tradeoff a shift this around a 
lot.There is unfortunately as you'll note a lot of nuance and the load out 
really changes what looks right (down to the model of SSDs I have different 
expectations for p99s if it&#x

Re: Guidelines for configuring Thresholds for Cassandra metrics

2016-08-26 Thread Ryan Svihla

Forgot the most important thing. Logs:

* ERROR - you should investigate.
* WARN - you should have a list of known ones. Use case dependent. Ideally you change configuration accordingly.
* PoolCleaner (slab or native) - a good indication the node is tuned badly if you see a ton of this. Set memtable_cleanup_threshold to 0.6 as an initial attempt to configure this correctly. This is a complex topic to dive into, so that may not be the best number, but it'll likely be better than the default; why it's not the default is a big conversation.

There are a bunch of other logs I look for that are escaping me at present, but that's a good start.
-regards,
Ryan Svihla




On Fri, Aug 26, 2016 at 7:21 AM -0500, "Ryan Svihla"  wrote:










Thomas,
Not all metrics are KPIs and are only useful when researching a specific issue 
or after a use case specific threshold has been set.
The main "canaries" I monitor are:* Pending compactions (dependent on the 
compaction strategy chosen but 1000 is a sign of severe issues in all cases)* 
dropped mutations (more than one I treat as a event to investigate, I believe 
in allowing operational overhead and any evidence of load shedding suggests I 
may not have as much as I thought)* blocked anything (flush writers, etc..more 
than one I investigate)* system hints ( More than 1k I investigate)* heap usage 
and gc time vary a lot by use case and collector chosen, I aim for below 65% 
usage as an average with g1, but this again varies by use case a great deal. 
Sometimes I just looks the chart and query patterns and if they don't line up I 
have to do other deeper investigations* read and write latencies exceeding SLA 
is also use case dependent. Those that have none I tend to push towards p99 
with a middle end SSD based system having 100ms and a spindle based system 
having 600ms with CL one and assuming a "typical" query pattern (again query 
patterns and CL so vary here)* cell count and partition size vary greatly by 
hardware and gc tuning but I like to in the absence of all other relevant 
information like to keep cell count for a partition below 100k and size below 
100mb. I however have many successful use cases running more and I've had some 
fail well before that. Hardware and tuning tradeoff a shift this around a 
lot.There is unfortunately as you'll note a lot of nuance and the load out 
really changes what looks right (down to the model of SSDs I have different 
expectations for p99s if it's a model I haven't used before I'll do some 
comparative testing).
The reason so much of this is general and vague is my selection bias. I'm 
brought in when people are complaining about performance or some grand systemic 
crash because they were monitoring nothing. I have little ability to change 
hardware initially so I have to be willing to allow the hardware to do the best 
it can an establish levels where it can no longer keep up with the customers 
goals. This may mean for some use cases 10 pending compactions is an actionable 
event for them, for another customer 100 is. The better approach is to 
establish a baseline for when these metrics start to indicate a serious issue 
is occurring in that particular app. Basically when people notice a problem, 
what did these numbers look like in the minutes, hours and days prior? That's 
the way to establish the levels consistently.
Regards,
Ryan Svihla







On Fri, Aug 26, 2016 at 4:48 AM -0500, "Thomas Julian"  
wrote:










Hello,

I am working on setting up a monitoring tool to monitor Cassandra Instances. 
Are there any wikis which specifies optimum value for each Cassandra KPIs?
For instance, I am not sure,
What value of "Memtable Columns Count" can be considered as "Normal". 
What value of the same has to be considered as "Critical".
I knew threshold numbers for few params, for instance any thing more than zero 
for timeouts, pending tasks should be considered as unusual. Also, I am aware 
that most of the statistics' threshold numbers vary in accordance with Hardware 
Specification, Cassandra Environment Setup. But, what I request here is a 
general guideline for configuring thresholds for all the metrics.

If this has been already covered, please point me to that resource. If anyone 
on their own interest collected these things, please share.

Any help is appreciated.

Best Regards,
Julian.














Re: Guidelines for configuring Thresholds for Cassandra metrics

2016-08-26 Thread Ryan Svihla
Thomas,
Not all metrics are KPIs and are only useful when researching a specific issue 
or after a use case specific threshold has been set.
The main "canaries" I monitor are:* Pending compactions (dependent on the 
compaction strategy chosen but 1000 is a sign of severe issues in all cases)* 
dropped mutations (more than one I treat as a event to investigate, I believe 
in allowing operational overhead and any evidence of load shedding suggests I 
may not have as much as I thought)* blocked anything (flush writers, etc..more 
than one I investigate)* system hints ( More than 1k I investigate)* heap usage 
and gc time vary a lot by use case and collector chosen, I aim for below 65% 
usage as an average with g1, but this again varies by use case a great deal. 
Sometimes I just looks the chart and query patterns and if they don't line up I 
have to do other deeper investigations* read and write latencies exceeding SLA 
is also use case dependent. Those that have none I tend to push towards p99 
with a middle end SSD based system having 100ms and a spindle based system 
having 600ms with CL one and assuming a "typical" query pattern (again query 
patterns and CL so vary here)* cell count and partition size vary greatly by 
hardware and gc tuning but I like to in the absence of all other relevant 
information like to keep cell count for a partition below 100k and size below 
100mb. I however have many successful use cases running more and I've had some 
fail well before that. Hardware and tuning tradeoff a shift this around a 
lot.There is unfortunately as you'll note a lot of nuance and the load out 
really changes what looks right (down to the model of SSDs I have different 
expectations for p99s if it's a model I haven't used before I'll do some 
comparative testing).
The reason so much of this is general and vague is my selection bias. I'm 
brought in when people are complaining about performance or some grand systemic 
crash because they were monitoring nothing. I have little ability to change 
hardware initially so I have to be willing to allow the hardware to do the best 
it can and establish levels where it can no longer keep up with the customer's 
goals. This may mean that for some use cases 10 pending compactions is an actionable 
event for them, for another customer 100 is. The better approach is to 
establish a baseline for when these metrics start to indicate a serious issue 
is occurring in that particular app. Basically when people notice a problem, 
what did these numbers look like in the minutes, hours and days prior? That's 
the way to establish the levels consistently.
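
Most of those canaries are exposed over JMX under org.apache.cassandra.metrics, so a baseline can be collected without any agent. A rough sketch of polling two of them, assuming the standard metric names and a placeholder host with the default JMX port 7199:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CanaryPoller {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://10.0.0.1:7199/jmxrmi"); // placeholder host
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Pending compactions: a gauge, read its Value attribute.
            Object pendingCompactions = mbs.getAttribute(
                    new ObjectName("org.apache.cassandra.metrics:type=Compaction,name=PendingTasks"),
                    "Value");

            // Dropped mutations: a meter, Count is the total since startup.
            Object droppedMutations = mbs.getAttribute(
                    new ObjectName("org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=Dropped"),
                    "Count");

            System.out.println("pending compactions=" + pendingCompactions
                             + ", dropped mutations=" + droppedMutations);
        }
    }
}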
Regards,
Ryan Svihla







On Fri, Aug 26, 2016 at 4:48 AM -0500, "Thomas Julian"  
wrote:










Hello,

I am working on setting up a monitoring tool to monitor Cassandra Instances. 
Are there any wikis which specifies optimum value for each Cassandra KPIs?
For instance, I am not sure,
What value of "Memtable Columns Count" can be considered as "Normal". 
What value of the same has to be considered as "Critical".
I knew threshold numbers for few params, for instance any thing more than zero 
for timeouts, pending tasks should be considered as unusual. Also, I am aware 
that most of the statistics' threshold numbers vary in accordance with Hardware 
Specification, Cassandra Environment Setup. But, what I request here is a 
general guideline for configuring thresholds for all the metrics.

If this has been already covered, please point me to that resource. If anyone 
on their own interest collected these things, please share.

Any help is appreciated.

Best Regards,
Julian.









Re: Failure when setting up cassandra in cluster

2016-08-22 Thread Ryan Svihla
Instead of 127.0.0.1, have you tried just passing the IP of one of the
nodes?

On Mon, Aug 22, 2016 at 9:45 AM Raimund Klein  wrote:

> Hello all,
>
> Sorry for reposting this, but I didn't receive any response. Can someone
> help please?
>
> -- Forwarded message --
> From: Raimund Klein 
> Date: 2016-08-15 12:07 GMT+01:00
> Subject: Failure when setting up cassandra in cluster
> To: user@cassandra.apache.org
>
>
> Hi all,
>
> Sorry if this is a fairly stupid question, but we've all only been exposed
> to Cassandra very recently.
>
> We're trying to configure a 2-node cluster with non-default credentials.
> Here's what I've been doing so far based on my understanding of the
> documentation. The platform is RHEL 7:
>
>
>1. Use an RPM I found with Datastax to perform a basic cassandra
>installation.
>2. Change the temporary directory in cassandra-env.sh, because nobody
>is allowed to execute anything in /tmp.
>3. In cassandra.yaml,
>- change the cluster_name
>- empty the listen_address entry
>- define both VMs as seeds
>4. Open port 7000 in the firewall.
>5. Start cassandra.
>6. In the cassandra.yaml, change to PasswordAuthenticator.
>7. Run cqlsh -u cassandra -p cassandra -e "ALTER KEYSPACE system_auth
>WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2
>};"
>8. Restart cassandra
>9. Perform 1-8 on the second node
>10. To create a new user, run cqlsh -u cassandra -p cassandra
>-e "CREATE USER ${CASSANDRA_USERNAME} WITH PASSWORD '${CASSANDRA_PASSWORD}'
>SUPERUSER;"
>
> Step 10 fails with this error:
>
> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
> AuthenticationFailed(u'Failed to authenticate to 127.0.0.1: code=0100
> [Bad credentials]
> message="org.apache.cassandra.exceptions.UnavailableException: Cannot
> achieve consistency level QUORUM"',)})
>
>
> What am I missing?
>
>
> Cheers
>
> Raimund
>
>
> --
Regards,

Ryan Svihla


Re: A question to updatesstables

2016-08-19 Thread Ryan Svihla
The actual error message could be very useful to diagnose the reason. There are 
warnings about incompatible formats which are safe to ignore (usually in the 
cache) and I have one time seen an issue with commit log archiving preventing a 
startup during upgrade. Usually there is something else broken and the version 
mismatch is a false signal.

Regards,

Ryan Svihla

> On Aug 18, 2016, at 10:18 PM, Lu, Boying  wrote:
> 
> Thanks a lot.
>  
> I'm a little bit confused. If 'nodetool updatesstables' doesn't work without 
> the Cassandra server running, and the Cassandra server failed to start due to 
> the incompatible SSTable format, how do we resolve this dilemma?
>  
>  
>  
> From: Carlos Alonso [mailto:i...@mrcalonso.com] 
> Sent: 2016年8月18日 18:44
> To: user@cassandra.apache.org
> Subject: Re: A question to updatesstables
>  
> Replies inline
> 
> Carlos Alonso | Software Engineer | @calonso
>  
> On 18 August 2016 at 11:56, Lu, Boying  wrote:
> Hi, All,
>  
> We use Cassandra in our product. In our early release we used Cassandra 1.2.10, 
> whose SSTable format is ‘ic’.
> We upgraded Cassandra to 2.0.10 in our product release, but the Cassandra 
> server failed to start due to the incompatible SSTable format, and the log 
> message told us to use ‘nodetool updatesstables’ to upgrade the SSTable files.
>  
> To make sure there is no negative impact on our data, I want to confirm the 
> following things about this command before trying it:
> 1.   Does it work without Cassandra server running?
> 
> No, it won't. 
> 2.   Will it cause data lost with this command?
> 
> It shouldn't if you followed the upgrade instructions properly
> 3.   What’s the best practice to avoid this error occurring again (e.g. when 
> upgrading Cassandra next time)?
> 
> Upgrading SSTables is required or not depending on the upgrade you're 
> running, basically if the SSTables layout changes you'll need to run it and 
> not otherwise so there's nothing you can do to avoid it 
>  
> Thanks
>  
> Boying
>  


Re: A question to updatesstables

2016-08-18 Thread Ryan Svihla
It hasn't ever prevented me from starting unless there was something else
going on. Can you share the log message that's preventing you from starting?

On Thu, Aug 18, 2016, 5:44 AM Carlos Alonso  wrote:

> Replies inline
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 18 August 2016 at 11:56, Lu, Boying  wrote:
>
>> Hi, All,
>>
>>
>>
>> We use Cassandra in our product. I our early release we use Cassandra
>> 1.2.10 whose SSTable is ‘ic’ format.
>>
>> We upgrade Cassandra to 2.0.10 in our product release. But the Cassandra
>> server failed to start due to the
>>
>> incompatible SSTable format and the log message told us to use ‘nodetool
>> updatesstables’ to upgrade SSTable files.
>>
>>
>>
>> To make sure that no negative impact on our data, I want to confirm
>> following things about this command before trying it:
>>
>> 1.   Does it work without Cassandra server running?
>>
> No, it won't.
>
>> 2.   Will it cause data lost with this command?
>>
> It shouldn't if you followed the upgrade instructions properly
>
>> 3.   What’s the best practice to avoid this error occurring again (e.g. when
>> upgrading Cassandra next time)?
>>
> Upgrading SSTables is required or not depending on the upgrade you're
> running, basically if the SSTables layout changes you'll need to run it and
> not otherwise so there's nothing you can do to avoid it
>
>>
>>
>> Thanks
>>
>>
>>
>> Boying
>>
>
> --
Regards,

Ryan Svihla


Re: Replicating Cassandra data to HDFS

2016-08-09 Thread Ryan Svihla
Jon,

You know I've not actually spent the hour to read the ticket, so I was just 
guessing it didn't handle dedup... all the same semantics apply though: you'd 
have to do a read before write and then allow some window of failure mode. 
Maybe if you used LWT for everything, but that sounds really slow... I'd be 
curious about your thoughts on how to do that well; maybe I'm missing something.

Regards,
Ryan Svihla

On Aug 9, 2016, 1:13 PM -0500, Jonathan Haddad , wrote:
> I'm having a hard time seeing how anyone would be able to work with CDC in 
> its current implementation of not doing any dedupe. Unless you really want 
> to write all your own logic for that, including failure handling + a 
> distributed state machine, I wouldn't count on it as a solution.
> On Tue, Aug 9, 2016 at 10:49 AM Ryan Svihla  (mailto:r...@foundev.pro)> wrote:
> > You can follow the monster of a ticket 
> > https://issues.apache.org/jira/browse/CASSANDRA-8844 and see if it looks 
> > like the tradeoffs there are headed in the right direction for you.
> >
> > Even CDC I think would have logically the same issue of not deduping for 
> > you as triggers and dual writes, due to replication factor and consistency 
> > level issues. Otherwise you'd be stuck doing an all-replica comparison when 
> > a late event came in, and when a node was down what would you do then? What 
> > if one replica got it as well and then came online much later? Even if you 
> > were using a single source of truth style database, you'll find failover 
> > has a way of losing late events anyway (due to async replication), not to 
> > mention once you go multiple DC it's all a matter of what DC you're in.
> >
> > Anyway for the cold storage I think a trailing amount that is just greater 
> > than your old events would do it. IE if you choose to only accept 30 days 
> > out then cold storage for 32 days. At some point there is no free lunch as 
> > you point out when replicating between two data sources. ie CDC, triggers 
> > really anything that marks a "new event" will have the same problem and 
> > you'll have to choose an acceptable level of lateness or check for lateness 
> > indefinitely.
> >
> > Alternatively you can just accept duplication and handle it cold storage 
> > read side (like event sourcing pattern, this would be ideal if the lateness 
> > is uncommon) or clean it up over time in cold storage as it's detected 
> > (similar to an event sourcing pattern, but snapshotting data down to a 
> > single record when you encounter it on a read).
> >
> > Best of luck, this is a corner case that requires hard tradeoffs in all 
> > technology I've encountered.
> >
> > Regards,
> > Ryan Svihla
> >
> >
> >
> > On Aug 9, 2016, 12:21 PM -0500, Ben Vogan  > (mailto:b...@shopkick.com)>, wrote:
> > > Thanks Ryan. I was hoping there was a change data capture framework. We 
> > > have late arriving events, some of which can be very late. We would have 
> > > to batch collect data for a large time period every so often to go back 
> > > and collect those or accept that we are going to lose a small percentage 
> > > of events. Neither of which is ideal.
> > >
> > > On Tue, Aug 9, 2016 at 10:30 AM, Ryan Svihla  > > (mailto:r...@foundev.pro)> wrote:
> > > > The typical pattern I've seen in the field is kafka + consumers for 
> > > > each destination (variant of dual write I know), this of course would 
> > > > not work for your goal of relying on C* for dedup. Triggers would also 
> > > > suffer the same problem unfortunately so you're really left with a 
> > > > batch job (most likely Spark) to move data from C* into HDFS on a given 
> > > > interval. If this is really a cold storage use case that can work quite 
> > > > well especially assuming you've modeled your data as a time series or 
> > > > with some sort of time based bucketing so you can quickly get full 
> > > > partitions data out of C* in a deterministic fashion and not have to 
> > > > scan your entire data set.
> > > >
> > > > I've also for similar needs have seen Spark streaming + querying 
> > > > cassandra for duplication checks to dedup then output to another source 
> > > > (form of dual write but with dedup), this was really silly and slow. I 
> > > > only bring it up to save you the trouble in case you end up in the same 
> > > > path chasing for something more 'real time'.
> > > 

Re: Replicating Cassandra data to HDFS

2016-08-09 Thread Ryan Svihla
You can follow the monster of a ticket 
https://issues.apache.org/jira/browse/CASSANDRA-8844 and see if it looks like 
the tradeoffs there are headed in the right direction for you.

Even CDC I think would have logically the same issue of not deduping for you as 
triggers and dual writes, due to replication factor and consistency level 
issues. Otherwise you'd be stuck doing an all-replica comparison when a late 
event came in, and when a node was down what would you do then? What if one 
replica got it as well and then came online much later? Even if you were using 
a single source of truth style database, you'll find failover has a way of 
losing late events anyway (due to async replication), not to mention once you go 
multiple DC it's all a matter of what DC you're in.

Anyway, for the cold storage I think a trailing window that is just greater than 
your late events would do it, i.e. if you choose to only accept events up to 30 
days late then keep 32 days in cold storage. At some point there is no free 
lunch, as you point out, when replicating between two data sources: CDC, 
triggers, really anything that marks a "new event" will have the same problem, 
and you'll have to choose an acceptable level of lateness or check for lateness 
indefinitely.

Alternatively you can just accept duplication and handle it cold storage read 
side (like event sourcing pattern, this would be ideal if the lateness is 
uncommon) or clean it up over time in cold storage as it's detected (similar to 
an event sourcing pattern, but snapshotting data down to a single record when 
you encounter it on a read).

Best of luck, this is a corner case that requires hard tradeoffs in all 
technology I've encountered.

Regards,
Ryan Svihla

On Aug 9, 2016, 12:21 PM -0500, Ben Vogan , wrote:
> Thanks Ryan. I was hoping there was a change data capture framework. We have 
> late arriving events, some of which can be very late. We would have to batch 
> collect data for a large time period every so often to go back and collect 
> those or accept that we are going to lose a small percentage of events. 
> Neither of which is ideal.
>
> On Tue, Aug 9, 2016 at 10:30 AM, Ryan Svihla  (mailto:r...@foundev.pro)> wrote:
> > The typical pattern I've seen in the field is kafka + consumers for each 
> > destination (variant of dual write I know), this of course would not work 
> > for your goal of relying on C* for dedup. Triggers would also suffer the 
> > same problem unfortunately so you're really left with a batch job (most 
> > likely Spark) to move data from C* into HDFS on a given interval. If this 
> > is really a cold storage use case that can work quite well especially 
> > assuming you've modeled your data as a time series or with some sort of 
> > time based bucketing so you can quickly get full partitions data out of C* 
> > in a deterministic fashion and not have to scan your entire data set.
> >
> > I've also for similar needs have seen Spark streaming + querying cassandra 
> > for duplication checks to dedup then output to another source (form of dual 
> > write but with dedup), this was really silly and slow. I only bring it up 
> > to save you the trouble in case you end up in the same path chasing for 
> > something more 'real time'.
> >
> > Regards,
> > Ryan Svihla
> >
> >
> > On Aug 9, 2016, 11:09 AM -0500, Ben Vogan  > (mailto:b...@shopkick.com)>, wrote:
> > > Hi all,
> > >
> > > We are investigating using Cassandra in our data platform. We would like 
> > > data to go into Cassandra first and to eventually be replicated into our 
> > > data lake in HDFS for long term cold storage. Does anyone know of a good 
> > > way of doing this? We would rather not have parallel writes to HDFS and 
> > > Cassandra because we were hoping that we could use Cassandra primary keys 
> > > to de-duplicate events.
> > >
> > > Thanks, --
> > >
> > > BENJAMIN VOGAN | Data Platform Team Lead
> > > shopkick (http://www.shopkick.com/)
> > > The indispensable app that rewards you for shopping.
>
>
> --
>
> BENJAMIN VOGAN | Data Platform Team Lead
> shopkick (http://www.shopkick.com/)
> The indispensable app that rewards you for shopping.

Re: Replicating Cassandra data to HDFS

2016-08-09 Thread Ryan Svihla
The typical pattern I've seen in the field is Kafka + consumers for each 
destination (a variant of dual write, I know); this of course would not work for 
your goal of relying on C* for dedup. Triggers would unfortunately also suffer 
the same problem, so you're really left with a batch job (most likely Spark) to 
move data from C* into HDFS on a given interval. If this is really a cold 
storage use case that can work quite well, especially assuming you've modeled 
your data as a time series or with some sort of time-based bucketing so you can 
quickly get full partitions of data out of C* in a deterministic fashion and not 
have to scan your entire data set.
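
For illustration, a rough sketch of such a batch job with the 
spark-cassandra-connector (keyspace, table, bucket column, contact point and 
paths are all made up):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object EventsToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("events-to-hdfs")
      .set("spark.cassandra.connection.host", "10.0.0.1") // illustrative
    val sc = new SparkContext(conf)

    // One "closed" time bucket per run, so the scan is bounded and deterministic
    val bucket = args(0) // e.g. "2016-08-08"

    sc.cassandraTable("events_ks", "events")
      .where("bucket = ?", bucket)
      .map(row => row.columnValues.mkString("\t"))
      .saveAsTextFile(s"hdfs:///data/events/bucket=$bucket")
  }
}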

I've also, for similar needs, seen Spark streaming + querying Cassandra for 
duplication checks to dedup and then output to another source (a form of dual 
write, but with dedup); this was really silly and slow. I only bring it up to 
save you the trouble in case you end up down the same path chasing something 
more 'real time'.

Regards,
Ryan Svihla

On Aug 9, 2016, 11:09 AM -0500, Ben Vogan , wrote:
> Hi all,
>
> We are investigating using Cassandra in our data platform. We would like data 
> to go into Cassandra first and to eventually be replicated into our data lake 
> in HDFS for long term cold storage. Does anyone know of a good way of doing 
> this? We would rather not have parallel writes to HDFS and Cassandra because 
> we were hoping that we could use Cassandra primary keys to de-duplicate 
> events.
>
> Thanks, --
>
> BENJAMIN VOGAN | Data Platform Team Lead
> shopkick (http://www.shopkick.com/)
> The indispensable app that rewards you for shopping.

Re: a solution of getting cassandra cross-datacenter latency at a certain time

2016-08-08 Thread Ryan Svihla
The first issue I can think of is that the Latency table, if I understand you 
correctly, has an unbounded partition (keyed only by the DC) that will over time 
just get larger as more measurements are recorded.
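
One way to bound it (sketch only; table name and bucket granularity are 
illustrative) would be to put a coarse time bucket in the partition key:

CREATE TABLE latency_by_day (
    fromdc varchar,
    day varchar,              -- e.g. '2016-08-08', keeps each partition bounded
    arrived timestamp,
    created timestamp,
    latency bigint,
    PRIMARY KEY ((fromdc, day), arrived)
) WITH CLUSTERING ORDER BY (arrived DESC);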

Regards,

Ryan Svihla

> On Aug 8, 2016, at 2:58 AM, Stone Fang  wrote:
> 
> objective:get cassandra cross-datacenter latency in time
> 
> existing ticket:
> 
> there is a ticket [track cross-datacenter 
> latency](https://issues.apache.org/jira/browse/CASSANDRA-11569)
> but it is a statistic accumulated since node start; I want to get the 
> instantaneous value at a certain point in time.
> 
> thought 
> 
> I want to write a message into the **MESSAGE TABLE** from a 1s timer task (the 
> period is similar to most cross-datacenter latencies), replicate it to the 
> other datacenter, and capture the resulting delay and write it to the 
> **LATENCY TABLE**. I can then query the latency value from that table with a 
> condition on a certain time.
> 
> schema
> 
> message table for replicating data cross datacenter
>
> 
> create keyspace heartbeat with replication=
> {'class':'NetworkTopologyStrategy','dc1':1, 'dc2':1...};
> 
>
> 
>  CREATE TABLE HEARTBEAT.MESSAGE (
>  CREATED TIMESTAMP,
>  FROMDC VARCHAR,
>  PRIMARY KEY (CREATED, FROMDC)
>  );
> 
> latency Table for querying latency value  
> 
>  CREATE TABLE SYSTEM.LATENCY (
>  FROMDC VARCHAR,
>  ARRIVED TIMESTAMP,
>  CREATED TIMESTAMP,
>  LATENCY BIGINT,
>  PRIMARY KEY (FROMDC, ARRIVED)
>  ) WITH CLUSTERING ORDER BY (ARRIVED DESC);
> 
> problems
> 
> 1. Can this solution work to get the cross-datacenter latency?
> 
> 
> 2. To create the heartbeat keyspace in the Cassandra bootstrap process, I need 
> to load the heartbeat keyspace in Schema.java and save this keyspace into 
> SystemSchema. I also need to check whether this keyspace already exists after 
> the first node starts, so I think this is not a good solution.
> 
> 3. Compared to 1, I could try another solution: generate the heartbeat message 
> in a standalone jar. But I still need to capture the heartbeat message mutation 
> in Cassandra, so I need to check whether the mutation is about the heartbeat 
> message, and it seems strange to check for the heartbeat keyspace, which is not 
> defined in Cassandra but by a third party.
> 
> hope to see your thought on this.
> thanks
> stone
> 


Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Ryan Svihla
On a related note, I still need to file a Jira just to make it easier to find 
large cells in general; I've had 2 customers now with a bunch of 10MB+ writes 
(single cell) they weren't expecting, and tracking that down is equally 
challenging (Spark in both cases made it doable, but slow to find).
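
As a rough illustration of the Spark approach (keyspace, table and column names 
are made up; assumes the spark-cassandra-connector and an existing SparkContext 
`sc`):

import com.datastax.spark.connector._

// Scan the table and report rows whose blob column is suspiciously large
sc.cassandraTable("my_ks", "events")
  .map(row => (row.getString("id"),
               row.getBytesOption("payload").map(_.remaining).getOrElse(0)))
  .filter { case (_, size) => size > 10 * 1024 * 1024 } // > 10MB in a single cell
  .collect()
  .foreach { case (id, size) => println(s"$id -> $size bytes") }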

Regards,

Ryan Svihla

> On Aug 3, 2016, at 4:21 PM, Jonathan Haddad  wrote:
> 
> I haven't verified, so i'm not 100% certain, but I believe you'd get back an 
> exception to the client.  Yes, this belongs in the DB, but I don't think 
> you're totally blind to what went wrong.
> 
> My guess is this exception in the Python driver (but other drivers should 
> have a similar exception): 
> https://github.com/datastax/python-driver/blob/master/cassandra/protocol.py#L288
> 
>> On Wed, Aug 3, 2016 at 1:59 PM Ryan Svihla  wrote:
>> Made a Jira about it already 
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-12231
>> 
>> Regards,
>> 
>> Ryan Svihla
>> 
>>> On Aug 3, 2016, at 2:58 PM, Kevin Burton  wrote:
>>> 
>>> It seems these are basically impossible to track down.  
>>> 
>>> https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-
>>> 
>>> has some information but their work around is to increase the transaction 
>>> log.  There's no way to find out WHAT client or what CQL is causing the 
>>> large mutation.
>>> 
>>> Any thoughts on how to mitigate this?
>>> 
>>> Kevin
>>> 
>>> -- 
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations 
>>> Engineers!
>>> 
>>> Founder/CEO Spinn3r.com
>>> Location: San Francisco, CA
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 


Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Ryan Svihla
Where I see this a lot is:

1. DBA notices it in logs
2. Everyone says the code works fine, no errors
3. Weeks of combing through all the apps to find out 3 teams are doing 
fire-and-forget futures...
4. Convince each team they really need to handle futures
5. A couple of months go by before you figure out who the culprit was, by the 
time their deploys hit production.

Would save everyone a ton of brain cells if we just logged it.
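
As a rough sketch of the "handle the future" side (DataStax Java driver used 
from Scala; the session and statement are whatever the app already has):

import com.datastax.driver.core.{ResultSet, Session, Statement}
import com.google.common.util.concurrent.{FutureCallback, Futures}

def writeAndLog(session: Session, stmt: Statement): Unit = {
  val future = session.executeAsync(stmt)
  Futures.addCallback(future, new FutureCallback[ResultSet] {
    def onSuccess(rs: ResultSet): Unit = () // made it to the coordinator
    def onFailure(t: Throwable): Unit =
      // Not fire-and-forget: oversized mutations, timeouts, etc. show up here
      System.err.println(s"write failed: ${t.getMessage}")
  })
}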

Regards,

Ryan Svihla

> On Aug 3, 2016, at 4:21 PM, Jonathan Haddad  wrote:
> 
> I haven't verified, so i'm not 100% certain, but I believe you'd get back an 
> exception to the client.  Yes, this belongs in the DB, but I don't think 
> you're totally blind to what went wrong.
> 
> My guess is this exception in the Python driver (but other drivers should 
> have a similar exception): 
> https://github.com/datastax/python-driver/blob/master/cassandra/protocol.py#L288
> 
>> On Wed, Aug 3, 2016 at 1:59 PM Ryan Svihla  wrote:
>> Made a Jira about it already 
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-12231
>> 
>> Regards,
>> 
>> Ryan Svihla
>> 
>>> On Aug 3, 2016, at 2:58 PM, Kevin Burton  wrote:
>>> 
>>> It seems these are basically impossible to track down.  
>>> 
>>> https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-
>>> 
>>> has some information but their work around is to increase the transaction 
>>> log.  There's no way to find out WHAT client or what CQL is causing the 
>>> large mutation.
>>> 
>>> Any thoughts on how to mitigate this?
>>> 
>>> Kevin
>>> 
>>> -- 
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations 
>>> Engineers!
>>> 
>>> Founder/CEO Spinn3r.com
>>> Location: San Francisco, CA
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 


Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Ryan Svihla
Made a Jira about it already 
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-12231

Regards,

Ryan Svihla

> On Aug 3, 2016, at 2:58 PM, Kevin Burton  wrote:
> 
> It seems these are basically impossible to track down.  
> 
> https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-
> 
> has some information but their work around is to increase the transaction 
> log.  There's no way to find out WHAT client or what CQL is causing the large 
> mutation.
> 
> Any thoughts on how to mitigate this?
> 
> Kevin
> 
> -- 
> We’re hiring if you know of any awesome Java Devops or Linux Operations 
> Engineers!
> 
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 


Re: Read gets stale data after failure of commit phase in CAS operation

2016-07-24 Thread Ryan Svihla
are you using one of the SERIAL Consistency Levels?

-- 
Ryan Svihla

On July 24, 2016 at 8:08:01 PM, Yuji Ito (y...@imagine-orb.com) wrote:

> Hi,
>
> I have another question about CAS operation.
>
> Can a read get stale data after failure in commit phase?
>
> According to the following article,
> when a write fails in commit phase (a WriteTimeout with WriteType SIMPLE
> happens),
> a subsequent read will repair the uncommitted state
> and get the latest data which the write tried to update.
>
> Ref. http://www.datastax.com/dev/blog/cassandra-error-handling-done-right
>
> However, when I tested the process,
> the subsequent read can get the previous data which should be overwritten.
>
> Is the repair not always executed in the read?
>
> Thanks,
> Yuji
>


Re: My cluster shows high system load without any apparent reason

2016-07-22 Thread Ryan Svihla
You aren't using counters by chance?

regards,

Ryan Svihla

On Jul 22, 2016, 2:00 PM -0500, Mark Rose , wrote:
> Hi Garo,
>
> Are you using XFS or Ext4 for data? XFS is much better at deleting
> large files, such as may happen after a compaction. If you have 26 TB
> in just two tables, I bet you have some massive sstables which may
> take a while for Ext4 to delete, which may be causing the stalls. The
> underlying block layers will not show high IO-wait. See if the stall
> times line up with large compactions in system.log.
>
> If you must use Ext4, another way to avoid issues with massive
> sstables is to run more, smaller instances.
>
> As an aside, for the amount of reads/writes you're doing, I've found
> using c3/m3 instances with the commit log on the ephemeral storage and
> data on st1 EBS volumes to be much more cost effective. It's something
> to look into if you haven't already.
>
> -Mark
>
> On Fri, Jul 22, 2016 at 8:10 AM, Juho Mäkinen  wrote:
> > After a few days I've also tried disabling Linux kernel huge pages
> > defragement (echo never > /sys/kernel/mm/transparent_hugepage/defrag) and
> > turning coalescing off (otc_coalescing_strategy: DISABLED), but either did
> > do any good. I'm using LCS, there are no big GC pauses, and I have set
> > "concurrent_compactors: 5" (machines have 16 CPUs), but there are usually
> > not any compactions running when the load spike comes. "nodetool tpstats"
> > shows no running thread pools except on the Native-Transport-Requests
> > (usually 0-4) and perhaps ReadStage (usually 0-1).
> >
> > The symptoms are the same: after about 12-24 hours increasingly number of
> > nodes start to show short CPU load spikes and this affects the median read
> > latencies. I ran a dstat when a load spike was already under way (see
> > screenshot http://i.imgur.com/B0S5Zki.png), but any other column than the
> > load itself doesn't show any major change except the system/kernel CPU
> > usage.
> >
> > All further ideas how to debug this are greatly appreciated.
> >
> >
> > On Wed, Jul 20, 2016 at 7:13 PM, Juho Mäkinen  > wrote:
> > >
> > > I just recently upgraded our cluster to 2.2.7 and after turning the
> > > cluster under production load the instances started to show high load (as
> > > shown by uptime) without any apparent reason and I'm not quite sure what
> > > could be causing it.
> > >
> > > We are running on i2.4xlarge, so we have 16 cores, 120GB of ram, four
> > > 800GB SSDs (set as lvm stripe into one big lvol). Running 
> > > 3.13.0-87-generic
> > > on HVM virtualisation. Cluster has 26 TiB of data stored in two tables.
> > >
> > > Symptoms:
> > > - High load, sometimes up to 30 for a short duration of few minutes, then
> > > the load drops back to the cluster average: 3-4
> > > - Instances might have one compaction running, but might not have any
> > > compactions.
> > > - Each node is serving around 250-300 reads per second and around 200
> > > writes per second.
> > > - Restarting node fixes the problem for around 18-24 hours.
> > > - No or very little IO-wait.
> > > - top shows that around 3-10 threads are running on high cpu, but that
> > > alone should not cause a load of 20-30.
> > > - Doesn't seem to be GC load: A system starts to show symptoms so that it
> > > has ran only one CMS sweep. Not like it would do constant stop-the-world
> > > gc's.
> > > - top shows that the C* processes use 100G of RSS memory. I assume that
> > > this is because cassandra opens all SSTables with mmap() so that they will
> > > pop up in the RSS count because of this.
> > >
> > > What I've done so far:
> > > - Rolling restart. Helped for about one day.
> > > - Tried doing manual GC to the cluster.
> > > - Increased heap from 8 GiB with CMS to 16 GiB with G1GC.
> > > - sjk-plus shows bunch of SharedPool workers. Not sure what to make of
> > > this.
> > > - Browsed over
> > > https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html but 
> > > didn't
> > > find any apparent
> > >
> > > I know that the general symptom of "system shows high load" is not very
> > > good and informative, but I don't know how to better describe what's going
> > > on. I appreciate all ideas what to try and how to debug this further.
> > >
> > > - Garo
> > >
> >


Re: Questions about anti-entropy repair

2016-07-22 Thread Ryan Svihla
I would say only repairing when there is a known problem has a couple of 
logical issues off the top of my head:

1. You're assuming hints are successfully being delivered within their time 
window; there isn't really any reliable indication of this that I've ever found 
myself.
2. Unless you're using CL ALL you really have no indication whether the other 
replicas, the ones not needed to meet the CL, succeeded the write on the initial 
attempt.

Now if you're using CL LOCAL_QUORUM you'll have reasonable consistency and 
chances are pretty good that you eventually hit your RF anyway with 
read_repair, so I get the thought process behind what you're saying Daemeon.

Likewise, I've seen well sized clusters with steady good workloads in general 
behave pretty well and not need to stream a lot of data during repair, but 
because of 1 and 2 even with good monitoring that's a bit "running with 
scissors" for my taste as I'm not confident there is enough monitoring coverage 
that'll ever guarantee you're "mostly meeting RF" or not.

Running repair within gc_grace_seconds should be something you can handle 
anyway with your workload or you're not sized correctly (else what happens when 
you need to run repair after a major event?), so why not just keep it running.

YMMV, and if someone has kept their cluster up and running and knows all the 
stuff to look for, kudos. I still view it as a cheap cost to CYA, and even 
having worked with Cassandra now for 3 years in a wide variety of pretty crazy 
situations, I'm not confident I could keep a cluster healthy without running 
repair consistently.

regards,

Ryan Svihla

On Jul 20, 2016, 10:32 AM -0500, daemeon reiydelle , wrote:
> I don't know if my perspective on this will assist, so YMMV:
>
> Summary
> Nodetool repairs are required when a node has issues and can't get its (e.g. 
> hinted handoff) resync done: culprit: usually network, sometimes 
> container/vm, rarely disk.
> Scripts to do partition range are a pain to maintain, and you have to be 
> CONSTANTLY checking for new keyspaces, parsing them, etc. Git hub project?
> Monitor/monitor/monitor: if you do a best practices job of actually 
> monitoring the FULL stack, you only need to do repairs when the world goes 
> south.
> Are you alerted when errors show up in the logs, network goes wacky, etc? No? 
> then you have to CYA by doing hail mary passes with periodic nodetool repairs.
> Nodetool repair is a CYA for a cluster whose status is not well monitored.
> Daemeon's thoughts:
>
> Nodetool repair is not required for a cluster that is and "always has been" 
> in a known good state. Monitoring of the relevant logs/network/disk/etc. is 
> the only way that I know of to assure this state. Because (e.g. AWS, and 
> EVERY ONE OF my clients' infrastructures: screwed up networks) nodes can 
> disappear then the cluster *can* get overloaded (network traffic) causing 
> hinted handoffs to have all of the worst case corner cases you can never hope 
> to see.
>
> So, if you have good monitoring in place to assure that there is known good 
> cluster behaviour (network, disk, etc.), repairs are not required until you 
> are alerted that a cluster health problem has occurred. Partition range 
> repair is a pain in various parts of the anatomy because one has to 
> CONSTANTLY be updating the scripts that generate the commands (I have not 
> seen a git hub project around this, would love to see responses that point 
> them out!).
>
>
>
> ...
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198 (tel:(+1)%20415.501.0198)
> London (+44) (0) 20 8144 9872 (tel:(+44)%2020%208144%209872)
>
> On Wed, Jul 20, 2016 at 4:33 AM, Alain RODRIGUEZ  (mailto:arodr...@gmail.com)> wrote:
> > Hi Satoshi,
> >
> > > Q1:
> > > According to the DataStax document, it's recommended to run full repair 
> > > weekly or monthly. Is it needed even if repair with partitioner range 
> > > option ("nodetool repair -pr", in C* v2.2+) is set to run periodically 
> > > for every node in the cluster?
> >
> >
> > More accurately you need to run a repair for each node and each table 
> > within the gc_grace_seconds value defined at the table level to ensure no 
> > deleted data will return. Also running this on a regular basis ensure a 
> > constantly low entropy in your cluster, allowing better consistency (if not 
> > using a strong consistency like with CL.R&W = quorum).
> >
> > A full repair means every piece of data have been repaired. On a 3 node 
> > cluster with RF=3, running 'nodetool repair -pr' on the 3 nodes or 
> > 'nodetool repair' on one node are an equivalent "full repair". The best 
> >

Re: Is my cluster normal?

2016-07-07 Thread Ryan Svihla
what version of cassandra and java?

Regards,

Ryan Svihla

> On Jul 7, 2016, at 4:51 PM, Yuan Fang  wrote:
> 
> Yes, here is my stress test result:
> Results:
> op rate   : 12200 [WRITE:12200]
> partition rate: 12200 [WRITE:12200]
> row rate  : 12200 [WRITE:12200]
> latency mean  : 16.4 [WRITE:16.4]
> latency median: 7.1 [WRITE:7.1]
> latency 95th percentile   : 38.1 [WRITE:38.1]
> latency 99th percentile   : 204.3 [WRITE:204.3]
> latency 99.9th percentile : 465.9 [WRITE:465.9]
> latency max   : 1408.4 [WRITE:1408.4]
> Total partitions  : 100 [WRITE:100]
> Total errors  : 0 [WRITE:0]
> total gc count: 0
> total gc mb   : 0
> total gc time (s) : 0
> avg gc time(ms)   : NaN
> stdev gc time(ms) : 0
> Total operation time      : 00:01:21
> END
> 
>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla  wrote:
>> Lots of variables you're leaving out.
>> 
>> Depends on write size, if you're using logged batch or not, what consistency 
>> level, what RF, if the writes come in bursts, etc, etc. However, that's all 
>> sort of moot for determining "normal" really you need a baseline as all 
>> those variables end up mattering a huge amount.
>> 
>> I would suggest using Cassandra stress as a baseline and go from there 
>> depending on what those numbers say (just pick the defaults).
>> 
>> Sent from my iPhone
>> 
>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang  wrote:
>>> 
>>> yes, it is about 8k writes per node.
>>> 
>>> 
>>> 
>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle  
>>>> wrote:
>>>> Are you saying 7k writes per node? or 30k writes per node?
>>>> 
>>>> 
>>>> ...
>>>> 
>>>> Daemeon C.M. Reiydelle
>>>> USA (+1) 415.501.0198
>>>> London (+44) (0) 20 8144 9872
>>>> 
>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang  wrote:
>>>>> writes 30k/second is the main thing.
>>>>> 
>>>>> 
>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle  
>>>>>> wrote:
>>>>>> Assuming you meant 100k, that likely for something with 16mb of storage 
>>>>>> (probably way small) where the data is more that 64k hence will not fit 
>>>>>> into the row cache.
>>>>>> 
>>>>>> 
>>>>>> ...
>>>>>> 
>>>>>> Daemeon C.M. Reiydelle
>>>>>> USA (+1) 415.501.0198
>>>>>> London (+44) (0) 20 8144 9872
>>>>>> 
>>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang  wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB 
>>>>>>> ssd EBS).
>>>>>>> I can reach a cluster wide write requests of 30k/second and read 
>>>>>>> request about 100/second. The cluster OS load constantly above 10. Are 
>>>>>>> those normal?
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Yuan 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
> 


Re: Is my cluster normal?

2016-07-07 Thread Ryan Svihla
Lots of variables you're leaving out.

Depends on write size, if you're using logged batch or not, what consistency 
level, what RF, if the writes come in bursts, etc, etc. However, that's all 
sort of moot for determining "normal" really you need a baseline as all those 
variables end up mattering a huge amount.

I would suggest using Cassandra stress as a baseline and go from there 
depending on what those numbers say (just pick the defaults).
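
For example, something along the lines of (the node address is illustrative):

cassandra-stress write n=1000000 -node 10.0.0.1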

Sent from my iPhone

> On Jul 7, 2016, at 4:39 PM, Yuan Fang  wrote:
> 
> yes, it is about 8k writes per node.
> 
> 
> 
>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle  wrote:
>> Are you saying 7k writes per node? or 30k writes per node?
>> 
>> 
>> ...
>> 
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>> 
>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang  wrote:
>>> writes 30k/second is the main thing.
>>> 
>>> 
 On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle  
 wrote:
 Assuming you meant 100k, that likely for something with 16mb of storage 
 (probably way small) where the data is more that 64k hence will not fit 
 into the row cache.
 
 
 ...
 
 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872
 
> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang  wrote:
> 
> 
> I have a cluster of 4 m4.xlarge nodes(4 cpus and 16 gb memory and 600GB 
> ssd EBS).
> I can reach a cluster wide write requests of 30k/second and read request 
> about 100/second. The cluster OS load constantly above 10. Are those 
> normal?
> 
> Thanks!
> 
> 
> Best,
> 
> Yuan 
> 


Re: What is the best way to model this JSON ??

2016-03-28 Thread Ryan Svihla
Lokesh,

The modeling will change a bit depending on your queries, the rate of update 
and your tooling (Spring-data-cassandra makes a mess of updating collections 
for example).  I suggest asking the Cassandra users mailing list for help since 
this list is for development OF Cassandra.

> On Mar 28, 2016, at 11:09 AM, Lokesh Ceeba - Vendor 
>  wrote:
> 
> Hello Team,
>   How to design/develop the best data model for this ?
> 
> 
> var json=[{ "id":"9a55fdf6-eeab-4c83-9c6f-04c7df1b3225",
>"user":"ssatish",
>"event":"business",
>"occurredOn":"09 Mar 2016 17:55:15.292-0600",
>"eventObject":
>{
>"objectType":"LOAD",
>"id":"12345",
>"state":"ARRIVAL",
>"associatedAttrs":
>[
>{
>
> "type":"location_id",
>"value":"100"
>},
>{
>
> "type":"location_type",
>"value":"STORE"
>},
>{
>
> "type":"arrival_ts",
>
> "value":"2015-12-12T10:10:10"
>}
>]
> } }]
> 
> 
> I've taken this approach :
> 
> create type event_object_0328
> (
> Object_Type text,
> Object_ID   Int,
> Object_State text
> )
> ;
> 
> 
> create table Events
> (
> event_id   timeuuid,
> event_type text,
> triggered_by   text,
> triggered_ts   timestamp,
> Appl_ID        text,
> eventObject    frozen<event_object_0328>,
> primary key(event_id)
> )
> ;
> 
> Now I need to build the Associated Attributes (Highlighted above in JSON 
> text). The Associated Attributes can be very dynamic and shall come in any 
> (Key,Value) pair combination.
> 
> 
> 
> 
> --
> Lokesh
> 
> This email and any files transmitted with it are confidential and intended 
> solely for the individual or entity to whom they are addressed. If you have 
> received this email in error destroy it immediately. *** Walmart Confidential 
> ***



Re: Keyspaces not found in cqlsh

2016-02-11 Thread Ryan Svihla
Kedar,

I recommend asking the user list user@cassandra.apache.org this list is for the 
development of cassandra and you're more likely to find someone on the user 
list who may have hit this issue.

Curious issue though I haven't seen that myself.

Regards,
Ryan Svihla

> On Feb 11, 2016, at 7:56 AM, kedar  wrote:
> 
> Dev Team,
> 
> Need some help with a burning cqlsh issue
> 
> I am using cqlsh 5.0.1 | Cassandra 2.1.2, recently we are unable to see
> / desc keyspaces and query tables through cqlsh on either of the two nodes
> 
> cqlsh> desc keyspaces
> 
> 
> 
> cqlsh> use user_index;
> cqlsh:user_index> desc table list_1_10;
> 
> Keyspace 'user_index' not found.
> cqlsh:user_index>
> cqlsh>  select * from system.schema_keyspaces;
> Keyspace 'system' not found.
> cqlsh>
> We are running a 2 node cluster. The Python - Django app that inserts
> data is running without any failure and system logs show nothing abnormal.
> 
> ./nodetool repair on one node hasn't helped ./nodetool cfstats shows all
> the tables too
> 
> ls -l cassandra/data/*  on each node:
> 
> https://gist.github.com/anonymous/3dddbe728a52c07d7c52
> https://gist.github.com/anonymous/302ade0875dd6410087b
> 
> 
> 
> 
> --
> Thanks,
> Kedar Parikh
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: Missing rows while scanning table using java driver

2016-02-02 Thread Ryan Svihla
Priyanka,

This is a better question for the Cassandra user mailing list (cc’d above) 
which is where many experts in the use of Cassandra are subscribed, where as 
this list is more about improving or changing Cassandra itself.

As to your issue, there can be many combined issues at once that are leading to 
this situation, can I suggest you respond on the user list with the following:

- Keyspace (RF especially), data center and table configuration.
- Any errors in the logs on the Cassandra nodes.

Regards,

Ryan Svihla

> On Feb 2, 2016, at 4:58 AM, Priyanka Gugale  wrote:
> 
> I am using query of the form: select * from %t where token(%p) > %s limit
> %l;
> 
> where t=tablename, %p=primary key, %s=token value of primary key and l=limit
> 
> -Priyanka
> 
> On Mon, Feb 1, 2016 at 6:19 PM, Priyanka Gugale  wrote:
> 
>> Hi,
>> 
>> I am using Cassandra 2.2.0 and cassandra driver 2.1.8. I am trying to scan
>> a table as per suggestions given here
>> <http://www.myhowto.org/bigdata/2013/11/04/scanning-the-entire-cassandra-column-family-with-cql/>,
>> On running the code to fetch records from table, it fetches different
>> number of records on each run. Some times it reads all records from table,
>> and some times some records are missing. As I have observed there is no
>> fixed pattern for missing records.
>> 
>> I have tried to set consistency level to ALL while running select query
>> still I couldn't fetch all records. Is there any known issue? Or am I
>> suppose to do anything more than running simple "select" statement.
>> 
>> Code snippet to fetch data:
>> 
>> SimpleStatement stmt = new SimpleStatement(query);
>> stmt.setConsistencyLevel(ConsistencyLevel.ALL);
>> ResultSet result = session.execute(stmt);
>> if (!result.isExhausted()) {
>>   for (Row row : result) {
>> process(row);
>>   }
>> }
>> 
>> -Priyanka
>> 



Re: Modeling nested collection with C* 2.0

2016-01-28 Thread Ryan Svihla
Ahmed,

Just using text and serializing as Json is the easy way and a common approach.
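
A minimal sketch of that shape for C* 2.0 (table and column names are 
illustrative):

CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    emails_json text,     -- e.g. '{"pro": ["a...@mail.com"], "private": ["c...@mail.com"]}'
    addresses_json text   -- e.g. '{"pro": {"street": "aaa", "number": 123, "apartment": "bbb"}}'
);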

However, this list is for Cassandra commiter discussion, please be so kind as 
to use the regular user list for data modeling questions or for any future 
responses to this email thread.


Regards,
Ryan Svihla

> On Jan 28, 2016, at 7:28 AM, Ahmed Eljami  wrote:
> 
> ​Hi,
> 
> I need your help for modeling a nested collection with cassanrda2.0 (UDT no,
> no fozen)
> 
> My users table contains emails by type, each type of email contains multiple
> emails.
> 
> Example:
> Type: pro. emails: {a...@mail.com, b...@mail.com ...}
> 
> Type: private. emails: {c...@mail.com, d...@mail.com}
> .
> 
> The user table also contains addresses, address type with fields.
> 
> Example:
> 
> Type: Pro. address {Street= aaa, number = 123, apartment = bbb}
> 
> Type: Private. address {Street = bbb, number = 123, apartment = kkk }
> 
> I am looking for a solution to store all these columns in one table.
> 
> Thank you.


Re: Help diagnosing performance issue

2015-11-30 Thread Ryan Svihla
wGen size difference comes from the number of cores:
>>>> the 64G machine has 12 cores where the 32G machine has 8 cores (I
>>>> did not even realize this before looking into this, that's why I did
>>>> not mention it in my previous emails).
>>>>
>>>> Thanks a lot for your help.
>>>>
>>>> A.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> All the best,
>>>>
>>>>
>>>>
>>>> Sebastián Estévez
>>>>
>>>> Solutions Architect |954 905 8615  |
>>>> sebastian.este...@datastax.com
>>>> <mailto:sebastian.este...@datastax.com>
>>>> <mailto:sebastian.este...@datastax.com
>>>> <mailto:sebastian.este...@datastax.com>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Nov 18, 2015 at 6:44 AM, Antoine Bonavita
>>>> mailto:anto...@stickyads.tv>
>>>> <mailto:anto...@stickyads.tv <mailto:anto...@stickyads.tv>>>
>>>> wrote:
>>>>
>>>>  Sebastian, Robet,
>>>>
>>>>  First, a big thank you to both of you for your help.
>>>>
>>>>  It looks like you were right. I used pcstat (awesome tool,
>>>> thanks
>>>>  for that as well) and it appears some files I would not
>>>> expect to be
>>>>  in cache actually are. Here is a sample of my output
>>>> (edited for
>>>>  convenience, adding the file timestamp from the OS):
>>>>
>>>>  *
>>>>
>>>>
>>>>
>>>> /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5951-big-Data.db
>>>>
>>>>
>>>>  - 000.619 % - Nov 16 12:25
>>>>  *
>>>>
>>>>
>>>>
>>>> /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5954-big-Data.db
>>>>
>>>>
>>>>  - 000.681 % - Nov 16 13:44
>>>>  *
>>>>
>>>>
>>>>
>>>> /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5955-big-Data.db
>>>>
>>>>
>>>>  -  000.610 % - Nov 16 14:11
>>>>  *
>>>>
>>>>
>>>>
>>>> /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5956-big-Data.db
>>>>
>>>>
>>>>  - 015.621 % - Nov 16 14:26
>>>>  *
>>>>
>>>>
>>>>
>>>> /var/lib/cassandra/data/views/views-451e4d8061ef11e5896f091196a360a0/la-5957-big-Data.db
>>>>
>>>>
>>>>  - 015.558 % - Nov 16 14:50
>>>>
>>>>  The SSTables that come before are all at about 0% and the
>>>> ones that
>>>>  come after it are all at about 15%.
>>>>
>>>>  As you can see the first SSTable at 15% date back from 24h.
>>>> Given my
>>>>  application I'm pretty sure those are not from the reads
>>>> (reads of
>>>>  data older than 1h is definitely under 0.1% of reads).
>>>> Could it be
>>>>  that compaction is putting those in cache constantly ?
>>>>  If so, then I'm probably confused on the meaning/effect of
>>>>  max_sstable_age_days (set at 10 in my case) and
>>>> base_time_seconds
>>>>  (not set in my case so the default of 3600 applies). I
>>>> would not
>>>>  expect any compaction to happen beyond the first hour and
>>>> the 10
>>>>  days is here to make sure data still gets expired and
>>>> SSTables
>>>>  removed (thus releasing disk space). I don't see where the
>>>> 24h come
>>>>  from.
>>>>  If you guys can shed some light on this, it would be
>>>> awesome. I'm
>>>>  sure I got something wrong.
>>>>
>>>>  Regarding the heap configuration, both are very similar:
>>>>  * 32G machine: -Xms8049M -Xmx8049M -Xmn800M
>>>>  * 64G machine: -Xms8192M -Xmx8192M -Xmn1200M
>>>>  I think we can rule that out.
>>>>
>>>>  Thanks again for you help, I truly appreciate it.
>>>>
>>>>  A.
>>>>
>>>>  On 11/17/2015 08:48 PM, Robert Coli wrote:
>>>>
>>>>  On Tue, Nov 17, 2015 at 11:08 AM, Sebastian Estevez
>>>>  >>> <mailto:sebastian.este...@datastax.com>
>>>>  <mailto:sebastian.este...@datastax.com
>>>> <mailto:sebastian.este...@datastax.com>>
>>>>  <mailto:sebastian.este...@datastax.com
>>>> <mailto:sebastian.este...@datastax.com>
>>>>  <mailto:sebastian.este...@datastax.com
>>>> <mailto:sebastian.este...@datastax.com>>>>
>>>>  wrote:
>>>>
>>>>   You're sstables are probably falling out of page
>>>> cache on the
>>>>   smaller nodes and your slow disks are killing your
>>>> latencies.
>>>>
>>>>
>>>>  +1 most likely.
>>>>
>>>>  Are the heaps the same size on both machines?
>>>>
>>>>  =Rob
>>>>
>>>>
>>>>  --
>>>>  Antoine Bonavita (anto...@stickyads.tv
>>>> <mailto:anto...@stickyads.tv>
>>>>  <mailto:anto...@stickyads.tv
>>>> <mailto:anto...@stickyads.tv>>) - CTO StickyADS.tv
>>>>  Tel: +33 6 34 33 47 36 / +33 9 50 68 21 32
>>>>  68 21 32 
>>>>  NEW YORK | LONDON | HAMBURG | PARIS | MONTPELLIER | MILAN |
>>>> MADRID
>>>>
>>>>
>>>>
>>>> --
>>>> Antoine Bonavita (anto...@stickyads.tv
>>>> <mailto:anto...@stickyads.tv>) - CTO StickyADS.tv
>>>> Tel: +33 6 34 33 47 36 / +33 9 50 68 21 32
>>>> NEW YORK | LONDON | HAMBURG | PARIS | MONTPELLIER | MILAN | MADRID
>>>>
>>>>
>>>>
>>>
>>
> --
> Antoine Bonavita (anto...@stickyads.tv) - CTO StickyADS.tv
> Tel: +33 6 34 33 47 36/+33 9 50 68 21 32
> NEW YORK | LONDON | HAMBURG | PARIS | MONTPELLIER | MILAN | MADRID
>



-- 

Thanks,
Ryan Svihla


Re: Cassandra Object Mapper - Dynamically pass keyspace value

2015-10-25 Thread Ryan Svihla
You should probably ask the java driver user list
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user
,

However I do have some suggestions (any follow up questions please ask on
the java driver list though):

    1. It's optional, so you don't have to pass it (
    https://github.com/datastax/java-driver/blob/61a98e83cf35ae3e979d6073aeb40ba78eed11d5/driver-mapping/src/main/java/com/datastax/driver/mapping/annotations/UDT.java),
    and you can rely on the mapper knowing about the keyspace before you use it
    (either via connect() or a "USE" command on the session object), which is
    what I do in my project with the Table annotation (see the sketch after this
    list).
   2. You can just rely on Java and delegate this to a static property
   which reads a configuration value from say a system property (which you can
   set in Maven using profiles for example).
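
A rough sketch of option 1 from Scala (contact point, system property name and
default keyspace are illustrative; assumes the driver's object mapper module is
on the classpath):

import com.datastax.driver.core.Cluster
import com.datastax.driver.mapping.MappingManager

object MapperSetup {
  def main(args: Array[String]): Unit = {
    // Pick the keyspace at runtime, e.g. from a system property set per environment/profile
    val keyspace = sys.props.getOrElse("app.keyspace", "complex")
    val cluster  = Cluster.builder().addContactPoint("127.0.0.1").build()
    // Logging into the keyspace here means entities annotated without an explicit
    // keyspace resolve against it, so nothing is hard coded in the annotated classes.
    val session  = cluster.connect(keyspace)
    val manager  = new MappingManager(session)
    // ... obtain mappers/UDT codecs from `manager` as usual ...
  }
}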



On Tue, Oct 20, 2015 at 12:21 PM, Ashish Soni  wrote:

> Hi All ,
>
> Is there any way I can specify the value of the keyspace at compile time, for
> example using a Maven build? Hard coding the keyspace name inside the Java
> classes is not comfortable: if there is a change and there are 1000's of files,
> it becomes a big maintenance issue.
>
> @UDT (keyspace = "complex", name = "address")public class Address {
> private String street;
> private String city;
> private int zipCode;
>
>


-- 

Thanks,
Ryan Svihla


Re: Is replication possible with already existing data?

2015-10-25 Thread Ryan Svihla
or query failed (no host was tried)
>>> at
>>> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
>>> at
>>> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
>>> at
>>> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
>>> at
>>> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
>>> at
>>> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
>>> at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
>>> ... 3 more
>>>
>>> ###
>>>
>>>
>>> I have already tried ::
>>>
>>> 1)
>>> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>>>
>>> 2)
>>> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>>>
>>> 3)
>>> I have also confirmed that each of the 4 nodes are telnet-able over
>>> ports 9042 and 9160 each.
>>>
>>>
>>> Definitely seems to be some driver-issue, since
>>> data-persistence/replication works perfect (with any permutation) if
>>> data-persistence is done via "cqlsh".
>>>
>>>
>>> Kindly provide some pointers.
>>> Ultimately, it is the Java-driver that will be used in production, so it
>>> is imperative that data-persistence/replication happens for any downing of
>>> any permutation of node(s).
>>>
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 

Thanks,
Ryan Svihla


Re: How to read data from local cassandra cluster

2015-10-18 Thread Ryan Svihla
Sorry I forgot one /
cfs:///filename
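
So the call from the original question would become something like:

sc.textFile("cfs:///user/testuser/words.txt")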






On Sun, Oct 18, 2015 at 3:14 PM -0700, "Ryan Svihla"  wrote:










Not a Cassandra question so this isn't the right list, but you can just upload 
the file to CFS and then access it by the path "cfs://filename".
However, since you have DSE you may want to contact support for help with 
pathing in DSE using CFS and Spark.
-Ryan Svihla




On Fri, Oct 16, 2015 at 1:33 AM -0700, "Adamantios Corais" 
 wrote:











Hi,
I have installed Cassandra locally (DataStax Enterprise to be specific). 
Everything seems to work ok. For example, I can upload a test file into CFS or 
open a Spark REPL.
However, when it comes to my very own Spark application, I can't understand how 
to modify sc.textFile("/user/testuser/words.txt") so that I can read the file I 
just uploaded to my local DataStax installation. 
How should I refer to the associated host?

// Adamantios

Re: How to read data from local cassandra cluster

2015-10-18 Thread Ryan Svihla
Not a Cassandra question so this isn't the right list, but you can just upload 
the file to CFS and then access it by the path "cfs://filename".
However, since you have DSE you may want to contact support for help with 
pathing in DSE using CFS and Spark.
-Ryan Svihla




On Fri, Oct 16, 2015 at 1:33 AM -0700, "Adamantios Corais" 
 wrote:











Hi,
I have installed Cassandra locally (DataStax Enterprise to be specific). 
Everything seems to work ok. For example, I can upload a test file into CFS or 
open a Spark REPL.
However, when it comes to my very own Spark application, I can't understand how 
to modify sc.textFile("/user/testuser/words.txt") so that I can read the file I 
just uploaded to my local DataStax installation. 
How should I refer to the associated host?

// Adamantios

Re: Advice for asymmetric reporting cluster architecture

2015-10-18 Thread Ryan Svihla
Don't forget SSDs for indexing joy and a reasonable amount of CPU, or those 
indexes will fall very far behind.
If you size the hardware correctly and avoid very silly configuration it works 
really well for this sort of purpose, especially when combined with Spark to do 
any hardcore analysis on the filtered dataset.

- Ryan Svihla




On Sat, Oct 17, 2015 at 7:12 PM -0700, "Jack Krupansky" 
 wrote:










Yes, you can have all your normal data centers with DSE configured for 
real-time data access and then have a data center that shares the same data but 
has DSE Search (Solr indexing) enabled. Your Cassandra data will get replicated 
to the Search data center and then indexed there and only there. You do need to 
have more RAM on the DSE Search nodes for the indexing, and maybe more nodes as 
well to assure decent latency for complex queries.
-- Jack Krupansky

On Sat, Oct 17, 2015 at 3:54 PM, Mark Lewis  wrote:


I hadn't considered it because I didn't think it could be configured just for a 
single data center; can it?
On Oct 17, 2015 8:50 AM, "Jack Krupansky"  wrote:
Did you consider DSE Search in a DC?
-- Jack Krupansky

On Sat, Oct 17, 2015 at 11:30 AM, Mark Lewis  wrote:
I've got an existing C* cluster spread across three data centers, and I'm 
wrestling with how to add some support for ad-hoc user reporting against 
(ideally) near real-time data.  
The type of reports I want to support basically boil down to allowing the user 
to select a single highly-denormalized "Table" from a predefined list, pick 
some filters (ideally with arbitrary boolean logic), project out some columns, 
and allow for some simple grouping and aggregation.  I've seen several 
companies expose reporting this way and it seems like a good way to avoid the 
complexity of joins while still providing a good deal of flexibility.
Has anybody done this or have any recommendations?
My current thinking is that I'd like to have the ad-hoc reporting 
infrastructure in separate data centers from our active production OLTP-type 
stuff, both to isolate any load away from the OLTP infrastructure and also 
because I'll likely need other stuff there (Spark?) to support ad-hoc reporting.
So I basically have two problems: (1) Get an eventually-consistent view of the 
data into a data center I can query against relatively quickly (so no big batch 
imports). (2) Be able to run ad-hoc user queries against it.
If I just think about query flexibility, I might consider dumping data into 
PostgreSQL nodes (practical because the data that any individual user can query 
will fit onto a single node).  But then I have the problem of getting the data 
there; I looked into an architecture using Kafka to pump data from the OLTP 
data centers to PostgreSQL mirrors, but down that road lies the need to 
manually deal with the eventual consistency.  Ugh.
If I just run C* nodes in my reporting cluster that makes the problem of 
getting the data into the right place with eventual consistency easy to solve 
and I like that idea quite a lot, but then I need to run reporting against C*.  
I could make the queries I need to run reasonably performant with enough 
secondary-indexes or materialized views (we're upgrading to 3.0 soon), but I 
would need a lot of secondary-indexes and materialized views, and I'd rather 
not pay to store them in all of my data centers.  I wish there were a way to 
define secondary-indexes or materialized views to only exist in one DC of a 
cluster, but unless I've missed something it doesn't look possible.
Any advice or case studies in this area would be greatly appreciated.
-- Mark

Re: Realtime data and (C)AP

2015-10-11 Thread Ryan Svihla
A downgrading consistency policy suffers from effectively being the
downgraded consistency level, aka CL ONE. I think it's helpful to remember
that consistency level is effectively a contract on your consistency: if
you do "quorum or one" you're basically at CL ONE. Think of it this way: CL
ONE usually successfully writes to RF nodes, but you're only requiring one
node to acknowledge a successful write, so how is that any different from
"quorum or one"? If you only have one node up it'll behave like CL ONE; if
you have two nodes up it'll behave like CL QUORUM.

This approach somehow accomplishes the worst of both worlds: the speed of
QUORUM (since it has to fail before it downgrades) and the consistency contract
of ONE. It really is a pretty terrible engineering tradeoff. Plus, if you're
OK with ONE some of the time, you're OK with ONE all the time.

For clarity, I think downgrading consistency policy should be deprecated, I
think it totally gets people thinking the wrong way about consistency level.

On Sun, Oct 11, 2015 at 11:48 AM, Eric Stevens  wrote:

> The DataStax Java driver is based on Netty and is non blocking; if you do
> any CQL work you should look into it.  At ProtectWise we use it with high
> write volumes from Scala/Akka with great success.
>
> We have a thin Scala wrapper around the Java driver that makes it act more
> Scalaish (eg, Scala futures instead of Java futures, string contexts to
> construct statements, and so on).  This has also let us do some other cool
> things like integrate Zipkin tracing at a driver level, and add other
> utility like token aware batches, and concurrent token aware batch selects.
>
> On Sat, Oct 10, 2015 at 2:49 PM Graham Sanderson  wrote:
>
>> Cool - yeah we are still on astyanax mode drivers and our own built from
>> scratch 100% non blocking Scala driver that we used in akka like
>> environments
>>
>> Sent from my iPhone
>>
>> On Oct 10, 2015, at 12:12 AM, Steve Robenalt 
>> wrote:
>>
>> Hi Graham,
>>
>> I've used the Java driver's DowngradingConsistencyRetryPolicy for that in
>> cases where it makes sense.
>>
>> Ref:
>> http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
>>
>> Steve
>>
>>
>>
>> On Fri, Oct 9, 2015 at 6:06 PM, Graham Sanderson  wrote:
>>
>>> Actually maybe I'll open a JIRA issue for a (local)quorum_or_one
>>> consistency level... It should be trivial to implement on the server side with
>>> existing timeouts ... I'll need to check the CQL protocol to see if there is a
>>> good place to indicate you didn't reach quorum (in time)
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 9, 2015, at 8:02 PM, Graham Sanderson  wrote:
>>>
>>> Most of our writes are not user facing so local_quorum is good... We
>>> also read at local_quorum because we prefer guaranteed consistency... But
>>> we very quickly fall back to local_one in the cases where some data fast is
>>> better than a failure. Currently we do that on a per read basis but we
>>> could I suppose detect a pattern or just look at the gossip to decide to go
>>> en masse into a degraded read mode
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 9, 2015, at 5:39 PM, Steve Robenalt 
>>> wrote:
>>>
>>> Hi Brice,
>>>
>>> I agree with your nit-picky comment, particularly with respect to the
>>> OP's emphasis, but there are many cases where read at ONE is sufficient and
>>> performance is "better enough" to justify the possibility of a wrong
>>> result. As with anything Cassandra, it's highly dependent on the nature of
>>> the workload.
>>>
>>> Steve
>>>
>>>
>>> On Fri, Oct 9, 2015 at 12:36 PM, Brice Dutheil 
>>> wrote:
>>>
>>>> On Fri, Oct 9, 2015 at 2:27 AM, Steve Robenalt 
>>>> wrote:
>>>>
>>>> In general, if you write at QUORUM and read at ONE (or LOCAL variants
>>>>> thereof if you have multiple data centers), your apps will work well
>>>>> despite the theoretical consistency issues.
>>>>
>>>> Nit-picky comment: if consistency is something important, then reading
>>>> at QUORUM is important. If the read is at ONE then the read operation *may*
>>>> not see an important update. The safest option is QUORUM for both write and
>>>> read; then, depending on the business or feature, the consistency may be
>>>> tuned.
>>>>
>>>> — Brice
>>>> ​
>>>>
>>>
>>>
>>>
>>> --
>>> Steve Robenalt
>>> Software Architect
>>> sroben...@highwire.org 
>>> (office/cell): 916-505-1785
>>>
>>> HighWire Press, Inc.
>>> 425 Broadway St, Redwood City, CA 94063
>>> www.highwire.org
>>>
>>> Technology for Scholarly Communication
>>>
>>>
>>
>>
>> --
>> Steve Robenalt
>> Software Architect
>> sroben...@highwire.org 
>> (office/cell): 916-505-1785
>>
>> HighWire Press, Inc.
>> 425 Broadway St, Redwood City, CA 94063
>> www.highwire.org
>>
>> Technology for Scholarly Communication
>>
>>


-- 

Thanks,
Ryan Svihla


Re: High read latency

2015-09-25 Thread Ryan Svihla
If everything is in RAM there could still be a number of issues unrelated to 
Cassandra, such as hardware limitations or contention problems. Otherwise, cell 
count can really deeply impact reads, all-RAM or not; some of this is because of 
the nature of GC and some of it is the age of the sstable format (which is due 
to be revamped in 3.0). Partition size can also matter just because of physics: 
if one of those is a 1 GB partition, the network interface can only move it back 
across the wire so quickly, not to mention the GC issues you'd run into.

Anyway this is why I asked for the histograms, I wanted to get cell count and 
partition size. I’ve seen otherwise very stout hardware get slow on reads of 
large results because either a bottleneck was hit somewhere, or the CPU got 
slammed with GC, or other processes running on the machine were contending with 
Cassandra.

> On Sep 25, 2015, at 12:45 PM, Jaydeep Chovatia  
> wrote:
> 
> I understand that but everything is in RAM (my data dir is tmpfs) and my row 
> is not that wide approx. less than 5MB in size. So my question is if 
> everything is in RAM then why does it take 43ms latency? 
> 
> On Fri, Sep 25, 2015 at 7:54 AM, Ryan Svihla  <mailto:r...@foundev.pro>> wrote:
> if you run:
> 
> nodetool cfhistograms  
> 
> On the given table and that will tell you how wide your rows are getting. At 
> some point you can get wide enough rows that just the physics of retrieving 
> them all take some time. 
> 
> 
>> On Sep 25, 2015, at 9:21 AM, sai krishnam raju potturi > <mailto:pskraj...@gmail.com>> wrote:
>> 
>> Jaydeep; since your primary key involves a clustering column, you may be 
>> having pretty wide rows. The read would be sequential. The latency could be 
>> acceptable, if the read were to involve really wide rows.
>> 
>> If your primary key was like ((a,b)) without the clustering column, it's 
>> like reading a key value pair, and 40ms latency may have been a concern. 
>> 
>> Bottom Line : The latency depends on how wide the row is.
>> 
>> On Tue, Sep 22, 2015 at 1:27 PM, sai krishnam raju potturi 
>> mailto:pskraj...@gmail.com>> wrote:
>> thanks for the information. Posting the query too would be of help.
>> 
>> On Tue, Sep 22, 2015 at 11:56 AM, Jaydeep Chovatia 
>> mailto:chovatia.jayd...@gmail.com>> wrote:
>> Please find required details here:
>> 
>> -  Number of req/s
>> 
>> 2k reads/s
>> 
>> -  Schema details
>> 
>> create table test {
>> 
>> a timeuuid,
>> 
>> b bigint,
>> 
>> c int,
>> 
>> d int static,
>> 
>> e int static,
>> 
>> f int static,
>> 
>> g int static,
>> 
>> h int,
>> 
>> i text,
>> 
>> j text,
>> 
>> k text,
>> 
>> l text,
>> 
>> m set
>> 
>> n bigint
>> 
>> o bigint
>> 
>> p bigint
>> 
>> q bigint
>> 
>> r int
>> 
>> s text
>> 
>> t bigint
>> 
>> u text
>> 
>> v text
>> 
>> w text
>> 
>> x bigint
>> 
>> y bigint
>> 
>> z bigint,
>> 
>> primary key ((a, b), c)
>> 
>> };
>> 
>> -  JVM settings about the heap
>> 
>> Default settings
>> 
>> -  Execution time of the GC
>> 
>> Avg. 400ms. I do not see long pauses of GC anywhere in the log file.
>> 
>> 
>> On Tue, Sep 22, 2015 at 5:34 AM, Leleu Eric > <mailto:eric.le...@worldline.com>> wrote:
>> Hi,
>> 
>>  
>> 
>>  
>> 
>> Before speaking about tuning, can you provide some additional information ?
>> 
>>  
>> 
>> -  Number of req/s
>> 
>> -  Schema details
>> 
>> -  JVM settings about the heap
>> 
>> -  Execution time of the GC
>> 
>>  
>> 
>> 43ms for a read latency may be acceptable according to the number of request 
>> per second.
>> 
>>  
>> 
>>  
>> 
>> Eric
>> 
>>  
>> 
>> From: Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
>> Sent: Tuesday, September 22, 2015 00:07
>> To: user@cassandra.apache.org
>> Subject: High read latency
>> 
>>  
>> 
>> Hi,
>> 
>>  
>> 
>> My application issues more read requests than write, I do see that under 
>> load cfstats for one of the table is quite high around 43ms
>

Re: To batch or not to batch: A question for fast inserts

2015-09-25 Thread Ryan Svihla

I think my main point is still: unlogged token-aware batches are great, but if 
your writes are large enough, they may actually hurt rather than help, and 
likewise if your writes are too small, async-only single statements are likely 
to hurt as well. I'd say the average user I've had to help (with my selection 
bias) has individual writes already on the large side of optimal, so batching 
frequently hurts them. Also, they tend not to do async in the first place.

In summary, batch-or-not is IMO the wrong area to focus on; total write payload 
sizing for your cluster is the factor to focus on, and however you get there is 
fantastic. More replies inline:

> On Sep 25, 2015, at 1:24 PM, Eric Stevens  wrote:
> 
> > compaction usually is the limiter for most clusters, so the difference 
> > between async versus unlogged batch ends up being minor or worse..non 
> > existent cause the hardware and data model combination result in compaction 
> > being the main throttle.
> 
> If your number of records to load per second is predetermined (as would be 
> the case in any production use case), then this doesn't make any difference 
> on compaction whether loaded as batches vs as single statements, your cluster 
> needs to support the same number and shape of mutates either way.

Not everyone is as grown up about their cluster sizing. Lots of folks are still 
stuck on maximum utilization; ironically, these same people tend to focus on 
using spindles for storage and so will ultimately end up having to throttle 
ingest to allow compaction to catch up. Anyway, in these admittedly awful 
situations throttling of ingest is all too common, as the commit log can easily 
outstrip compaction. 

> 
> > if you add in token awareness to your batch..you’ve basically eliminated 
> > the primary complaint of using unlogged batches so why not do that. 
> 
> This is absolutely the right idea if your driver supports it, but the gain is 
> smaller than I would have expected based on the warnings of imminent doom 
> when we've had this conversation before.  If your driver supports token 
> awareness, use that to group statements by primary replica and concurrently 
> execute those that way.  Here's the code we're using (in Scala using the Java 
> driver):
> def groupByFirstReplica()(implicit session: CQLSession): Map[Host, CQLBatch] 
> = {
>   val meta = session.getCluster.getMetadata
>   statements.groupBy { st =>
> try {
>   meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next
> } catch { case NonFatal(e) =>
>   null
> }
>   } mapValues { st => CQLBatch(st) }
> }
> We now have a map of primary host to sub-batch for all the statements in our 
> logical batch.  We can now do either of these (depending on how greedy we 
> want to be in our client; Future.traverse is preferred and nicer, 
> Future.sequence is greedier and more resource intensive):
> Future.sequence(groupByFirstReplica().values.map(_.execute())).map(_.flatten)
> Future.traverse(groupByFirstReplica().values) { _.execute() }.map(_.flatten)
> We get back Future[Iterable[ResultSet]] - this future completes when the 
> logical batch's sub-batches have all completed.
> 
> Note that with the DSE Java driver, for the above to succeed in its intent, 
> the statements need to be prepared statements (for st.getRoutingKey to return 
> non-null), and either the keyspace has to be fully defined in the CQL, or you 
> have to have set the correct keyspace when you created the connection (for 
> st.getKeyspace to return non-null).  Otherwise the values given to 
> meta.getReplicas will fail to resolve a primary host which results in doing 
> token-unaware batches (i.e. you'll get back a Map(null -> allStatements)).  
> However those same criteria are required for single statements to be token 
> aware.
> 

This is excellent stuff. My only concern with primary replicas is for people 
with uneven partitions, and the occasional stupidly fat one; I'd rather spread 
those writes around the other replicas instead of beating up the primary one. 
However, for a well-modeled partition key the approach you outline is probably 
optimal.

> 
> 
> 
> On Fri, Sep 25, 2015 at 7:30 AM, Ryan Svihla  <mailto:r...@foundev.pro>> wrote:
> Generally this is all correct but I cannot emphasize enough how much this 
> “just depends” and today I generally move people to async inserts first 
> before trying to micro-optimize some things to keep in mind.
> 
> compaction usually is the limiter for most clusters, so the difference 
> between async versus unlogged batch ends up being minor or worse..non 
> existent cause the hardware and data model combination result in compaction 
> being the main throttle.
> if you add in token awar

Re: memory usage problem of Metadata.tokenMap.tokenToHost

2015-09-25 Thread Ryan Svihla
In practice there are not many good reasons to use that many keyspaces and 
tables. If the use case is multi-tenancy then you're almost always better off 
just using a combination of version tables and a tenantId to give you flexibility 
as well as separation of client data. If you have that many data types then it 
may be worth considering a blob or text value for the data plus a data-type 
column, so you can serialize and deserialize on the client. 
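
A minimal sketch of that tenantId approach (table and column names here are
hypothetical, not from this thread):

-- one shared table instead of thousands of per-tenant keyspaces/tables
CREATE TABLE tenant_data (
  tenant_id uuid,
  data_type text,
  item_id timeuuid,
  payload text,
  PRIMARY KEY ((tenant_id, data_type), item_id)
);

-- every query is scoped to a single tenant and logical type
SELECT payload FROM tenant_data
WHERE tenant_id = 62c36092-82a1-3a00-93d1-46196ee77204
  AND data_type = 'order';

Because there is no longer one token-to-replica map per keyspace, the
driver-side metadata memory problem described below largely goes away.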




> On Sep 22, 2015, at 3:09 AM, horschi  wrote:
> 
> Hi Joseph,
> 
> I think 2000 keyspaces might be just too much. Fewer keyspaces (and CFs) will 
> probably work much better.
> 
> kind regards,
> Christian
> 
> 
> On Tue, Sep 22, 2015 at 9:29 AM, joseph gao  > wrote:
> Hi, anybody could help me?
> 
> 2015-09-21 0:47 GMT+08:00 joseph gao  >:
> ps: that's the code in the Java driver, in Metadata.TokenMap.build:
> for (KeyspaceMetadata keyspace : keyspaces)
> {
>     ReplicationStrategy strategy = keyspace.replicationStrategy();
>     Map<Token, Set<Host>> ksTokens = (strategy == null)
>         ? makeNonReplicatedMap(tokenToPrimary)
>         : strategy.computeTokenToReplicaMap(tokenToPrimary, ring);
> 
>     tokenToHosts.put(keyspace.getName(), ksTokens);
> tokenToPrimary is all the same, ring is all the same, and if the strategy is 
> all the same, strategy.computeTokenToReplicaMap would return the 'same' map 
> but as a different object (because every call returns a new HashMap).
> 
> 2015-09-21 0:22 GMT+08:00 joseph gao  >:
> cassandra: 2.1.7
> java driver: datastax java driver 2.1.6
> 
> Here is the problem:
>My application uses 2000+ keyspaces, and will dynamically create keyspaces 
> and tables. And then in java client, the Metadata.tokenMap.tokenToHost would 
> use about 1g memory. so this will cause a lot of  full gc.
>As I see, the key of the tokenToHost is keyspace, and the value is a 
> tokenId_to_replicateNodes map.
> 
>When I try to solve this problem, I find something not sure: all keyspaces 
> have same 'tokenId_to_replicateNodes' map.
> My replication strategy of all keyspaces is : simpleStrategy and 
> replicationFactor is 3
> 
> So would it be possible if keyspaces use same strategy, the value of 
> tokenToHost map use a same map. So it would extremely reduce the memory usage
> 
>  thanks a lot
> 
> -- 
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
> 
> 
> 
> -- 
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
> 
> 
> 
> -- 
> --
> Joseph Gao
> PhoneNum:15210513582
> QQ: 409343351
> 



Re: High read latency

2015-09-25 Thread Ryan Svihla
if you run:

nodetool cfhistograms  

On the given table and that will tell you how wide your rows are getting. At 
some point you can get wide enough rows that just the physics of retrieving 
them all take some time. 


> On Sep 25, 2015, at 9:21 AM, sai krishnam raju potturi  
> wrote:
> 
> Jaydeep; since your primary key involves a clustering column, you may be 
> having pretty wide rows. The read would be sequential. The latency could be 
> acceptable, if the read were to involve really wide rows.
> 
> If your primary key was like ((a,b)) without the clustering column, it's like 
> reading a key value pair, and 40ms latency may have been a concern. 
> 
> Bottom Line : The latency depends on how wide the row is.
> 
> On Tue, Sep 22, 2015 at 1:27 PM, sai krishnam raju potturi 
> mailto:pskraj...@gmail.com>> wrote:
> thanks for the information. Posting the query too would be of help.
> 
> On Tue, Sep 22, 2015 at 11:56 AM, Jaydeep Chovatia 
> mailto:chovatia.jayd...@gmail.com>> wrote:
> Please find required details here:
> 
> -  Number of req/s
> 
> 2k reads/s
> 
> -  Schema details
> 
> create table test {
> 
> a timeuuid,
> 
> b bigint,
> 
> c int,
> 
> d int static,
> 
> e int static,
> 
> f int static,
> 
> g int static,
> 
> h int,
> 
> i text,
> 
> j text,
> 
> k text,
> 
> l text,
> 
> m set
> 
> n bigint
> 
> o bigint
> 
> p bigint
> 
> q bigint
> 
> r int
> 
> s text
> 
> t bigint
> 
> u text
> 
> v text
> 
> w text
> 
> x bigint
> 
> y bigint
> 
> z bigint,
> 
> primary key ((a, b), c)
> 
> };
> 
> -  JVM settings about the heap
> 
> Default settings
> 
> -  Execution time of the GC
> 
> Avg. 400ms. I do not see long pauses of GC anywhere in the log file.
> 
> 
> On Tue, Sep 22, 2015 at 5:34 AM, Leleu Eric  <mailto:eric.le...@worldline.com>> wrote:
> Hi,
> 
>  
> 
>  
> 
> Before speaking about tuning, can you provide some additional information ?
> 
>  
> 
> -  Number of req/s
> 
> -  Schema details
> 
> -  JVM settings about the heap
> 
> -  Execution time of the GC
> 
>  
> 
> 43ms for a read latency may be acceptable according to the number of request 
> per second.
> 
>  
> 
>  
> 
> Eric
> 
>  
> 
> From: Jaydeep Chovatia [mailto:chovatia.jayd...@gmail.com]
> Sent: Tuesday, September 22, 2015 00:07
> To: user@cassandra.apache.org
> Subject: High read latency
> 
>  
> 
> Hi,
> 
>  
> 
> My application issues more read requests than write, I do see that under load 
> cfstats for one of the table is quite high around 43ms
> 
>  
> 
> Local read count: 114479357
> 
> Local read latency: 43.442 ms
> 
> Local write count: 22288868
> 
> Local write latency: 0.609 ms
> 
>  
> 
>  
> 
> Here is my node configuration:
> 
> RF=3, Read/Write with QUORUM, 64GB RAM, 48 CPU core. I have only 5 GB of data 
> on each node (and for experiment purpose I stored data in tmpfs)
> 
>  
> 
> I've tried increasing concurrent_read count upto 512 but no help in read 
> latency. CPU/Memory/IO looks fine on system.
> 
>  
> 
> Any idea what should I tune?
> 
>  
> 
> Jaydeep
> 
> 
> 
> This e-mail and the documents attached are confidential and intended solely 
> for the addressee; it may also be privileged. If you receive this e-mail in 
> error, please notify the sender immediately and destroy it. As its integrity 
> cannot be secured on the Internet, the Worldline liability cannot be 
> triggered for the message content. Although the sender endeavours to maintain 
> a computer virus-free network, the sender does not warrant that this 
> transmission is virus-free and will not be liable for any damages resulting 
> from any virus transmitted.
> 
> 
> 

Regards,

Ryan Svihla



Re: Seeing null pointer exception 2.0.14 after purging gossip state

2015-09-25 Thread Ryan Svihla
could it be related to CASSANDRA-9180 
<https://issues.apache.org/jira/browse/CASSANDRA-9180> which was fixed in 
2.0.15? although it really behaves like CASSANDRA-10231 
<https://issues.apache.org/jira/browse/CASSANDRA-10231> which I don’t see any 
reference to it being in 2.0.x

> On Sep 24, 2015, at 12:57 PM, Robert Coli  wrote:
> 
> On Mon, Sep 14, 2015 at 7:53 PM, K F  <mailto:kf200...@yahoo.com>> wrote:
> I have cassandra 2.0.14 deployed and after following the method described in 
> Apache Cassandra™ 2.0 
> <http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html>
>  to clear the gossip state of the node in one of the dc of my cluster
> 
> Why did you need to do this?
> 
>  I see wierd exception on the nodes not many but a few in an hour for nodes 
> that have already successfully decommissioned from the cluster, you can see 
> from below exception that 10.0.0.1 has been already decommissioned. Below is 
> the exception snippet. 
> 
> Have you done :
> 
> nodetool gossipinfo |grep SCHEMA |sort | uniq -c | sort -n
> 
> and checked for schema agreement... ?
> 
> =Rob
>  

Regards,

Ryan Svihla



Re: To batch or not to batch: A question for fast inserts

2015-09-25 Thread Ryan Svihla
Generally this is all correct but I cannot emphasize enough how much this “just 
depends” and today I generally move people to async inserts first before trying 
to micro-optimize some things to keep in mind.

Compaction usually is the limiter for most clusters, so the difference between 
async versus unlogged batch ends up being minor or, worse, non-existent, because 
the hardware and data model combination results in compaction being the main 
throttle.

If you add in token awareness to your batch, you've basically eliminated the 
primary complaint about using unlogged batches, so why not do that? When I was at 
DataStax I made some similar suggestions for token-aware batches after seeing the 
perf improvements with Spark writes using unlogged batch. Several others did as 
well, so I'm not the first one with this idea.

Write size makes, in my experience, the largest difference BY FAR in which is 
faster, and the statement count is largely irrelevant compared to the total 
payload size. Depending on the hardware etc., a good rule of thumb is that writes 
below 1k bytes tend to get really inefficient and writes over 100k tend to slow 
down total throughput. I'll re-emphasize that this magic number has been different 
on almost every cluster I've tuned.

In summary, all this means is: too-small or too-large writes are slow, and 
unlogged batches may involve some extra hops; if you eliminate the extra hops 
via token awareness then it just comes down to write size optimization.
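
As a rough sketch of a token-aware-friendly unlogged batch (hypothetical table;
every statement here targets the same partition key, so only one replica set is
involved and total payload size, not statement count, is the thing to watch):

CREATE TABLE sensor_readings (
  sensor_id text,
  ts timestamp,
  value double,
  PRIMARY KEY (sensor_id, ts)
);

BEGIN UNLOGGED BATCH
  INSERT INTO sensor_readings (sensor_id, ts, value) VALUES ('s-001', '2015-09-25 10:00:00', 1.2);
  INSERT INTO sensor_readings (sensor_id, ts, value) VALUES ('s-001', '2015-09-25 10:00:01', 1.3);
  INSERT INTO sensor_readings (sensor_id, ts, value) VALUES ('s-001', '2015-09-25 10:00:02', 1.1);
APPLY BATCH;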

> On Sep 24, 2015, at 5:18 PM, Eric Stevens  wrote:
> 
> > I side-tracked some punctual benchmarks and stumbled on the observations of 
> > unlogged inserts being *A LOT* faster than the async counterparts.
> 
> My own testing agrees very strongly with this.  When this topic came up on 
> this list before, there was a concern that batch coordination produces GC 
> pressure in your cluster because you're involving nodes which aren't strictly 
> speaking necessary to be involved.  
> 
> Our own testing shows some small impact on this front, but really lightweight 
> GC tuning mitigated the effects by putting a little more room in Xmn (if 
> you're still on CMS garbage collector).  On G1GC (which is what we run in 
> production) we weren't able to measure a difference. 
> 
> Our testing shows data loads being as much as 5x to 8x faster when using 
> small concurrent batches over using single statements concurrently.  We tried 
> three different concurrency models.
> 
> To save on coordinator overhead, we group the statements in our "batch" by 
> replica (using the functionality exposed by the DataStax Java driver), and do 
> essentially token aware batching.  This still has a small amount of 
> additional coordinator overhead (since the data size of the unit of work is 
> larger, and sits in memory in the coordinator longer).  We've been running 
> this way successfully for months with sustained rates north of 50,000 mutates 
> per second.  We burst much higher.
> 
> Through trial and error we determined we got diminishing returns in the realm 
> of 100 statements per token-aware batch.  It looks like your own data bears 
> that out as well.  I'm sure that's workload dependent though.
> 
> I've been disagreed with on this topic in this list in the past despite the 
> numbers I was able to post.  Nobody has shown me numbers (nor anything else 
> concrete) that contradict my position though, so I stand by it.  There's no 
> question in my mind, if your mutates are of any significant volume and you 
> care about the performance of them, token aware unlogged batching is the 
> right strategy.  When we reduce our batch sizes or switch to single async 
> statements, we fall over immediately.  
> 
> On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas  <mailto:gerard.m...@gmail.com>> wrote:
> General advice advocates for individual async inserts as the fastest way to 
> insert data into Cassandra. Our insertion mechanism is based on that model 
> and recently we have been evaluating performance, looking to measure and 
> optimize our ingestion rate.
> 
> I side-tracked some punctual benchmarks and stumbled on the observations of 
> unlogged inserts being *A LOT* faster than the async counterparts.
> 
> In our tests, unlogged batch shows increased throughput and lower cluster CPU 
> usage, so I'm wondering where the tradeoff might be.
> 
> I compiled those observations in this document that I'm sharing and opening 
> up for comments.  Are we observing some artifact or should we set the record 
> straight for unlogged batches to achieve better insertion throughput?
> 
> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI
>  
> <https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI>
> 
> Let me know.
> 
> Kind regards, 
> 
> Gerard.
> 

Regards,

Ryan Svihla



Re: How to tune Cassandra or Java Driver to get lower latency when there are a lot of writes?

2015-09-25 Thread Ryan Svihla
Why aren't you using saveToCassandra 
(https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md)? 
It has a number of locality-aware optimizations that will probably exceed 
your by-hand bulk loading (especially if you're not doing it inside something 
like foreachPartition).

Also you can easily tune up and down the size of those tasks and therefore 
batches to minimize harm on the prod system.

> On Sep 24, 2015, at 5:37 PM, Benyi Wang  wrote:
> 
> I use Spark and spark-cassandra-connector with a customized Cassandra writer 
> (spark-cassandra-connector doesn’t support DELETE). Basically the writer 
> works as follows:
> 
> Bind a row in Spark RDD with either INSERT/Delete PreparedStatement
> Create a BatchStatement for multiple rows
> Write to Cassandra.
> I knew using CQLBulkOutputFormat would be better, but it doesn't supports 
> DELETE. 
> 
> On Thu, Sep 24, 2015 at 1:27 PM, Gerard Maas  <mailto:gerard.m...@gmail.com>> wrote:
> How are you loading the data? I mean, what insert method are you using?
> 
> On Thu, Sep 24, 2015 at 9:58 PM, Benyi Wang  <mailto:bewang.t...@gmail.com>> wrote:
> I have a cassandra cluster provides data to a web service. And there is a 
> daily batch load writing data into the cluster.
> 
> Without the batch loading, the service’s Latency 99thPercentile is 3ms. But 
> during the load, it jumps to 90ms.
> I checked cassandra keyspace’s ReadLatency.99thPercentile, which jumps to 1ms 
> from 600 microsec.
> The service’s cassandra java driver request 99thPercentile was 90ms during 
> the load
> The java driver took the most time. I knew the Cassandra servers are busy in 
> writing, but I want to know what kinds of metrics can identify where is the 
> bottleneck so that I can tune it.
> 
> I’m using Cassandra 2.1.8 and Cassandra Java Driver 2.1.5.
> 
> 
> 

Regards,

Ryan Svihla



Re: Querying on multiple columns

2015-09-07 Thread Ryan Svihla
That is the state of data modeling with 2.1 and it's worked quite well for
a lot of people (especially for those using batches or streaming to
maintain the different views of the same data).

However, you should be interested in the upcoming Materialized Views in 3.0
http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
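
A rough sketch of the "multiple tables maintained together" pattern
(hypothetical tables, not from the original question): the same user is written
to two tables, each keyed for a different where clause, and a logged batch keeps
the two views consistent with each other:

CREATE TABLE users_by_id (
  user_id uuid PRIMARY KEY,
  email text,
  name text
);

CREATE TABLE users_by_email (
  email text PRIMARY KEY,
  user_id uuid,
  name text
);

BEGIN BATCH
  INSERT INTO users_by_id (user_id, email, name)
    VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'sam@example.com', 'Sam');
  INSERT INTO users_by_email (email, user_id, name)
    VALUES ('sam@example.com', 62c36092-82a1-3a00-93d1-46196ee77204, 'Sam');
APPLY BATCH;

The logged batch buys atomicity across the two tables at the cost of batchlog
overhead, which is the same tradeoff discussed in the batching threads elsewhere
in this archive.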



On Thu, Sep 3, 2015 at 1:44 PM, Samya Maiti 
wrote:

> Hi All,
>
> I have a use-case where in I want to execute query on my cassandra table
> with different where clauses.
>
> Two approaches know to me is :-
>
>- Creating secondary index
>   - But to my understanding, query on secondary index will be slow.
>- Creating multiple tables with same data but different primary key.
>   - This option has many consequences as lot of things needs to be
>   taken care of while writing the data.
>
>
> Please let me know if a better solution is available. I am using 2.1.5
> version.
>
> Regards,
> Sam
>



-- 

Thanks,
Ryan Svihla


Re: How to prevent queries being routed to new DC?

2015-09-07 Thread Ryan Svihla
What's your keyspace replication strategy?

On Thu, Sep 3, 2015 at 3:16 PM Tom van den Berge 
wrote:

> Thanks for your help so far!
>
> I have some problems trying to understand the jira mentioned by Rob :(
>
> I'm currently trying to set up the first node in the new DC with
> auto_bootstrap = true. The node then becomes visible with status "joining",
> which (hopefully) prevents other DCs from sending queries to it.
>
> Do you think this will work?
>
>
>
> On Thu, Sep 3, 2015 at 9:46 PM, Robert Coli  wrote:
>
>> On Thu, Sep 3, 2015 at 12:25 PM, Bryan Cheng 
>> wrote:
>>
>>> I'd recommend you enable tracing and do a few queries in a controlled
>>> environment to verify that queries are being routed to your new nodes.
>>> Provided you have followed the procedure outlined above (specifically, have
>>> set auto_bootstrap to false in your new cluster), rebuild has not been run,
>>> the application is not connecting to the new cluster, and all your queries
>>> are run at LOCAL_* quorum levels, I do not believe those queries should be
>>> routed to the new dc.
>>>
>>
>> Other than CASSANDRA-9753, this is true.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-9753 (Unresolved; ):
>> "LOCAL_QUORUM reads can block cross-DC if there is a digest mismatch"
>>
>> =Rob
>>
>>
> --
Regards,

Ryan Svihla


Re: Data Size on each node

2015-09-07 Thread Ryan Svihla
Huge differences in ability to handle compaction and read contention. I've
taken spindle servers struggling at 7k tps for the cluster with 9 node data
centers (stupidly big writes, not my app) to doing that per node just by
swapping out to SSD. This says nothing about the 100x change in latency on
p99 queries.

Never seen a case yet where it wasn't several times more tolerant of data
density and a couple of orders of magnitude faster on latency.

On Fri, Sep 4, 2015 at 3:38 AM Alprema  wrote:

> Hi,
>
> I agree with Alain, we have the same kind of problem here (4 DCs, ~1TB /
> node) and we are replacing our big servers full of spinning drives with a
> bigger number of smaller servers with SSDs (microservers are quite
> efficient in terms of rack space and cost).
>
> Kévin
>
> On Tue, Sep 1, 2015 at 1:11 PM, Alain RODRIGUEZ 
> wrote:
>
>> Hi,
>>
>> Our migration to SSD (from m1.xl to I2.2xl on AWS) has been a big win. I
>> mean we wen from 80 / 90 % disk utilisation to 20 % max. Basically,
>> bottleneck are not disks performances anymore in our case. We get rid of
>> one of our major issue that was disk contention.
>>
>> I highly recommend you to go ahead with this, even more with such a big
>> data set. Yet it will probably be more expensive per node.
>>
>> An other solution for you might be adding nodes (to have less to handle
>> per node and make maintenance operations like repair, bootstrap,
>> decommission, ... faster)
>>
>> C*heers,
>>
>> Alain
>>
>>
>>
>>
>> 2015-09-01 10:17 GMT+02:00 Sachin Nikam :
>>
>>> We currently have a Cassandra Cluster spread over 2 DC. The data size on
>>> each node of the cluster is 1.2TB with spinning disk. Minor and Major
>>> compactions are slowing down our Read queries. It has been suggested that
>>> replacing Spinning disks with SSD might help. Has anybody done something
>>> similar? If so what has been the results?
>>> Also if we go with SSD, how big can each node get for commercially
>>> available SSDs?
>>> Regards
>>> Sachin
>>>
>>
>>
> --
Regards,

Ryan Svihla


Re: cassandra scalability

2015-09-07 Thread Ryan Svihla
If that's what tracing is telling you then it's fine and just a product of
data distribution (note your token count isn't identical anyway).

If you're doing CL ONE queries directly against particular nodes and
getting different results, it sounds like dropped mutations, streaming
errors and/or timeouts. Does running repair, or reading at CL ALL, give
you an accurate total record count?

nodetool tpstats should help identify dropped mutations post-bootstrap, but
you also want to monitor logs for any errors (in general this is always
good advice for any system). There could be a myriad of problems with
bootstrapping new nodes; usually this is related to under-provisioning.

On Mon, Sep 7, 2015 at 8:19 AM Alain RODRIGUEZ  wrote:

> Hi Sara,
>
> Can you detail actions performed, like how you load data, what scaleup /
> scaledown are and precise if you let it decommission fully (streams
> finished, node removed from nodetool status) etc ?
>
> This would help us to help you :).
>
> Also, what happens if you query using "CONSISTENCY LOCAL_QUORUM;" (or ALL)
> before your select ? If not using cqlsh, set the Consistency Level of your
> client to LOCAL_QUORUM or ALL and try to select again.
>
> Also, I am not sure of the meaning of this --> " i'm affecting to each of
> my node a different token based on there ip address (the token is A+B+C+D
> and the ip is A.B.C.D)". Aren't you using RandomPartitioner or
> Murmur3Partitioner ?
>
> C*heers,
>
> Alain
>
>
>
> 2015-09-07 12:01 GMT+02:00 Edouard COLE :
>
>> Please, don't mail me directly
>>
>> I read your answer, but I cannot help anymore
>>
>> And answering with "Sorry, I can't help" is pointless :)
>>
>> Wait for the community to answer
>>
>> De : ICHIBA Sara [mailto:ichi.s...@gmail.com]
>> Envoyé : Monday, September 07, 2015 11:34 AM
>> À : user@cassandra.apache.org
>> Objet : Re: cassandra scalability
>>
>> when there's a scaledown action, i make sure to decommission the node
>> before. but still, I don't understand why I'm having this behaviour. is it
>> normal. what do you do normally to remove a node? is it related to tokens?
>> i'm affecting to each of my node a different token based on there ip
>> address (the token is A+B+C+D and the ip is A.B.C.D)
>>
>> 2015-09-07 11:28 GMT+02:00 ICHIBA Sara :
>> at the biginning it looks like this:
>>
>> [root@demo-server-seed-k6g62qr57nok ~]# nodetool status
>> Datacenter: DC1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address Load   Tokens  OwnsHost
>> ID   Rack
>> UN  40.0.0.208  128.73 KB  248 ?
>> 6e7788f9-56bf-4314-a23a-3bf1642d0606  RAC1
>> UN  40.0.0.209  114.59 KB  249 ?
>> 84f6f0be-6633-4c36-b341-b968ff91a58f  RAC1
>> UN  40.0.0.205  129.53 KB  245 ?
>> aa233dc2-a8ae-4c00-af74-0a119825237f  RAC1
>>
>>
>>
>>
>> [root@demo-server-seed-k6g62qr57nok ~]# nodetool status
>> service_dictionary
>> Datacenter: DC1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address Load   Tokens  Owns (effective)  Host
>> ID   Rack
>> UN  40.0.0.208  128.73 KB  248 68.8%
>> 6e7788f9-56bf-4314-a23a-3bf1642d0606  RAC1
>> UN  40.0.0.209  114.59 KB  249 67.8%
>> 84f6f0be-6633-4c36-b341-b968ff91a58f  RAC1
>> UN  40.0.0.205  129.53 KB  245 63.5%
>> aa233dc2-a8ae-4c00-af74-0a119825237f  RAC1
>>
>> the result of the query select * from service_dictionary.table1; gave me
>>  70 rows from 40.0.0.205
>> 64 from 40.0.0.209
>> 54 from 40.0.0.208
>>
>> 2015-09-07 11:13 GMT+02:00 Edouard COLE :
>> Could you provide the result of :
>> - nodetool status
>> - nodetool status YOURKEYSPACE
>>
>>
>>
> --
Regards,

Ryan Svihla


Re: Convert joins in RDBMS to Cassandra

2015-09-07 Thread Ryan Svihla
The normal approach is denormalization to a materialized view (in the
traditional sense, not the new 3.0 feature coming out), which is
also true of using an RDBMS at scale (joins across all data sets get
expensive once you start having to shard across different servers).

The simplistic idea is to build your tables to map your queries (or API
calls if you want to be more advanced in your thinking) 1 for 1 with the
following sets of constraints:

   1. All data needed to satisfy that query exists in this table even if it
   also exists somewhere else.
   2. The partition key (first part of the primary key) matching the where
   clause you'd like to use on this table for that query.
   3. The clustering column defines order inside a given partition key.
   4. Partitions should not be "too fat". This is a more advanced topic.

Imagine the case of a user profile, I may choose to store all changes of
that user profile in a "profile history" table. It would probably look like

CREATE TABLE user_profile_history ( user_id uuid, ts timestamp, change
text, PRIMARY KEY( user_id, ts))

In this case I'll get a partition key of user_id, a timestamp of when the
change occurred as the clustering column giving me an implied order of
ascending, and the change in a text field.
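
The query that table is built for is then a single-partition read, for example:

-- most recent changes for one user (the uuid literal is just an example value)
SELECT ts, change
FROM user_profile_history
WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY ts DESC
LIMIT 10;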

Practical considerations for updating and keeping these tables in sync are
myriad. Starting out, it's probably easiest to have one table be the "source
of truth" and all other views derived off that single source of truth,
either at write time, with batch jobs running throughout the day, or, if
latency is a concern, with streaming (you can combine streaming and batching
for a particularly potent combination). I see this pattern frequently and have
pushed its use to good effect many times.

The following blog provides some great introductory ideas on data modeling
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling





On Fri, Sep 4, 2015 at 4:15 PM, srinivas s  wrote:

>
> Hi All,
>
> I am trying to model RDBMS joins into Cassandra. As I am new to Cassandra,
> I need your help/suggestion on this. Below is the information regarding the
> query:
>
> I have a query in RDBMS as follows:
>
> select t3.name from Table1 t1, Table2 t2, Table3 t3, Table4 t4 where
> t2.cust_id = 3 and t4.sid = t1.sid and t1.colid = t2.colid and t4.cid = t3.cid
>
> Now, trying to make a similar query in Cassandra. As per my learning
> experience in Cassandra, I got the below 2 solutions (as Cassandra does not
> support joins):
>
> Solution 1:
>
> 1) Fetch all the records with t2.cust_id = 3
> 2) Now again run another query that will do the condition t3.sid = t1.sid on
> the results returned from point 1.
> 3) Continue the same for all the conditions.
>
> Drawbacks with this approach: for each join, I have to do a network call to
> fetch the details. Also, it will take more time, as I am running multiple
> conditions.
>
> Solution 2:
>
> 1) Create a map table for every possible join.
>
> Drawbacks with this approach: I think this is not a right approach, so the
> join-to-table (map table) mapping idea is not right.
>
> pastebin link for the same: http://pastebin.com/FRAyihPT
> Please suggest me on this.
>
>
>
>



-- 

Thanks,
Ryan Svihla


Re: Is Cassandra really Strong consistency?

2015-09-07 Thread Ryan Svihla
The condition you bring up is a misconfigured cluster, period, and no matter
how you look at it, that's the case. In other words, the scenario you're
bringing up does not get to the heart of the matter of whether Cassandra has
"strong consistency" or not; your example, I'm sorry to say, fails in this
regard.

However, let's get at what I believe you're actually trying to talk about,
i.e. race-condition protection when you desire a set order; this by
definition is the type of guarantee provided by linearizability. So without
SERIAL or LOCAL_SERIAL consistency, when using a data model that depends on
_order_ (which your example does), you're going to be unhappy; ALL or ONE
consistency levels do nothing to address your example, with or without clock
skew.

In theory, the "last timestamp" of a given table could probably be satisfied
well enough for most problem domains by just keeping the servers pointed at
the same NTP server. In practice this is a very rarely valid use case, as
clusters doing several hundred thousand transactions per second (not
uncommon) would find that "last timestamp" is at best an approximation and
hopelessly wrong every time, no matter the database technology.



On Mon, Sep 7, 2015 at 6:20 AM, ibrahim El-sanosi 
wrote:

> ""It you need strong consistency and don't mind lower transaction rate,
> you're better off with base""
> I wish you can explain more how this statment relate to the my post?
> Regards,
>



-- 

Thanks,
Ryan Svihla


Re: Does nodetool repair stop the node to answer requests ?

2015-01-23 Thread Ryan Svihla
For the cases where repair brings down a cluster, I can safely say that
cluster has more problems than just repair. Consider the load caused by
repair to be somewhat equivalent to losing a node. This is not a 1-for-1
comparison, as the node you're running repair on is up, albeit busy, and the
potential streaming impacts can of course be worse; however, the entire
design of Cassandra is around being able to tolerate 1 or more node outages
(depending on your level of sizing) without having your application be
affected.

On Thu, Jan 22, 2015 at 1:15 PM, SEGALIS Morgan  wrote:

> Don't think it is near failure, it uses only 3% of the CPU and 40% of the
> RAM if that is what you meant.
>
> 2015-01-22 19:58 GMT+01:00 Robert Coli :
>
>> On Thu, Jan 22, 2015 at 10:53 AM, SEGALIS Morgan 
>> wrote:
>>
>>> what do you mean by "operating correctly" ?
>>>
>>
>> I mean that if you are operating near failure, repair might trip a node
>> into failure. But if you are operating correctly, repair should not.
>>
>> =Rob
>>
>>
>
>
>
> --
> Morgan SEGALIS
>



-- 

Thanks,
Ryan Svihla


Re: Is there a way to add a new node to a cluster but not sync old data?

2015-01-22 Thread Ryan Svihla
Usually this is about tuning, and this isn't an uncommon situation for new
users.

Potential steps to take

1) Reduce stream throughput to a point your cluster can handle. This is
probably your most important tool. The default throughput, depending on
version, is 200mb or 400mb; go ahead and drop it down further and
further. I've had to use as low as 15 megs on all nodes to get a single
node bootstrapped. Use nodetool for a runtime change of this configuration
(see the example command after this list):
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsSetStreamThroughput.html

2) Scale up. If you run out of disk space on nodes and can't compact
anymore, then add more disk and change where the data is stored (make sure
the new disk is fast enough to keep up). If it's load, add more CPU and RAM.
3) Do some root cause analysis. I can't tell you how many of these issues
are bad JVM tuning, or bad cassandra settings.
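
For the throttle in step 1, the runtime command looks like this on each node
(the value uses the same units as stream_throughput_outbound_megabits_per_sec
in cassandra.yaml; 15 is just the aggressively low example mentioned above):

nodetool setstreamthroughput 15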

On Thu, Jan 22, 2015 at 7:50 AM, Kai Wang  wrote:

> In last year's summit there was a presentation from Instaclustr -
> https://www.instaclustr.com/meetups/presentation-by-ben-bromhead-at-cassandra-summit-2014-san-francisco/.
> It could be the solution you are looking for. However I don't see the code
> being checked in or JIRA being created. So for now you'd better plan the
> capacity carefully.
>
>
> On Wed, Jan 21, 2015 at 11:21 PM, Yatong Zhang 
> wrote:
>
>> Yes, my cluster is almost full and there are lots of pending tasks. You
>> helped me a lot and thank you Eric~
>>
>> On Thu, Jan 22, 2015 at 11:59 AM, Eric Stevens  wrote:
>>
>>> Yes, bootstrapping a new node will cause read loads on your existing
>>> nodes - it is becoming the owner and replica of a whole new set of existing
>>> data.  To do that it needs to know what data it's now responsible for, and
>>> that's what bootstrapping is for.
>>>
>>> If you're at the point where bootstrapping a new node is placing a
>>> too-heavy burden on your existing nodes, you may be dangerously close to or
>>> even past the tipping point where you ought to have already grown your
>>> cluster.  You need to grow your cluster as soon as possible, and chances
>>> are you're close to no longer being able to keep up with compaction (see
>>> nodetool compactionstats, make sure pending tasks is <5, preferably 0 or
>>> 1).  Once you're falling behind on compaction, it becomes difficult to
>>> successfully bootstrap new nodes, and you're in a very tough spot.
>>>
>>>
>>> On Wed, Jan 21, 2015 at 7:43 PM, Yatong Zhang 
>>> wrote:
>>>
>>>> Thanks for the reply. The bootstrap of new node put a heavy burden on
>>>> the whole cluster and I don't know why. So that' the issue I want to fix
>>>> actually.
>>>>
>>>> On Mon, Jan 12, 2015 at 6:08 AM, Eric Stevens 
>>>> wrote:
>>>>
>>>>> Yes, but it won't do what I suspect you're hoping for.  If you disable
>>>>> auto_bootstrap in cassandra.yaml the node will join the cluster and will
>>>>> not stream any old data from existing nodes.
>>>>>
>>>>> The cluster will now be in an inconsistent state.  If you bring enough
>>>>> nodes online this way to violate your read consistency level (eg RF=3,
>>>>> CL=Quorum, if you bring on 2 nodes this way), some of your queries will be
>>>>> missing data that they ought to have returned.
>>>>>
>>>>> There is no way to bring a new node online and have it be responsible
>>>>> just for new data, and have no responsibility for old data.  It *will* be
>>>>> responsible for old data, it just won't *know* about the old data it
>>>>> should be responsible for.  Executing a repair will fix this, but only
>>>>> because the existing nodes will stream all the missing data to the new
>>>>> node.  This will create more pressure on your cluster than just normal
>>>>> bootstrapping would have.
>>>>>
>>>>> I can't think of any reason you'd want to do that unless you needed to
>>>>> grow your cluster really quickly, and were ok with corrupting your old 
>>>>> data.
>>>>>
>>>>> On Sat, Jan 10, 2015 at 12:39 AM, Yatong Zhang 
>>>>> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I am using C* 2.0.10 and I was trying to add a new node to a
>>>>>> cluster(actually replace a dead node). But after added the new node some
>>>>>> other nodes in the cluster had a very high work-load and affected the 
>>>>>> whole
>>>>>> performance of the cluster.
>>>>>> So I am wondering is there a way to add a new node and this node only
>>>>>> afford new data?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


-- 

Thanks,
Ryan Svihla


Re: C* throws OOM error despite use of automatic paging

2015-01-12 Thread Ryan Svihla
I think it's more accurate to say that auto paging prevents one type
of OOM. It's premature to diagnose it as 'not happening'.

What is heap usage when you start? Are you storing your data on EBS? What
kind of write throughput do you have going on at the same time? What errors
do you have in the cassandra logs before this crashes?


On Sat, Jan 10, 2015 at 1:48 PM, Mohammed Guller 
wrote:

>  nodetool cfstats shows 9GB. We are storing simple primitive value. No
> blobs or collections.
>
>
>
> Mohammed
>
>
>
> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
> *Sent:* Friday, January 9, 2015 12:51 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: C* throws OOM error despite use of automatic paging
>
>
>
> What is the data size of the column family you're trying to fetch with
> paging ? Are you storing big blob or just primitive values ?
>
>
>
> On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller 
> wrote:
>
> Hi –
>
>
>
> We have an ETL application that reads all rows from Cassandra (2.1.2),
> filters them and stores a small subset in an RDBMS. Our application is
> using Datastax’s Java driver (2.1.4) to fetch data from the C* nodes. Since
> the Java driver supports automatic paging, I was under the impression that
> SELECT queries should not cause an OOM error on the C* nodes. However, even
> with just 16GB data on each nodes, the C* nodes start throwing OOM error as
> soon as the application starts iterating through the rows of a table.
>
>
>
> The application code looks something like this:
>
>
>
> Statement stmt = new SimpleStatement("SELECT x,y,z FROM cf").setFetchSize(5000);
>
> ResultSet rs = session.execute(stmt);
>
> while (!rs.isExhausted()) {
>
>   Row row = rs.one();
>
>   process(row);
>
> }
>
>
>
> Even after we reduced the page size to 1000, the C* nodes still crash. C*
> is running on M3.xlarge machines (4-cores, 15GB). We manually increased the
> heap size to 8GB just to see how much heap C* consumes. With 10-15 minutes,
> the heap usage climbs up to 7.6GB. That does not make sense. Either
> automatic paging is not working or we are missing something.
>
>
>
> Does anybody have insights as to what could be happening? Thanks.
>
>
>
> Mohammed
>
>
>
>
>
>
>



-- 

Thanks,
Ryan Svihla


Re: How to bulkload into a specific data center?

2015-01-08 Thread Ryan Svihla
Just noticed you'd sent this to the dev list, this is a question for only
the user list, and please do not send questions of this type to the
developer list.

On Thu, Jan 8, 2015 at 8:33 AM, Ryan Svihla  wrote:

> The nature of replication factor is such that writes will go wherever
> there is replication. If you're wanting responses to be faster, and not
> involve the REST data center in the spark job for response I suggest using
> a cql driver and LOCAL_ONE or LOCAL_QUORUM consistency level (look at the
> spark cassandra connector here
> https://github.com/datastax/spark-cassandra-connector ) . While write
> traffic will still be replicated to the REST service data center, because
> you do want those results available, you will not be waiting on the remote
> data center to respond "successful".
>
> Final point: bulk loading sends a copy per replica across the wire. So let's
> say you have RF 3 in each data center; that means bulk loading will send out
> 6 copies from that client at once, whereas with normal mutations via thrift
> or CQL, writes between data centers go out as 1 copy and then that node
> forwards on to the other replicas. This means cross-data-center traffic in
> this case would be 3x more with the bulk loader than with a traditional CQL
> or thrift-based client.
>
>
>
> On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang  wrote:
>
>> I set up two virtual data centers, one for analytics and one for REST
>> service. The analytics data center sits top on Hadoop cluster. I want to
>> bulk load my ETL results into the analytics data center so that the REST
>> service won't have the heavy load. I'm using CQLTableInputFormat in my
>> Spark Application, and I gave the nodes in analytics data center as
>> Intialial address.
>>
>> However, I found my jobs were connecting to the REST service data center.
>>
>> How can I specify the data center?
>>
>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>


-- 

Thanks,
Ryan Svihla


Re: How to bulkload into a specific data center?

2015-01-08 Thread Ryan Svihla
The nature of replication factor is such that writes will go wherever there
is replication. If you're wanting responses to be faster, and not involve
the REST data center in the spark job for response I suggest using a cql
driver and LOCAL_ONE or LOCAL_QUORUM consistency level (look at the spark
cassandra connector here
https://github.com/datastax/spark-cassandra-connector ) . While write
traffic will still be replicated to the REST service data center, because
you do want those results available, you will not be waiting on the remote
data center to respond "successful".

Final point: bulk loading sends a copy per replica across the wire. So let's
say you have RF 3 in each data center; that means bulk loading will send out
6 copies from that client at once, whereas with normal mutations via thrift
or CQL, writes between data centers go out as 1 copy and then that node
forwards on to the other replicas. This means cross-data-center traffic in
this case would be 3x more with the bulk loader than with a traditional CQL
or thrift-based client.



On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang  wrote:

> I set up two virtual data centers, one for analytics and one for REST
> service. The analytics data center sits top on Hadoop cluster. I want to
> bulk load my ETL results into the analytics data center so that the REST
> service won't have the heavy load. I'm using CQLTableInputFormat in my
> Spark Application, and I gave the nodes in analytics data center as
> Intialial address.
>
> However, I found my jobs were connecting to the REST service data center.
>
> How can I specify the data center?
>



-- 

Thanks,
Ryan Svihla


Re: Are Triggers in Cassandra 2.1.2 performace Hog??

2015-01-07 Thread Ryan Svihla
@Ken So I actually support a lot of the DSE Search users and teach classes
on it. As long as you're not dropping mutations you're in sync; if you are
dropping mutations you're probably sized way too small anyway, and once you
run repair (which you should be doing anyway when dropping mutations) you're
back in sync. Because of that, I actually think the models work well together.

FWIW the improvement since 3.0 is MASSIVE (it's been what I'd call stable
since 3.2.x and we're on 4.6 now)

@Asit to answer the ES question, it's not really for me to say at all what
the lag will be or to help in advising sizing of ES, so that's probably
more of a question for them.


On Wed, Jan 7, 2015 at 8:56 AM, Asit KAUSHIK 
wrote:

> HI All,
>
> What i intend to do is on every write i would push the code to
> elasticsearch using the Trigger. I know it would impact the Cassandra write
> but  given that the WRITE is pretty performant on Cassandra would that lag
> be a big one.
>
> Also as per my information SOLR  has  limitation of using Nested JSON
> documents  which is elasticsearch does seamlessly and hence it was our
> preference.
>
> Please Let me know about you thought on this as we are struck on this and
> i am looking into Streaming Part of cassandra in hope that i can find
> something
>
> Regards
> Asit
>
>
>
> On Wed, Jan 7, 2015 at 8:16 PM, Ken Hancock 
> wrote:
>
>> When last I looked at Datastax Enterprise (DSE 3.0ish), it exhibits the
>> same problem that you highlight, no different than your good idea of
>> asynchronously pushing to ES.
>>
>> Each Cassandra write was indexed independently by each server in the
>> replication group.  If a node timed out or a mutation was dropped, that
>> Solr node would have an out-of-sync index.  Doing a solr query such as
>> count(*) users could return inconsistent results depending on which node
>> you hit since solr didn't support Cassandra consistency levels.
>>
>> I haven't seen any blog posts or docs as to whether this intrinsic
>> mismatch between how Cassandra handles eventual consistency and Solr has
>> ever been resolved.
>>
>> Ken
>>
>>
>> On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan  wrote:
>>
>>> Be very very careful not to perform blocking calls to ElasticSearch in
>>> your trigger otherwise you will kill C* performance. The biggest danger of
>>> the triggers in their current state is that they are on the write path.
>>>
>>> In your trigger, you can try to push the mutation asynchronously to ES
>>> but in this case it will mean managing a thread pool and all related issues.
>>>
>>> Not even mentioning atomicity issues like: what happen if the update to
>>> ES fails  or the connection times out ? etc ...
>>>
>>> As an alternative, instead of implementing yourself the integration with
>>> ES, you can have a look at Datastax Enterprise integration of Cassandra
>>> with Apache Solr (not free) or some open-source alternatives like Stratio
>>> or TupleJump fork of Cassandra with Lucene integration.
>>>
>>> On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK >> > wrote:
>>>
>>>> HI All,
>>>>
>>>> We are trying to integrate elasticsearch with Cassandra and as the
>>>> river plugin uses select * from any table, it seems to be a bad performance
>>>> choice. So I was thinking of inserting into elasticsearch using a Cassandra
>>>> trigger.
>>>> So i wanted your view does a Cassandra Trigger impacts the performance
>>>> of read/Write of Cassandra.
>>>>
>>>> Also, if there is any other way you guys achieve this, please guide me. I
>>>> am stuck on this.
>>>>
>>>> Regards
>>>> Asit
>>>>
>>>>
>>>
>>
>>
>>
>>
>


-- 

Thanks,
Ryan Svihla


Re:

2015-01-07 Thread Ryan Svihla
Something to start considering is the partition key (first part of your
primary key) drives your model more than anything. So if you're querying
for all of X your partition key should probably be X, but there are some
constraints to be mindful of.

The rest of replies inline

On Wed, Jan 7, 2015 at 1:37 AM, Nagesh  wrote:

> Thanks Ryan, Srinivas for you answer.
>
> Finally I have decided to create three column families
>
> 1. product_date_id (mm, dd, prodid) PRIMARY KEY ((mm), dd, prodid)
> - Record the arrival date on updates of a product
> - Get list of products that are recently added/updated Ex: [(mm, dd) >
> (2014, 06)]
>

This could just be product_date and include the entire product graph
needed. That is a tradeoff: frequently it's optimal for performance reasons
on the read side, while the downside is you're usually increasing your
write payload. My thought is to do a fully materialized view first and
denormalize, including the entire product, and if you find the write
traffic is too much, consider the index approach here instead (which is
easier to do after the fact, since you can just drop the columns).
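
As a rough CQL sketch of that fully materialized shape (the names and the
extra product columns here are hypothetical stand-ins for whatever the full
product graph contains, so treat this as an illustration rather than the
final model):

CREATE TABLE product_by_date (
    mm int,
    dd int,
    prodid int,
    name text,          -- denormalized product details live here
    price decimal,
    status int,
    PRIMARY KEY ((mm), dd, prodid)
);

-- "recently added/updated" now returns full products in one read
SELECT * FROM product_by_date WHERE mm = 2014 AND dd >= 6;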


> 1. product_status(prodid int, status int) PRIMARY KEY (prodid), INDEX on
> (status)
> - Each time I add a product just insert a record (prodid, defstatus) with
> the condition IF NOT EXISTS, to avoid status being updated, Here I couldnt
> avoid read before write to protect product status
>
As for protecting product status, that's fine; however, you could just do
what most applications do and update regardless of previous status. This
leads into different locking theories and what the right behavior for an
application is, but this is something most people never think twice about
when using MySQL or Oracle, and in the end they update status in
unprotected ways. Something to ponder.

- Update Enable/Disable prodid
> - Get list of product ids with the give status
>
>

The "list of product ids with a given status" query will probably suck
using a 2i: think scanning ALL of the nodes to get potentially as few as 2
records (if that fits within your SLA, however, kudos; just be aware of the
behavior).

Assuming you have large status counts and a limited number of statuses, the
data model gets trickier, as there are some rule-of-thumb style constraints
(what you can tolerate varies with hardware and SLA). Say you had a primary
key of (status, prodid): this would in theory very quickly return all of
the ACTIVE prodids, as there may only be a few hundred. But let's say you
want to return all the archived prodids; there may be billions, and this
would likely take far, far too long to return in one query, not to mention
compaction of such a large partition will be fun, and it'll unbalance your
cluster.

So frequently, for this particular corner, I end up having to do some form
of sharding to spread a status over the cluster and keep partition sizes
reasonable (querying in an async fashion to get all of the queries back in
a reasonable time).

primary key((status, shardId), prodId)

The shardId can be anything up to the reasonable size limits of your
hardware and cluster (say 50k as a rule of thumb), and there are a number
of different approaches (a short CQL sketch follows this list):

- it can be a random uuid but then you have to track with a separate table
what shardIds there are for that particular status (this is not uncommon)
- it can start at a fixed value, say 1, and you can just increment the
number by 1 (but make sure as you're updating this you're not introducing
any fun state bugs that leave two different shards writing to the same
number). When you query, you keep increasing the number until you stop
getting responses. This has the downside that the optimization is a bit
hard to get right. Optionally you can have a static column in the table
called maxShardId so that once you've done your first query you know how
many parallel queries you have to send out.
- It can be based on some business logic or domain rule that includes some
fixed boundaries, say add a productGroupId in there, and you know from an
application level, how many productGroupIds there are. This has the
downside of not giving you absolute protection against fat partitions, on
the upside it fits your natural domain model and is easier to reason about.
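
A minimal CQL sketch of that sharded layout (all names are hypothetical,
and the static maxShardId column is the optional optimization mentioned
above; statics are stored once per (status, shard) partition):

CREATE TABLE product_by_status (
    status int,
    shard_id int,
    prodid int,
    max_shard_id int static,   -- lets the first query tell you how far to fan out
    PRIMARY KEY ((status, shard_id), prodid)
);

-- fan these out in parallel / async from the client, one per shard
SELECT prodid, max_shard_id FROM product_by_status WHERE status = 0 AND shard_id = 1;
SELECT prodid FROM product_by_status WHERE status = 0 AND shard_id = 2;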



> 2. product_details(prodgrp, prodid, . )
> PRIMARY KEY (prodgrp, prodid)
> - Insert product details in the prodgrp blindly to store recent updates of
> the product details
> - Get list of products in the product group
> - Get details of products for the give ids
>
> "get list of products for a given range of ids" : My queries are answered
> with the above design.
>
> PS: I am still thinking to avoid read before write on product_status. And
> would like to see if there is better way to design using supercolumn
> families or materialized views which I am yet to explore.
>
>
Materialized views are your friend, use them freely

Re: Re: Is it possible to implement a interface to replace a row in cassandra using cassandra.thrift?

2015-01-07 Thread Ryan Svihla
It really depends on your code for error handling, and since you're using
thrift it also depends on the client; if you're doing client-side
timestamps then it's not related to time issues.

On Tue, Jan 6, 2015 at 8:19 PM,  wrote:

> Hi,
>
> I found that in my function, both delete and update  use the client side
> timestamp.
>
> The update timestamp should  be always bigger than the deletion timestamp.
>
>
> I wonder why the update failed in some cases?
>
>
> thank you.
>
>
> - Original Message -
> From: Ryan Svihla 
> To: user@cassandra.apache.org, yhq...@sina.com
> Subject: Re: Is it possible to implement a interface to replace a row in
> cassandra using cassandra.thrift?
> Date: 2015-01-06 23:34
>
> replies inline
>
> On Tue, Jan 6, 2015 at 2:28 AM,  wrote:
>
> Hi, all:
>
> I use cassandra.thrift to implement a replace row interface in this
> way:
>
> First use batch_mutate to delete that row, then use batch_mutate to
> insert a new row.
>
> I always find that after calling this interface, the row does not exist.
>
>
> Then I doubt that it is the problem caused by the deletion, because
> the deletion has a timestamp set by the client.
>
> Maybe the time is not so sync between the client and cassandra server
> (1 or more seconds diff).
>
>
> It's a distributed database, so time synchronization really, really
> matters; use NTP. However, if you're using client-side timestamps on both
> the insert and the delete, it's not going to matter for that use case.
>
>
>
> How to solve this?? Is it possible to implement a  interface to
> replace a row in cassandra.???\
>
>
> yeah all updates are this way. Inserts are actually "UPSERTS" and you can
> go ahead and do two updates instead of insert, delete, update.
>
>
>
> Thanks.
>
>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>


-- 

Thanks,
Ryan Svihla


Re: deletedAt and localDeletion

2015-01-06 Thread Ryan Svihla
If you look at the source there are some useful comments regarding those
specifics
https://github.com/apache/cassandra/blob/8d8fed52242c34b477d0384ba1d1ce3978efbbe8/src/java/org/apache/cassandra/db/DeletionTime.java


/**
 * A timestamp (typically in microseconds since the unix epoch, although this is not enforced) after which
 * data should be considered deleted. If set to Long.MIN_VALUE, this implies that the data has not been marked
 * for deletion at all.
 */
public final long markedForDeleteAt;

/**
 * The local server timestamp, in seconds since the unix epoch, at which this tombstone was created. This is
 * only used for purposes of purging the tombstone after gc_grace_seconds have elapsed.
 */
public final int localDeletionTime;

On Mon, Jan 5, 2015 at 6:13 AM, Kais Ahmed  wrote:

> Hi all,
>
> Can anyone explain what mine deletedAt and localDeletion in
> SliceQueryFilter log.
>
> SliceQueryFilter.java (line 225) Read 6 live and 2688 tombstoned cells in
> ks.mytable (see tombstone_warn_threshold). 10 columns was requested,
> slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=
> 2147483647}
>
> Thanks,
>



-- 

Thanks,
Ryan Svihla


Re: Implications of ramping up max_hint_window_in_ms

2015-01-06 Thread Ryan Svihla
Whoops, wrong thread... ignore that :) Robert is correct in this regard by
and large, even though I disagree with the tradeoff; my experience has
shown me that for a lot of use cases it's not a happy tradeoff. YMMV, and
there are some use cases where it does work (low write throughput).

On Tue, Jan 6, 2015 at 12:58 PM, Ryan Svihla  wrote:

> as long as they know how to handle node recovery and don't bring data that
> was deleted back from the dead.
>
> On Tue, Jan 6, 2015 at 12:52 PM, Robert Coli  wrote:
>
>> On Tue, Jan 6, 2015 at 7:39 AM, Ryan Svihla  wrote:
>>
>>> In general today, large amounts of hints still pretty much makes a node
>>> angry (just no longer nearly as nasty as it was before), unless you have a
>>> really low throughput, you're probably not going to gain much in practice
>>> by raising the hints window today.
>>>
>>
>> It gains people with not-insane write workload who don't mind eventual
>> consistency more time to respond to outages?
>>
>> =Rob
>>
>>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>


-- 

Thanks,
Ryan Svihla


Re: Question about `nodetool rebuild` finsh

2015-01-06 Thread Ryan Svihla
Without more information it's hard to say what the bottleneck is. There
could be a great deal of GC traffic, it could be hung (there were some
streaming bugs in older versions of cassandra), or it could be that the
disk IO is falling behind with the compaction of new sstables.



On Sun, Dec 28, 2014 at 9:00 PM, 李洛  wrote:

> Hi, everyone!
> I have met a problem when adding a data center to the existing cluster and
> rebuilding it.
> When I had configured the new data center (auto_bootstrap: false, seeds,
> endpoint_snitch etc.) and ran _nodetool rebuild_, I could see high network
> traffic on my new data center nodes, and _nodetool netstats_ reported lots
> of Rebuild SSTables being sent to my new nodes. But four days have gone by;
> I see the _nodetool rebuild_ process still running in the background on my
> server, and _nodetool netstats_ still reports lots of SSTables being sent
> between the two datacenters, but there is little network traffic on my new
> data center nodes.
> I want to know _how I can tell when the rebuild is finished_.
> Thanks all for your reply.
>
> --
> All the best!
>
> http://luolee.me
>



-- 

Thanks,
Ryan Svihla


Re: Re: Cassandra update row after delete immediately, and read that, the data not right?

2015-01-06 Thread Ryan Svihla
So the coordinator node of a given request sets the timestamp unless it is
overridden by the client (which you can do on a per-statement basis). While
you can move all of your timestamps to the client side, eventually, as you
add more clients, you have a similar problem set and will still have to use
NTP to keep your clients in sync.

Ultimately for a large variety of reasons, it's probably better to just
make sure your cassandra nodes are in sync.
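
For reference, this is roughly what overriding the timestamp per statement
looks like in CQL (the table, values, and timestamps here are hypothetical;
timestamps are conventionally microseconds since the epoch):

-- client-chosen timestamps on both the delete and the re-insert
DELETE FROM users USING TIMESTAMP 1420070400000000 WHERE user_id = 42;
INSERT INTO users (user_id, name) VALUES (42, 'replacement')
  USING TIMESTAMP 1420070400000001;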


On Thu, Dec 25, 2014 at 10:26 PM,  wrote:

> Hi, all:
>
>The test program first inserts one row and then deletes it, then reads it
> back to compare.
>
>The test program runs this flow row by row, not in a batch.
>
>
> Today I found the problem is caused by the deletion timestamp. The
> machine running the test program may not be strictly time-synced with the
> cassandra machine.
>
>
>
>  Why does cassandra use the local timestamp for deletion??
>
>
>
> From: Jack Krupansky 
> To: user@cassandra.apache.org, yhq...@sina.com
> Subject: Re: Cassandra update row after delete immediately, and read that, the
> data not right?
> Date: 2014-12-25 21:04
>
> What RF?
>
> Is the update and read immediately after the delete and insert, or is the
> read after doing all the updates?
>
> Is the delete and insert done with a single batch?
>
> -- Jack Krupansky
>
> On Thu, Dec 25, 2014 at 4:14 AM,  wrote:
>
> Hi, all
>   I write a program to test the cassandra2.1. I have 6 nodes cluster.
>   First, I insert 1 million row data into cassandra. the row key from 1 to
> 100.
>
>   Then I run my test program. My test program first delete(use batch
> mutate) the row and insert (use batch mutate) that row,
>
>  then read (use gen_slice_range) the same row. After that check
> whether the read data is same with the insert data or not.
>   The consistency level used is quorum.
>
>   I found there are some cases that are not the same. About 1/1. In these
> error cases, some columns are not the same.
>
>   Then I used cassandra-cli to check the data and found that the column does
> not exist. It seems to have been inserted only partially.
>   My test program has 20 threads; the QPS is about 800.
>
>   What's wrong with cassandra??
>
>
> Thanks!
>
>
>


-- 

Thanks,
Ryan Svihla


Re: Implications of ramping up max_hint_window_in_ms

2015-01-06 Thread Ryan Svihla
as long as they know how to handle node recovery and don't bring data that
was deleted back from the dead.

On Tue, Jan 6, 2015 at 12:52 PM, Robert Coli  wrote:

> On Tue, Jan 6, 2015 at 7:39 AM, Ryan Svihla  wrote:
>
>> In general today, large amounts of hints still pretty much makes a node
>> angry (just no longer nearly as nasty as it was before), unless you have a
>> really low throughput, you're probably not going to gain much in practice
>> by raising the hints window today.
>>
>
> It gains people with not-insane write workload who don't mind eventual
> consistency more time to respond to outages?
>
> =Rob
>
>


-- 

Thanks,
Ryan Svihla


Re: STCS limitation with JBOD?

2015-01-06 Thread Ryan Svihla
I would add that STC and JBOD are logically a bad fit anyway, and that
doing it with nodetool compact is extra silly. For these reasons I tend to
only use JBOD with LCS and therefore with SSD.

As far as modeling out tombstones, I tend to push more towards handling it
in the model. For example, if you're doing partitioning based on time, say
daily, for the sake of ease of understanding say you have Monday, Tuesday,
Wednesday, etc. tables, but you only query the last 2 days. As soon as a
table falls out of scope you truncate it (on Wednesday you can safely
truncate Monday). You can take this same approach with work queues
(truncate the work queue when done), or really any logical model that has
data falling out of scope. This can mean querying more than one table at a
time, but if you do this in an async fashion that tradeoff can totally be
worth it compared to managing tombstones, and really LCS does pin read
times reasonably well, especially when compared to STC combined with
compact (either you're spiking read times on the compact, or you're spiking
beforehand because you had a burst of write traffic prior to your nodetool
compact run).
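
A minimal sketch of that pattern in CQL (table and column names are
hypothetical): one table per day bucket, and a bucket that has fallen out
of the two-day query window is simply truncated, which drops its data and
its tombstones without waiting on compaction:

CREATE TABLE events_monday (
    sensor_id uuid,
    ts timeuuid,
    payload blob,
    PRIMARY KEY (sensor_id, ts)
);
-- ...and the same definition for events_tuesday, events_wednesday, etc.

-- on Wednesday, Monday's bucket is out of scope and can be dropped wholesale
TRUNCATE events_monday;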

Details on my modeling these approaches here
http://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/

Finally, I'm not typically a big fan of rewriting all data to a new table,
though I've done that for some models that were hard to partition (session
data that had variable aging-out times, so we pushed over the new records).





On Tue, Jan 6, 2015 at 12:12 PM, Dan Kinder  wrote:

> Thanks for the info guys. Regardless of the reason for using nodetool
> compact, it seems like the question still stands... but he impression I'm
> getting is that nodetool compact on JBOD as I described will basically fall
> apart. Is that correct?
>
> To answer Colin's question as an aside: we have a dataset with fairly high
> insert load and periodic range reads (batch processing). We have a
> situation where we may want to rewrite some rows (changing the primary key) by
> deleting and inserting as a new row. This is not something we would do on a
> regular basis, but after or during the process a compact would greatly help
> to clear out tombstones/rewritten data.
>
> @Ryan Svihla it also sounds like your suggestion in this case would be:
> create a new column family, rewrite all data into that, truncate/remove the
> previous one, and replace it with the new one.
>
> On Tue, Jan 6, 2015 at 9:39 AM, Ryan Svihla  wrote:
>
>> nodetool compact is the ultimate "running with scissors" solution, far
>> more people manage to stab themselves in the eye. Customers running with
>> scissors successfully notwithstanding.
>>
>> My favorite discussions usually tend to result in:
>>
>>1. "We still have tombstones" ( so they set gc_grace_seconds to 0)
>>2. "We added a node after fixing it and now a bunch of records that
>>were deleted have come back" (usually after setting gc_grace_seconds to 0
>>and then not blanking nodes that have been offline)
>>3. Why are my read latencies so spikey?  (cause they're on STC and
>>now have a giant single huge SStable which worked fine when their data set
>>was tiny, now they're looking at 100 sstables on STC, which means
>>slwww reads)
>>4. "We still have tombstones" (yeah I know this again, but this is
>>usually when they've switched to LCS, which basically noops with nodetool
>>compact)
>>
>> All of this is manageable when you have a team that understands the
>> tradeoffs of nodetool compact, but I categorically reject that it's a good
>> experience for new users, as I've unfortunately had about a dozen fire
>> drills this year as a result of nodetool compact alone.
>>
>> Data modeling around partitions that are truncated when falling out of
>> scope is typically far more manageable, works with any compaction strategy,
>> and doesn't require operational awareness at the same scale.
>>
>> On Fri, Jan 2, 2015 at 2:15 PM, Robert Coli  wrote:
>>
>>> On Fri, Jan 2, 2015 at 11:28 AM, Colin  wrote:
>>>
>>>> Forcing a major compaction is usually a bad idea.  What is your reason
>>>> for doing that?
>>>>
>>>
>>> I'd say "often" and not "usually". Lots of people have schema where they
>>> create way too much garbage, and major compaction can be a good response.
>>> The docs' historic incoherent FUD notwithstanding.
>>>
>>> =Rob
>>>
>>>
>>
>>
>>
>> --
>>
>> Thanks,
>> Ryan Svihla
>>
>>
>
>
> --
> Dan Kinder
> Senior Software Engineer
> Turnitin – www.turnitin.com
> dkin...@turnitin.com
>



-- 

Thanks,
Ryan Svihla


Re: Queries required before data modeling?

2015-01-06 Thread Ryan Svihla
Yes, however in most cases this means just one new table, so you make a new
table and copy the data over.  In many ways this is not unlike a schema
change, or if you need to change your primary key on an existing table in
traditional SQL databases.

This design around partition key is true of all databases once you go
distributed, and even when you start trying to scale out SQL databases you
have to think about problem sets like this. Whether you're sharding your
data with Cassandra or doing it by hand in MySQL, some key determines which
data is on which server.

If you really want to support dynamic queries you can use something like
Spark SQL in front of your data, or index all the table ids with something
like Solr. However, both of these approaches have performance implications
(they fan out and scan lots of data) and if you need Cassandra's speed and
scalability then you're going to need to model in a scalable way.




On Tue, Jan 6, 2015 at 11:47 AM, Srinivasa T N  wrote:

> Hi All,
>I was just googling around and reading the various articles on data
> modeling in cassandra.  All of them talk about working backwards, i.e.,
> first know what type of queries you are going to make and select the right
> data model which can support those queries efficiently.  But one thing I
> cannot understand: you can expect that I know some of the queries that I
> will be making, but how can I know beforehand all the queries that will be
> made?  Do I have to remodel the whole thing when I get a query which I had
> not thought of?
>
> Regards,
> Seenu.
>



-- 

Thanks,
Ryan Svihla


Re: Reload/resync system.peers table

2015-01-06 Thread Ryan Svihla
auto_bootstrap: false shouldn't help here any more than true.

So when I had this issue before in prod, I just executed delete statements
for the bogus nodes; however, this only treated a symptom (the ghosts came
back) because the issue was a bug (
https://issues.apache.org/jira/browse/CASSANDRA-7122)

I just checked with the C* team at DataStax, and it appears this is a sane
approach (delete statements)
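
For reference, the delete in question is just a CQL statement against the
local system keyspace, run on each node that still shows the ghost entry
(the IP below is of course hypothetical):

DELETE FROM system.peers WHERE peer = '10.0.0.42';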

On Wed, Dec 31, 2014 at 3:04 PM, Robert Coli  wrote:

> On Wed, Dec 17, 2014 at 8:41 AM, Paulo Ricardo Motta Gomes <
> paulo.mo...@chaordicsystems.com> wrote:
>
>> Is there any automatic way of reloading/resyncing the system.peers table?
>> Or the only way is by removing ghost nodes?
>>
>
> You could delete its contents, drain, and then restart the node with
> auto_bootstrap:false, which should (?) rediscover only the live nodes.
>
> Probably best to be sure to specify initial_token here, though I'm not
> sure if that's just FUD talking...
>
> =Rob
>



-- 

Thanks,
Ryan Svihla


Re: STCS limitation with JBOD?

2015-01-06 Thread Ryan Svihla
nodetool compact is the ultimate "running with scissors" solution, far more
people manage to stab themselves in the eye. Customers running with
scissors successfully notwithstanding.

My favorite discussions usually tend to result in:

   1. "We still have tombstones" ( so they set gc_grace_seconds to 0)
   2. "We added a node after fixing it and now a bunch of records that were
   deleted have come back" (usually after setting gc_grace_seconds to 0 and
   then not blanking nodes that have been offline)
   3. Why are my read latencies so spikey?  (cause they're on STC and now
   have a giant single huge SStable which worked fine when their data set was
   tiny, now they're looking at 100 sstables on STC, which means slwww
   reads)
   4. "We still have tombstones" (yeah I know this again, but this is
   usually when they've switched to LCS, which basically noops with nodetool
   compact)

All of this is manageable when you have a team that understands the
tradeoffs of nodetool compact, but I categorically reject that it's a good
experience for new users, as I've unfortunately had about a dozen fire
drills this year as a result of nodetool compact alone.

Data modeling around partitions that are truncated when falling out of
scope is typically far more manageable, works with any compaction strategy,
and doesn't require operational awareness at the same scale.

On Fri, Jan 2, 2015 at 2:15 PM, Robert Coli  wrote:

> On Fri, Jan 2, 2015 at 11:28 AM, Colin  wrote:
>
>> Forcing a major compaction is usually a bad idea.  What is your reason
>> for doing that?
>>
>
> I'd say "often" and not "usually". Lots of people have schema where they
> create way too much garbage, and major compaction can be a good response.
> The docs' historic incoherent FUD notwithstanding.
>
> =Rob
>
>



-- 

Thanks,
Ryan Svihla


Re: Can't connect to cassandra node from different host

2015-01-06 Thread Ryan Svihla
0.0.0.0 is usually not a good idea for a variety of reasons (though more
recent versions of the java driver appear to handle an rpc_address of
0.0.0.0 fine). listen_address should be the interface the cluster needs to
listen on, and rpc_address should be the IP that clients connect to.

On Sat, Jan 3, 2015 at 11:52 PM, Chamila Wijayarathna <
cdwijayarat...@gmail.com> wrote:

> Thanks Jonathan,
>
> It worked after setting both listen address and rpc_address to 0.0.0.0
>
> On Sun, Jan 4, 2015 at 7:58 AM, Jonathan Haddad  wrote:
>
>> This is most likely because your listen address is set to localhost.  Try
>> changing it to listen on the external interface.
>>
>>
>> On Sat Jan 03 2015 at 10:03:57 AM Chamila Wijayarathna <
>> cdwijayarat...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I have a cassandra node at a machine. When I access cqlsh from the same
>>> machne it works properly.
>>>
>>> But when I tried to connect to it's cqlsh using "192.x.x.x" from another
>>> machine, I'm getting an error saying
>>>
>>> Connection error: ('Unable to connect to any servers', {'192.x.x.x':
>>> error(111, "Tried connecting to [('192.x.x.x', 9042)]. Last error:
>>> Connection refused")})
>>>
>>> What is the reason for this? How can I fix it?
>>>
>>> Thank You!
>>> --
>>> *Chamila Dilshan Wijayarathna,*
>>> SMIEEE, SMIESL,
>>> Undergraduate,
>>> Department of Computer Science and Engineering,
>>> University of Moratuwa.
>>>
>>
>
>
> --
> *Chamila Dilshan Wijayarathna,*
> SMIEEE, SMIESL,
> Undergraduate,
> Department of Computer Science and Engineering,
> University of Moratuwa.
>



-- 

Thanks,
Ryan Svihla


Re:

2015-01-06 Thread Ryan Svihla
The normal data modeling approach in Cassandra is a separate column family
for each of those queries, so each is answerable with one partition key
(that's going to be the fastest).

I'm very suspicious of

   - get list of products for a given range of ids

Is this being driven by another query to get a list of ids? If so that
should probably be modeled differently and any query that would normally
return a list of ids should instead be modeled to produce a full product
(materialized views scale very well on any database and it's the common
approach on Cassandra)
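
As a rough CQL sketch of that materialized-view idea (all names are
hypothetical; the "category" dimension stands in for whatever query
currently produces the list of ids):

CREATE TABLE products_by_category (
    category text,
    prodid int,
    name text,          -- the full product is denormalized into this table
    description text,
    status int,
    PRIMARY KEY (category, prodid)
);

-- the query that used to return ids now returns whole products in one read
SELECT * FROM products_by_category WHERE category = 'widgets';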


On Mon, Jan 5, 2015 at 6:18 AM, Srinivasa T N  wrote:

> Just an arrow in the dark: the document "CQL for Cassandra 2.x
> Documentation" says that cassandra allows querying on a column when it is
> indexed.
>
> Regards,
> Seenu.
>
> On Mon, Jan 5, 2015 at 5:14 PM, Nagesh  wrote:
>
>> Hi All,
>>
>> I have designed a column family
>>
>> prodgroup text, prodid int, status int, , PRIMARY KEY ((prodgroup),
>> prodid, status)
>>
>> The data model is to cater
>>
>>- Get list of products from the product group
>>- get list of products for a given range of ids
>>- Get details of a specific product
>>- Update status of the product acive/inactive
>>- Get list of products that are active or inactive (select * from
>>product where prodgroup='xyz' and prodid > 0 and status = 0)
>>
>> The design works fine, except for the last query. Cassandra does not allow
>> querying on status unless I fix the product id. I think defining a super
>> column family which has the key "PRIMARY KEY((prodgroup), status,
>> productid)" should work. Would like to get expert advice on other
>> alternatives.
>> --
>> Thanks,
>> Nageswara Rao.V
>>
>> *"The LORD reigns"*
>>
>
>


-- 

Thanks,
Ryan Svihla


Re: Cassandra consuming whole RAM (64 G)

2015-01-06 Thread Ryan Svihla
My take is that even with that patch you'll likely run into heap pressure
with batches of that size, so either increase your heap and take the GC hit
on CPU (and have longer GCs) or don't use large batches.

The batch conversation is a bigger one which I discuss here
http://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/

On Tue, Jan 6, 2015 at 10:07 AM, Rahul Bhardwaj <
rahul.bhard...@indiamart.com> wrote:

> Thanks Ryan... We will keep your valuable suggestion in mind while resolving
> this issue. But what is your take on the Cassandra patch for issue 8248 to
> resolve this?
>
>
> On Tuesday, January 6, 2015, Ryan Svihla  wrote:
>
>> Btw side note here, you're using GIANT Batches, and the logs are
>> indicating such, this will cause a signficant amount of heap pressure.
>>
>> The root cause fix is not to use giant batches in the first place.
>>
>> On Tue, Jan 6, 2015 at 4:43 AM, Rahul Bhardwaj <
>> rahul.bhard...@indiamart.com> wrote:
>>
>>> Hi Joe..
>>>
>>> Thanks for your valuable solution.. it worked.
>>>
>>> But for this problem
>>>
>>> *The processes are killed by kernel, coz they are eating all memory
>>> (oom-killer). We have set JAVA heap to default (i.e. it is using 8G)
>>> because we have 64 GB RAM.*
>>>
>>> Should I apply patch given for  issue
>>> https://issues.apache.org/jira/browse/CASSANDRA-8248 ??
>>>
>>>
>>> regards:
>>> Rahul Bhardwaj
>>>
>>> On Tue, Jan 6, 2015 at 12:42 PM, Joe Ramsey  wrote:
>>>
>>>> Thanks Rahul and good luck!  I’m really curious to hear what the result
>>>> is.
>>>>
>>>>
>>>> On Jan 6, 2015, at 2:10 AM, Rahul Bhardwaj <
>>>> rahul.bhard...@indiamart.com> wrote:
>>>>
>>>> Thanks for your response.. i will get back to you with my findings.
>>>>
>>>> On Tue, Jan 6, 2015 at 12:36 PM, Joe Ramsey  wrote:
>>>>
>>>>> That should be “writing too many bytes” not “waiting too many bytes”
>>>>> just for clarity’s sake.
>>>>>
>>>>> On Jan 6, 2015, at 2:03 AM, Joe Ramsey  wrote:
>>>>>
>>>>> I’m not an expert.  Really just learning this myself but it looks like
>>>>> according to the stack you’re getting an exception waiting too many bytes
>>>>> to the commit log.
>>>>>
>>>>> That’s controlled by commit_log_segment_size_in_mb setting.  The
>>>>> maximum write size that C* will allow is half of the value set for this
>>>>> parameter so if it’s set for 32 (default) that means your max write would
>>>>> be 16MB.  (And that’s what’s getting reported in the stack trace.  You’re
>>>>> writing 16965030 bytes (16.18M) and your max write is 16MB so it’s 
>>>>> throwing
>>>>> the exception.  Try setting commit_log_segment_size_in_mb from 32 to 64 
>>>>> and
>>>>> see if the error goes away.  It really should get tuned for the amount of
>>>>> writes but that’ll tell you at least if this is the issue.
>>>>>
>>>>> Let me know how it goes!
>>>>>
>>>>> Joe
>>>>>
>>>>>
>>>>>
>>>>> On Jan 6, 2015, at 1:40 AM, Rahul Bhardwaj <
>>>>> rahul.bhard...@indiamart.com> wrote:
>>>>>
>>>>> Hi Joe,
>>>>>
>>>>> PFB output of system.log
>>>>>
>>>>>  tail -n 100 system.log
>>>>> INFO  [CompactionExecutor:164] 2015-01-06 11:58:28,555
>>>>> CompactionTask.java:251 - Compacted 4 sstables to
>>>>> [/var/lib/cassandra/data/clickstream/im_csl_log-22207f2081bb11e4abd4a9d4f1e0b940/clickstream-im_csl_log.im_csl_log_catalog_owner_glusr_id_idx-ka-243,].
>>>>>  50,514,079 bytes to 50,626,490 (~100% of original) in 654,529ms =
>>>>> 0.073765MB/s.  277,719 total partitions merged to 260,285.  Partition 
>>>>> merge
>>>>> counts were {1:246186, 2:11302, 3:2259, 4:538, }
>>>>> INFO  [CompactionExecutor:176] 2015-01-06 11:58:28,579
>>>>> CompactionTask.java:251 - Compacted 4 sstables to
>>>>> [/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-882,].
>>>>>  511 bytes to 42 (~8% of original) in 27ms = 0.001483MB/s.  7 total
>>>>> partitions merged to 1.  Pa

Re: Implications of ramping up max_hint_window_in_ms

2015-01-06 Thread Ryan Svihla
In general today, large amounts of hints still pretty much make a node
angry (just no longer nearly as nasty as it was before); unless you have
really low throughput, you're probably not going to gain much in practice
by raising the hints window today.

Later on, when we get file-system-based hints in 3.0, I think your approach
will work better; today I'm concerned that in practice larger hint windows
won't buy you a lot. See the following for details:
http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery

On Tue, Jan 6, 2015 at 1:47 AM, Jens Rantil  wrote:

> Thanks for input, Rob. Just making sure, is "older version" the same as
> "less than version 2"?
>
>
>
> On Mon, Jan 5, 2015 at 8:13 PM, Robert Coli  wrote:
>
>> On Mon, Jan 5, 2015 at 2:52 AM, Jens Rantil  wrote:
>>
>>
>>> Since repair is a slow and daunting process*, I am considering
>>> increasing max_hint_window_in_ms from its default value of one (1) hour to
>>> something like 24-48 hours.
>>> ...
>>> Are there any other implications of making this change that I haven’t
>>> thought of?
>>>
>>
>> Not really, though 24-48 hours of hints could be an awful lot of hints. I
>> personally run with at least a 6 hour max_h_w_i_m.
>>
>> In older versions of Cassandra, 24-48 hours of hints could hose your node
>> via ineffective constant compaction.
>>
>> =Rob
>>
>
>


-- 

Thanks,
Ryan Svihla


Re: Is it possible to implement a interface to replace a row in cassandra using cassandra.thrift?

2015-01-06 Thread Ryan Svihla
replies inline

On Tue, Jan 6, 2015 at 2:28 AM,  wrote:

> Hi, all:
>
> I use cassandra.thrift to implement a replace row interface in this
> way:
>
> First use batch_mutate to delete that row, then use batch_mutate to
> insert a new row.
>
> I always find that after calling this interface, the row does not exist.
>
>
> Then I doubt that it is the problem caused by the deletion, because
> the deletion has a timestamp set by the client.
>
> Maybe the time is not so sync between the client and cassandra server
> (1 or more seconds diff).
>

It's a distributed database, so time synchronization really, really
matters; use NTP. However, if you're using client-side timestamps on both
the insert and the delete, it's not going to matter for that use case.

>
>
> How to solve this?? Is it possible to implement a  interface to
> replace a row in cassandra.???\
>

yeah all updates are this way. Inserts are actually "UPSERTS" and you can
go ahead and do two updates instead of insert, delete, update.
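
A minimal CQL illustration of that upsert behavior (hypothetical table):
the second INSERT simply overwrites the first with a newer timestamp, and
an UPDATE against a key that does not exist yet will create the row:

INSERT INTO users (user_id, name) VALUES (1, 'original');
-- "replace" the row by just writing it again, no delete needed
INSERT INTO users (user_id, name) VALUES (1, 'replacement');
-- UPDATE is also an upsert: this creates user_id 2 if it is missing
UPDATE users SET name = 'created-by-update' WHERE user_id = 2;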

>
>
> Thanks.
>
>


-- 

Thanks,
Ryan Svihla


Re: ttl in collections

2015-01-06 Thread Ryan Svihla
Tombstone management is a big conversation, you can manage it in one of the
following ways

1) Set gc_grace_seconds to 0 and then run nodetool compact, while using
size-tiered compaction, as frequently as needed (the one-line CQL for the
setting is sketched below). This often is a pretty lousy solution, as a
gc_grace_seconds of 0 means you're not very partition tolerant and it's
easy to bring data back from the dead if you don't manage how you bring
nodes back online correctly. Also, nodetool compact is super intensive. I
don't recommend this approach unless you're already very operationally
sound.
2) Partition your data using a scheme that matches your domain model. It
sounds like you're using a queue approach and by and large  a distributed
database that relies on tombstones is going to struggle with that by
default. I have however, worked with a number of customers that use
cassandra for a queue at scale and I detailed the modeling workarounds here
http://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
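
For completeness, the table-level setting from option 1 is just a CQL alter
(table name hypothetical, and the partition-tolerance caveats above still
apply):

ALTER TABLE work_queue WITH gc_grace_seconds = 0;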

On Tue, Jan 6, 2015 at 4:24 AM, Jens-U. Mozdzen  wrote:

> Hi Eduardo,
>
> Zitat von Eduardo Cusa :
>
>>  [...]
>> I have to worry about the tombstones generated?  Considering that I will
>> have many daily set updates
>>
>
> that depends on your definition of "many"... we've run into a situation
> where we wanted to age out old data using TTL... unfortunately, we ran into
> the "tombstone_failure_threshold" limit rather quickly, having thousands of
> record updates per second. That left us with a CF containing millions of
> records that we couldn't "select" the way we originally intended.
>
> Regards,
> Jens
>
>


-- 

Thanks,
Ryan Svihla


Re: Cassandra consuming whole RAM (64 G)

2015-01-06 Thread Ryan Svihla
2)
>>> ~[apache-cassandra-2.1.2.jar:2.1.2-SNAPSHOT]
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
>>> Source) ~[na:1.7.0_71]
>>> at
>>> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>>> ~[apache-cassandra-2.1.2.jar:2.1.2-SNAPSHOT]
>>> at
>>> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>>> [apache-cassandra-2.1.2.jar:2.1.2-SNAPSHOT]
>>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_71]
>>>
>>>
>>>
>>> Regards:
>>> Rahul Bhardwaj
>>>
>>> On Tue, Jan 6, 2015 at 12:08 PM, Rahul Bhardwaj <
>>> rahul.bhard...@indiamart.com> wrote:
>>>
>>>> Hi Joe,
>>>>
>>>> PFA heap dump
>>>>
>>>>
>>>> regards:
>>>> Rahul Bhardwaj
>>>>
>>>>
>>>>
>>>> On Tue, Jan 6, 2015 at 11:35 AM, Joe Ramsey  wrote:
>>>>
>>>>> Did you try generating a heap dump so you can look through it to see
>>>>> what’s actually happened?
>>>>>
>>>>>
>>>>> On Jan 6, 2015, at 12:58 AM, Rahul Bhardwaj <
>>>>> rahul.bhard...@indiamart.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We are using cassandra 2.1 version in a cluster of three machines each
>>>>> with 64 GB RAM
>>>>>
>>>>> The processes are killed by kernel, coz they are eating all memory
>>>>> (oom-killer). We have set JAVA heap to default (i.e. it is using 8G)
>>>>> because we have 64 GB RAM.
>>>>>
>>>>> Please help.
>>>>>
>>>>>
>>>>> Regards:
>>>>> Rahul Bhardwaj
>>>>>
>>>>>
>>>>> Follow IndiaMART.com <http://www.indiamart.com/> for latest updates
>>>>> on this and more: <https://plus.google.com/+indiamart>
>>>>> <https://www.facebook.com/IndiaMART> <https://twitter.com/IndiaMART>
>>>>> Mobile Channel:
>>>>> <https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=668561641&mt=8>
>>>>> <https://play.google.com/store/apps/details?id=com.indiamart.m>
>>>>> <http://m.indiamart.com/>
>>>>>
>>>>> <https://www.youtube.com/watch?v=DzORNbeSXN8&list=PL2o4J51MqpL0mbue6kzDa6eymLVUXtlR1&index=2>
>>>>> Watch how Irrfan Khan gets his work done in no time on IndiaMART,
>>>>> kyunki Kaam Yahin Banta Hai
>>>>> <https://www.youtube.com/watch?v=hmS4Afl2bNU>!!!
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> Follow IndiaMART.com <http://www.indiamart.com/> for latest updates on
>>> this and more: <https://plus.google.com/+indiamart>
>>> <https://www.facebook.com/IndiaMART> <https://twitter.com/IndiaMART>
>>> Mobile Channel:
>>> <https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=668561641&mt=8>
>>> <https://play.google.com/store/apps/details?id=com.indiamart.m>
>>> <http://m.indiamart.com/>
>>>
>>> <https://www.youtube.com/watch?v=DzORNbeSXN8&list=PL2o4J51MqpL0mbue6kzDa6eymLVUXtlR1&index=2>
>>> Watch how Irrfan Khan gets his work done in no time on IndiaMART, kyunki 
>>> Kaam
>>> Yahin Banta Hai <https://www.youtube.com/watch?v=hmS4Afl2bNU>!!!
>>>
>>>
>>>
>>>
>>
>>
>> Follow IndiaMART.com <http://www.indiamart.com/> for latest updates on
>> this and more: <https://plus.google.com/+indiamart>
>> <https://www.facebook.com/IndiaMART> <https://twitter.com/IndiaMART>
>> Mobile Channel:
>> <https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=668561641&mt=8>
>> <https://play.google.com/store/apps/details?id=com.indiamart.m>
>> <http://m.indiamart.com/>
>>
>> <https://www.youtube.com/watch?v=DzORNbeSXN8&list=PL2o4J51MqpL0mbue6kzDa6eymLVUXtlR1&index=2>
>> Watch how Irrfan Khan gets his work done in no time on IndiaMART, kyunki Kaam
>> Yahin Banta Hai <https://www.youtube.com/watch?v=hmS4Afl2bNU>!!!
>>
>>
>>
>
>
> Follow IndiaMART.com <http://www.indiamart.com> for latest updates on
> this and more: <https://plus.google.com/+indiamart>
> <https://www.facebook.com/IndiaMART> <https://twitter.com/IndiaMART>
> Mobile Channel:
> <https://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=668561641&mt=8>
> <https://play.google.com/store/apps/details?id=com.indiamart.m>
> <http://m.indiamart.com/>
>
> <https://www.youtube.com/watch?v=DzORNbeSXN8&list=PL2o4J51MqpL0mbue6kzDa6eymLVUXtlR1&index=2>
> Watch how Irrfan Khan gets his work done in no time on IndiaMART, kyunki Kaam
> Yahin Banta Hai <https://www.youtube.com/watch?v=hmS4Afl2bNU>!!!
>



-- 

Thanks,
Ryan Svihla


Re: CQL3 vs Thrift

2014-12-24 Thread Ryan Svihla
Peter,

Can you come up with some specifics? I'm always interested in finding more
corner cases, but it's also possible I have a modeling alternative that you
may not have considered yet; regardless, it's good practice and background
for me.

On Tue, Dec 23, 2014 at 12:26 PM, Peter Lin  wrote:

>
> I'm biased in favor of using both thrift and CQL3, though many people on the
> list probably think I'm crazy.
>
> CQL3 is good if what you need fits nicely in static columns, but it
> doesn't if you want to use dynamic columns and/or mix & match both in the
> same columnFamily. For a lot of what I use Cassandra for, CQL3 currently
> doesn't provide all the functionality. It is possible to extend CQL3
> further to make it handle 100% of the use cases that Thrift supports today.
>
> whether that will happen is anyone's guess. SQL "like" syntax is popular
> and many people understand it, but it doesn't necessarily line up perfectly
> with NoSql column databases.
>
>
> On Tue, Dec 23, 2014 at 1:00 PM, David Broyles 
> wrote:
>
>> Thanks, Ryan.  I wasn't aware of static column support, and indeed they
>> get me most of what I need.  I think the only potential inefficiency  is
>> still at query time.  Using Thrift, I could design the column family to get
>> all the static and dynamic content in a single query.
>> If event_source and total_events are instead implemented as CQL3 statics,
>> I probably need to do two queries to get data for a given event_type
>>
>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>> SELECT event_source, total_events FROM timeseries WHERE event_type =
>> 'some-type'
>>
>> To get the events:
>> SELECT insertion_time, event FROM timeseries
>>
>> As a combined query, my concern is related to the overhead of repeating
>> event_type/source/total_events (although with potentially many other pieces
>> of static information).
>>
>> More generally, do you find that tuned applications tend to use Thrift, a
>> combination of Thrift and CQL3, or is CQL3 really expected to replace
>> Thrift?
>>
>> Thanks again!
>>
>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla 
>> wrote:
>>
>>> Don't static columns get you what you want?
>>>
>>>
>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>>  On Dec 22, 2014 10:50 PM, "David Broyles"  wrote:
>>>
>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages
>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>>> new projects should use CQL3.
>>>>
>>>> I'm wondering, however, if there are certain use cases not well covered
>>>> by CQL3.  Consider the standard timeseries example:
>>>>
>>>> CREATE TABLE timeseries (
>>>>event_type text,
>>>>insertion_time timestamp,
>>>>event blob,
>>>>PRIMARY KEY (event_type, insertion_time)
>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>
>>>> What happens if I want to store additional information that is shared
>>>> by all events in the given series (but that I don't want to include in the
>>>> row ID): e.g. the event source, a cached count of the number of events
>>>> logged to date, etc.?  I might try updating the definition as follows:
>>>>
>>>> CREATE TABLE timeseries (
>>>>event_type text,
>>>>   event_source text,
>>>>total_events int,
>>>>insertion_time timestamp,
>>>>event blob,
>>>>PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>
>>>> Is this not inefficient?  When inserting or querying via CQL3, say in
>>>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>>>> times?  Please let me know if I'm misunderstanding something, or if I
>>>> should be sticking to Thrift for situations like this involving mixed
>>>> static/dynamic data.
>>>>
>>>> Thanks!
>>>>
>>>
>>
>


-- 

[image: datastax_logo.png] <http://www.datastax.com/>

Ryan Svihla

Solution Architect

[image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png]
<http://www.linkedin.com/pub/ryan-svihla/12/621/727/>

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: CQL3 vs Thrift

2014-12-24 Thread Ryan Svihla
I'm not entirely certain how you can't model that to solve your use case
(wouldn't you be filtering the events as well, and therefore be able to get
all that in one query).

What you describe there has a number of avenues (collections, just heavier
use of statics in a different order than you specified, an object dump of
events in a single column, switching up the clustering columns) for getting
your question answered in one query. At the end of the day CQL resolves to
a given SSTable format, and you can still open up cassandra-cli and view
what a given model looks like; when you've grokked this adequately you can
basically bend CQL to fit your logical Thrift modeling. At some point, like
learning any new language, you'll learn to speak in both (something I have
to do nearly daily).

FWIW, the primary valid complaint remaining for Thrift over CQL is that
modeling clustering columns with different nesting between rows is trivial
in Thrift and not really doable in CQL (clustering columns enforce a
nesting order by logical construct); other than that, I've yet to not be
able to swap a client from Thrift to CQL, and it's always ended up faster
(so far).

The main reason for this is that performance on modern Cassandra with the
native protocol is substantially better than pure Thrift for many query
types (see
http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster), so
your mileage may vary, but I'd test it out first before proclaiming that
thrift is faster for your use case (and make liberal use of cql features
with cassandra-cli to make sure you know what's going on internally,
remember it's all just sstables underneath).
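
For the static-column shape mentioned earlier in the thread, a minimal
sketch (the same hypothetical timeseries table, with the shared fields
marked static so they are stored once per event_type partition rather than
repeated on every event):

CREATE TABLE timeseries (
    event_type text,
    event_source text static,
    total_events int static,
    insertion_time timestamp,
    event blob,
    PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- one query returns the shared metadata alongside the events themselves
SELECT event_source, total_events, insertion_time, event
FROM timeseries
WHERE event_type = 'some-type';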




On Tue, Dec 23, 2014 at 12:00 PM, David Broyles 
wrote:

> Thanks, Ryan.  I wasn't aware of static column support, and indeed they
> get me most of what I need.  I think the only potential inefficiency  is
> still at query time.  Using Thrift, I could design the column family to get
> all the static and dynamic content in a single query.
> If event_source and total_events are instead implemented as CQL3 statics,
> I probably need to do two queries to get data for a given event_type
>
> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
> SELECT event_source, total_events FROM timeseries WHERE event_type =
> 'some-type'
>
> To get the events:
> SELECT insertion_time, event FROM timeseries
>
> As a combined query, my concern is related to the overhead of repeating
> event_type/source/total_events (although with potentially many other pieces
> of static information).
>
> More generally, do you find that tuned applications tend to use Thrift, a
> combination of Thrift and CQL3, or is CQL3 really expected to replace
> Thrift?
>
> Thanks again!
>
> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla  wrote:
>
>> Don't static columns get you what you want?
>>
>>
>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>  On Dec 22, 2014 10:50 PM, "David Broyles"  wrote:
>>
>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages
>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>> new projects should use CQL3.
>>>
>>> I'm wondering, however, if there are certain use cases not well covered
>>> by CQL3.  Consider the standard timeseries example:
>>>
>>> CREATE TABLE timeseries (
>>>event_type text,
>>>insertion_time timestamp,
>>>event blob,
>>>PRIMARY KEY (event_type, insertion_time)
>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>
>>> What happens if I want to store additional information that is shared by
>>> all events in the given series (but that I don't want to include in the row
>>> ID): e.g. the event source, a cached count of the number of events logged
>>> to date, etc.?  I might try updating the definition as follows:
>>>
>>> CREATE TABLE timeseries (
>>>event_type text,
>>>   event_source text,
>>>total_events int,
>>>insertion_time timestamp,
>>>event blob,
>>>PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>
>>> Is this not inefficient?  When inserting or querying via CQL3, say in
>>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>>> times?  Please let me know if I'm misunderstanding something, or if I
>>> should be sticking to Thrift for situations like this involving mixed
>>> static/dynamic data.
>>>
>>> Thanks!
>>>
>>
>


-

Re: [Cassandra] [Generation of SStableLoader slow]

2014-12-24 Thread Ryan Svihla
I doubt there are huge gains to be had from tinkering; if adding more CPU
speeds things up, that indicates you're resource bound. It's on a VM, it's
probably a slow underlying disk, and there is just physics at some point.
You can try playing with using the Java client instead of the
sstableloader, but I doubt that will actually be faster for your particular
use case.

On Wed, Dec 24, 2014 at 7:05 AM, 严超  wrote:

> Yes, I think so too. Plus, I tried VMs with 4 CPUs and with 2 CPUs, and 4
> CPUs really was faster.
> But it took 1 hour to generate sstables for a 1G CSV. I am wondering if
> there is another way to make it faster besides adding CPUs and RAM.
>
> *Best Regards!*
>
>
> *Chao Yan--**My twitter:Andy Yan @yanchao727
> <https://twitter.com/yanchao727>*
>
>
> *My Weibo:http://weibo.com/herewearenow
> <http://weibo.com/herewearenow>--*
>
> 2014-12-24 20:40 GMT+08:00 Ryan Svihla :
>
>> I think that'd be slow copying large files with just the cp command.
>> Cassandra isn't doing anything amazingly strange here, you don't have a lot
>> of RAM, nor CPU and I'm assuming the underlying disk is slow here as well.
>> Without more parameters and details it's hard to define if there is an
>> issue.
>>
>> On Dec 24, 2014 7:36 AM, "严超"  wrote:
>>
>>> Hi, Everyone:
>>>
>>> I'm importing a CSV file into Cassandra using SStableLoader. And I'm
>>> following the example here:
>>> https://github.com/yukim/cassandra-bulkload-example/
>>>
>>> But, Even though the streaming of SSTables is very fast , I find that
>>> generation of SStables is quite slow for very large files (CSV, 4GB+). I am
>>> using a Dual Core computer with 2 GB ram. Could it be because of the system
>>> spec or any other factor?
>>>
>>> Thank you for any advice.
>>>
>>> *Best Regards!*
>>>
>>>
>>> *Chao Yan--**My twitter:Andy Yan @yanchao727
>>> <https://twitter.com/yanchao727>*
>>>
>>>
>>> *My Weibo:http://weibo.com/herewearenow
>>> <http://weibo.com/herewearenow>--*
>>>
>>
>


-- 

[image: datastax_logo.png] <http://www.datastax.com/>

Ryan Svihla

Solution Architect

[image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png]
<http://www.linkedin.com/pub/ryan-svihla/12/621/727/>

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: Tombstones without DELETE

2014-12-24 Thread Ryan Svihla
You should probably ask on the Cassandra user mailing list.

However, TTL is the only other case I can think of.

On Tue, Dec 23, 2014 at 1:36 PM, Davide D'Agostino  wrote:

> Hi there,
>
> Following this:
> https://groups.google.com/a/lists.datastax.com/forum/#!searchin/java-driver-user/tombstone/java-driver-user/cHE3OOSIXBU/moLXcif1zQwJ
>
> Under what conditions Cassandra generates a tombstone?
>
> Basically I have a not-even-big table in cassandra (90M rows); in my code
> there is no delete, and I use prepared statements (binding all necessary
> values).
>
> I'm aware that a tombstone gets created when:
>
> 1. You delete the row
> 2. You set a column to null while previously it had a value
> 3. When you use prepared statements and you don't bind all the values
>
> Anything else that I should be aware of?
>
> Thanks!
>
> To unsubscribe from this group and stop receiving emails from it, send an
> email to java-driver-user+unsubscr...@lists.datastax.com.
>



-- 

[image: datastax_logo.png] <http://www.datastax.com/>

Ryan Svihla

Solution Architect

[image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png]
<http://www.linkedin.com/pub/ryan-svihla/12/621/727/>

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: 答复:

2014-12-24 Thread Ryan Svihla
Every time I've heard this but once, it has been clock skew (and that one
time it was swallowed exceptions). However, it can also just be that you
have a test that is prone to race conditions (a delete followed by an
immediate select with a low consistency level); without more detail it's
hard to say.

I'd check the nodes for time skew by running ntpdate on each node, and make
sure ntpd is pointing to the same servers.

On Wed, Dec 24, 2014 at 2:53 AM, 鄢来琼  wrote:

>  Yeah, I also have the question.
>
> My solution is not to delete the row, but to insert the correct row into a new table.
>
>
>
> Thanks & Regards,
>
> *Peter YAN*
>
>
>
> *From:* Sávio S. Teles de Oliveira [mailto:savio.te...@cuia.com.br]
> *Sent:* 2014-08-26 4:25
> *To:* user@cassandra.apache.org
> *Subject:*
>
>
>
> We're using cassandra 2.0.9 with datastax java cassandra driver 2.0.0 in a
> cluster of eight nodes.
>
> We're doing an insert and after a delete like:
>
> delete from *column_family_name* where *id* = value
>
> Immediately select to check whether the DELETE was successful. Sometimes
> the value is still there!!
>
>
>
> Any suggestions?
>
> --
>
> Atenciosamente,
> Sávio S. Teles de Oliveira
>
> voice: +55 62 9136 6996
> http://br.linkedin.com/in/savioteles
>
> Mestrando em Ciências da Computação - UFG
> Arquiteto de Software
>
> CUIA Internet Brasil
>



-- 

[image: datastax_logo.png] <http://www.datastax.com/>

Ryan Svihla

Solution Architect

[image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png]
<http://www.linkedin.com/pub/ryan-svihla/12/621/727/>

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: [Merging data from memtables and 1 sstables] takes too much time.

2014-12-24 Thread Ryan Svihla
Is the underlying disk a spinning disk? Because that'd be about right for a
cold (non-cached) read; the fast reads would likely be served from the
buffer cache or just pure memtable reads.

On Wed, Dec 24, 2014 at 5:32 AM, nitin padalia 
wrote:

> Is merging a costly operation with wide rows?
> On Dec 10, 2014 5:53 PM, "nitin padalia"  wrote:
>
>> I am using a schema like below:
>>
>> CREATE TABLE user_location_map (
>> store_id uuid,
>> location_id uuid,
>> user_serial_number text,
>> userobjectid uuid,
>> PRIMARY KEY ((store_id, location_id), user_serial_number)
>> ) WITH CLUSTERING ORDER BY (user_serial_number ASC)
>> AND bloom_filter_fp_chance = 0.01
>> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>> AND comment = ''
>> AND compaction = {'min_threshold': '4', 'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32'}
>> AND compression = {'sstable_compression':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99.0PERCENTILE';
>>
>> Where I run a query like:
>> select * from  user_location_map where store_id =
>> 17b73358-79e6-11e4-bfd4-0050568aa211 and location_id =
>> 2c269ea4-dbfd-32dd-9bd7-a5c22677d18b and user_serial_number =
>> 'uI2201';
>>
>> some times queries like above complete in 3-4 milliseconds, however
>> few times they take around 80-90 milliseconds. The data is around 1
>> million distributed in 5 nodes with RF 3.
>>
>> Tracing shows that every time, most of the time is consumed by:
>> Merging data from memtables and 1 sstables
>>
>> What could be the reason that sometimes this takes too long, while the rest
>> of the time it's fast?
>>
>




Re: [Cassandra] [Generation of SStableLoader slow]

2014-12-24 Thread Ryan Svihla
I think that'd be slow even when copying large files with just the cp command.
Cassandra isn't doing anything amazingly strange here; you don't have a lot
of RAM or CPU, and I'm assuming the underlying disk is slow here as well.
Without more parameters and details it's hard to tell whether there is an
issue.

On Dec 24, 2014 7:36 AM, "严超"  wrote:

> Hi, Everyone:
>
> I'm importing a CSV file into Cassandra using SStableLoader. And I'm
> following the example here:
> https://github.com/yukim/cassandra-bulkload-example/
>
> But even though the streaming of SSTables is very fast, I find that
> generation of SSTables is quite slow for very large files (CSV, 4GB+). I am
> using a dual-core computer with 2 GB of RAM. Could it be because of the system
> spec or some other factor?
>
> Thank you for any advice.
>
> Best Regards!
>
> Chao Yan
> --
> My twitter: Andy Yan @yanchao727
> My Weibo: http://weibo.com/herewearenow
>


Re: CQL3 vs Thrift

2014-12-22 Thread Ryan Svihla
Don't static columns get you what you want?

http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
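
A minimal sketch of what that looks like for the timeseries table quoted
below (static columns need Cassandra 2.0.6 or later; the two added columns
and the 'click' value are only illustrative):

CREATE TABLE timeseries (
    event_type text,
    insertion_time timestamp,
    event blob,
    event_source text static,   -- stored once per event_type partition
    total_events int static,    -- shared by every row in the partition
    PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- updating the shared values touches one cell, not every event row
UPDATE timeseries SET total_events = 1001 WHERE event_type = 'click';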
 On Dec 22, 2014 10:50 PM, "David Broyles"  wrote:

> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages such
> as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest new
> projects should use CQL3.
>
> I'm wondering, however, if there are certain use cases not well covered by
> CQL3.  Consider the standard timeseries example:
>
> CREATE TABLE timeseries (
>     event_type text,
>     insertion_time timestamp,
>     event blob,
>     PRIMARY KEY (event_type, insertion_time)
> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>
> What happens if I want to store additional information that is shared by
> all events in the given series (but that I don't want to include in the row
> ID): e.g. the event source, a cached count of the number of events logged
> to date, etc.?  I might try updating the definition as follows:
>
> CREATE TABLE timeseries (
>     event_type text,
>     event_source text,
>     total_events int,
>     insertion_time timestamp,
>     event blob,
>     PRIMARY KEY (event_type, event_source, total_events, insertion_time)
> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>
> Is this not inefficient?  When inserting or querying via CQL3, say in
> batches of up to 1000 events, won't the type/source/count be repeated 1000
> times?  Please let me know if I'm misunderstanding something, or if I
> should be sticking to Thrift for situations like this involving mixed
> static/dynamic data.
>
> Thanks!
>


Re: Store counter with non-counter column in the same column family?

2014-12-22 Thread Ryan Svihla
Spark can count a regular table. Spark SQL would most likely be the easiest
thing to get started with.

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md

Go down to the Spark SQL section to get some idea of the ease of use.
On Dec 22, 2014 10:00 PM, "ziju feng"  wrote:

> Thanks for the advice, I'll definitely take a look at how Spark works and
> how it can help with counting.
>
> One last question: my current implementation of counting is 1) increment the
> counter, 2) read the counter immediately after the write, 3) write the counts
> to multiple tables for different query paths and Solr. If I switch to Spark,
> do I still need to use a counter, or will counting be done by Spark on a
> regular table?
>
> On Tue, Dec 23, 2014 at 11:31 AM, Ryan Svihla 
> wrote:
>
>> increment wouldn't be idempotent from the client unless you knew the
>> count at the time of the update (which you could do with LWT but that has
>> pretty harsh performance), that particular jira is about how they're laid
>> out and avoiding race conditions between nodes, which was resolved in 2.1
>> beta 1 (which is now officially out in the 2.1.x branch)
>>
>> General improvements on counters in 2.1 are laid out here
>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>>
>> As for best practice the answer is multiple tables for multiple query
>> paths, or you can use something like solr or spark, take a look at the
>> spark cassandra connector for a good way to count on lots of data from lots
>> of different query paths
>> https://github.com/datastax/spark-cassandra-connector.
>>
>>
>>
>> On Mon, Dec 22, 2014 at 9:22 PM, ziju feng  wrote:
>>
>>> I just skimmed through JIRA
>>> <https://issues.apache.org/jira/browse/CASSANDRA-4775> and it seems
>>> there has been some effort to make update idempotent. Perhaps the problem
>>> can be fixed in the near future?
>>>
>>> Anyway, what is the current best practice for such use case? (Counting
>>> and displaying counts in different queries) I don't need a 100% accurate
>>> count and strong consistency. Performance and application complexity is my
>>> main concern.
>>>
>>> Thanks
>>>
>>> On Mon, Dec 22, 2014 at 10:37 PM, Ryan Svihla 
>>> wrote:
>>>
>>>> You can cheat it by using the non counter column as part of your
>>>> primary key (clustering column specifically) but the cases where this could
>>>> work are limited and the places this is a good idea are even more rare.
>>>>
>>>> As for using counters in batches are already a not well regarded
>>>> concept and counter batches have a number of troubling behaviors, as
>>>> already stated increments aren't idempotent and batch implies retry.
>>>>
>>>> As for DSE search its doing something drastically different internally
>>>> and the type of counting its doing is many orders of magnitude faster (
>>>> think bitmask style matching + proper async 2i to minimize fanout cost)
>>>>
>>>> Generally speaking counting accurately while being highly available
>>>> creates an interesting set of logical tradeoffs. Example what do you do if
>>>> you're not able to communicate between two data centers, but both are up
>>>> and serving "likes" quite happily? Is your counting down? Do you keep
>>>> counting but serve up different answers? More accurately since problems are
>>>> rarely data center to data center but more frequently between replicas, how
>>>> much availability are you willing to give up in exchange for a globally
>>>> accurate count?
>>>> On Dec 22, 2014 6:00 AM, "DuyHai Doan"  wrote:
>>>>
>>>>> It's not possible to mix counter and non counter columns because
>>>>> currently the semantic of counter is only increment/decrement (thus NOT
>>>>> idempotent) and requires some special handling compared to other C* 
>>>>> columns.
>>>>>
>>>>> On Mon, Dec 22, 2014 at 11:33 AM, ziju feng 
>>>>> wrote:
>>>>>
>>>>>> ​I was wondering if there is plan to allow ​creating counter column
>>>>>> and standard column in the same table.
>>>>>>
>>>>>> Here is my use case:
>>>>>> I want to use counter to count how many users like a given item in my
>>>>>> application. The like co

Re: Store counter with non-counter column in the same column family?

2014-12-22 Thread Ryan Svihla
increment wouldn't be idempotent from the client unless you knew the count
at the time of the update (which you could do with LWT, but that has pretty
harsh performance). That particular JIRA is about how counters are laid out and
avoiding race conditions between nodes, which was resolved in 2.1 beta 1
(which is now officially out in the 2.1.x branch)

General improvements on counters in 2.1 are laid out here
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

As for best practice the answer is multiple tables for multiple query
paths, or you can use something like solr or spark, take a look at the
spark cassandra connector for a good way to count on lots of data from lots
of different query paths
https://github.com/datastax/spark-cassandra-connector.
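
A rough sketch of the "multiple tables for multiple query paths" shape with
the counter kept in its own table (all names and the UUID are made up for
illustration):

-- counters live alone, since counter and non-counter columns can't share a table
CREATE TABLE item_like_counts (
    item_id uuid PRIMARY KEY,
    likes counter
);

-- one denormalized table per query path, carrying a copied (eventually stale) count
CREATE TABLE items_by_category (
    category text,
    item_id uuid,
    title text,
    like_count int,
    PRIMARY KEY (category, item_id)
);

UPDATE item_like_counts SET likes = likes + 1
  WHERE item_id = 62c36092-82a1-3a00-93d1-46196ee77204;

The copy into like_count is the propagation that the read-after-increment in
the question is doing today; Spark or Solr just moves that work out of the
request path.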



On Mon, Dec 22, 2014 at 9:22 PM, ziju feng  wrote:

> I just skimmed through JIRA
> <https://issues.apache.org/jira/browse/CASSANDRA-4775> and it seems there
> has been some effort to make update idempotent. Perhaps the problem can be
> fixed in the near future?
>
> Anyway, what is the current best practice for such a use case? (Counting and
> displaying counts in different queries.) I don't need a 100% accurate count
> or strong consistency. Performance and application complexity are my main
> concerns.
>
> Thanks
>
> On Mon, Dec 22, 2014 at 10:37 PM, Ryan Svihla 
> wrote:
>
>> You can cheat it by using the non counter column as part of your primary
>> key (clustering column specifically) but the cases where this could work
>> are limited and the places this is a good idea are even more rare.
>>
>> As for using counters in batches are already a not well regarded concept
>> and counter batches have a number of troubling behaviors, as already stated
>> increments aren't idempotent and batch implies retry.
>>
>> As for DSE search its doing something drastically different internally
>> and the type of counting its doing is many orders of magnitude faster (
>> think bitmask style matching + proper async 2i to minimize fanout cost)
>>
>> Generally speaking counting accurately while being highly available
>> creates an interesting set of logical tradeoffs. Example what do you do if
>> you're not able to communicate between two data centers, but both are up
>> and serving "likes" quite happily? Is your counting down? Do you keep
>> counting but serve up different answers? More accurately since problems are
>> rarely data center to data center but more frequently between replicas, how
>> much availability are you willing to give up in exchange for a globally
>> accurate count?
>> On Dec 22, 2014 6:00 AM, "DuyHai Doan"  wrote:
>>
>>> It's not possible to mix counter and non counter columns because
>>> currently the semantic of counter is only increment/decrement (thus NOT
>>> idempotent) and requires some special handling compared to other C* columns.
>>>
>>> On Mon, Dec 22, 2014 at 11:33 AM, ziju feng  wrote:
>>>
>>>> ​I was wondering if there is plan to allow ​creating counter column and
>>>> standard column in the same table.
>>>>
>>>> Here is my use case:
>>>> I want to use counter to count how many users like a given item in my
>>>> application. The like count needs to be returned along with details of item
>>>> in query. To support querying items in different ways, I use both
>>>> application-maintained denormalized index tables and DSE search for
>>>> indexing. (DSE search is also used for text searching)
>>>>
>>>> Since current counter implementation doesn't allow having counter
>>>> columns and non-counter columns in the same table, I have to propagate the
>>>> current count from counter table to the main item table and index tables,
>>>> so that like counts can be returned by those index tables without sending
>>>> extra requests to counter table and DSE search is able to build index on
>>>> like count column in the main item table to support like count related
>>>> queries (such as sorting by like count).
>>>>
>>>> IMHO, the only way to sync data between counter table and normal table
>>>> within a reasonable time (sub-seconds) currently is to read the current
>>>> value from counter table right after the update. However it suffers from
>>>> several issues:
>>>> 1. Read-after-write may not return the correct count when replication
>>>> factor > 1 unless consistency level ALL/LOCAL_ALL is used
>>>> 2. There are two extra non-parall

Re: Connect to C* instance inside virtualbox

2014-12-22 Thread Ryan Svihla
So what is the IP address of that interface? Attempt to use cqlsh with
whatever that address is; otherwise it will default to localhost.

On Mon, Dec 22, 2014 at 8:55 PM, Kai Wang  wrote:

> I might misread the comment but I thought I could only set rpc_interface
> or rpc_address but not both. So I didn't set rpc_addresa. Will double check
> tomorrow. Thanks.
> On Dec 22, 2014 9:17 PM, "Ryan Svihla"  wrote:
>
>> if this helps..what did you change rpc_address to?
>>
>> On Mon, Dec 22, 2014 at 8:15 PM, Ryan Svihla 
>> wrote:
>>
>>> right that's localhost, you have to change it to match the ip of
>>> whatever you changed rpc_address too
>>>
>>> On Mon, Dec 22, 2014 at 8:07 PM, Kai Wang  wrote:
>>>
>>>> on the guest where C* is installed, I run cqlsh without any argument.
>>>> When I enabled rpc_interface, cqlsh returned can't connect
>>>> 127.0.0.1:9042.
>>>>
>>>> On Mon, Dec 22, 2014 at 9:01 PM, Ryan Svihla 
>>>> wrote:
>>>>
>>>>> totally depends on how the implementation is handled in virtualbox,
>>>>> I'm assuming you're connecting to an IP that makes sense on the guest (ie
>>>>> nodetool -h 192.168.1.100 and cqlsh 192.168.1.100, replace that ip with
>>>>> whatever what you expect)?
>>>>>
>>>>> On Mon, Dec 22, 2014 at 7:58 PM, Kai Wang  wrote:
>>>>>
>>>>>> Ryan,
>>>>>>
>>>>>> Actually after I made the change, I was able to connect to C* from
>>>>>> host but not from guest anymore. Is this expected?
>>>>>>
>>>>>> On Mon, Dec 22, 2014 at 8:53 PM, Kai Wang  wrote:
>>>>>>
>>>>>>> Ryan,
>>>>>>>
>>>>>>> it works! I saw this new config mentioned in Cassandra summit 2014
>>>>>>> but didn't realize it applied in my case.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On Mon, Dec 22, 2014 at 4:43 PM, Ryan Svihla 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> what is rpc_address set to in cassandra.yaml? my gut is localhost,
>>>>>>>> set it to the interface that communicates between host and guest.
>>>>>>>>
>>>>>>>> On Mon, Dec 22, 2014 at 3:38 PM, Kai Wang  wrote:
>>>>>>>>
>>>>>>>>> I installed C* in virtualbox via vagrant. Both 9160 and 9042 ports
>>>>>>>>> are forwarded from guest to host. I can telnet to those two ports 
>>>>>>>>> from host
>>>>>>>>> to guest. But from my host, I can't connect to C* using cassandra-cli 
>>>>>>>>> or
>>>>>>>>> cqlsh. My host is Windows 7 64bit and guest is CentOS 6.5.
>>>>>>>>>
>>>>>>>>> Is there anything special about connecting to a C* instance inside
>>>>>>>>> virtualbox?
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>

Re: Connect to C* instance inside virtualbox

2014-12-22 Thread Ryan Svihla
Right, that's localhost; you have to change it to match the IP of whatever
you changed rpc_address to.

On Mon, Dec 22, 2014 at 8:07 PM, Kai Wang  wrote:

> on the guest where C* is installed, I run cqlsh without any argument. When
> I enabled rpc_interface, cqlsh returned can't connect 127.0.0.1:9042.
>
> On Mon, Dec 22, 2014 at 9:01 PM, Ryan Svihla  wrote:
>
>> totally depends on how the implementation is handled in virtualbox, I'm
>> assuming you're connecting to an IP that makes sense on the guest (ie
>> nodetool -h 192.168.1.100 and cqlsh 192.168.1.100, replace that ip with
>> whatever what you expect)?
>>
>> On Mon, Dec 22, 2014 at 7:58 PM, Kai Wang  wrote:
>>
>>> Ryan,
>>>
>>> Actually after I made the change, I was able to connect to C* from host
>>> but not from guest anymore. Is this expected?
>>>
>>> On Mon, Dec 22, 2014 at 8:53 PM, Kai Wang  wrote:
>>>
>>>> Ryan,
>>>>
>>>> it works! I saw this new config mentioned in Cassandra summit 2014 but
>>>> didn't realize it applied in my case.
>>>>
>>>> Thanks.
>>>>
>>>> On Mon, Dec 22, 2014 at 4:43 PM, Ryan Svihla 
>>>> wrote:
>>>>
>>>>> what is rpc_address set to in cassandra.yaml? my gut is localhost, set
>>>>> it to the interface that communicates between host and guest.
>>>>>
>>>>> On Mon, Dec 22, 2014 at 3:38 PM, Kai Wang  wrote:
>>>>>
>>>>>> I installed C* in virtualbox via vagrant. Both 9160 and 9042 ports
>>>>>> are forwarded from guest to host. I can telnet to those two ports from 
>>>>>> host
>>>>>> to guest. But from my host, I can't connect to C* using cassandra-cli or
>>>>>> cqlsh. My host is Windows 7 64bit and guest is CentOS 6.5.
>>>>>>
>>>>>> Is there anything special about connecting to a C* instance inside
>>>>>> virtualbox?
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>




Re: Connect to C* instance inside virtualbox

2014-12-22 Thread Ryan Svihla
If this helps: what did you change rpc_address to?

On Mon, Dec 22, 2014 at 8:15 PM, Ryan Svihla  wrote:

> right that's localhost, you have to change it to match the ip of whatever
> you changed rpc_address too
>
> On Mon, Dec 22, 2014 at 8:07 PM, Kai Wang  wrote:
>
>> on the guest where C* is installed, I run cqlsh without any argument.
>> When I enabled rpc_interface, cqlsh returned can't connect 127.0.0.1:9042
>> .
>>
>> On Mon, Dec 22, 2014 at 9:01 PM, Ryan Svihla 
>> wrote:
>>
>>> totally depends on how the implementation is handled in virtualbox, I'm
>>> assuming you're connecting to an IP that makes sense on the guest (ie
>>> nodetool -h 192.168.1.100 and cqlsh 192.168.1.100, replace that ip with
>>> whatever what you expect)?
>>>
>>> On Mon, Dec 22, 2014 at 7:58 PM, Kai Wang  wrote:
>>>
>>>> Ryan,
>>>>
>>>> Actually after I made the change, I was able to connect to C* from host
>>>> but not from guest anymore. Is this expected?
>>>>
>>>> On Mon, Dec 22, 2014 at 8:53 PM, Kai Wang  wrote:
>>>>
>>>>> Ryan,
>>>>>
>>>>> it works! I saw this new config mentioned in Cassandra summit 2014 but
>>>>> didn't realize it applied in my case.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> On Mon, Dec 22, 2014 at 4:43 PM, Ryan Svihla 
>>>>> wrote:
>>>>>
>>>>>> what is rpc_address set to in cassandra.yaml? my gut is localhost,
>>>>>> set it to the interface that communicates between host and guest.
>>>>>>
>>>>>> On Mon, Dec 22, 2014 at 3:38 PM, Kai Wang  wrote:
>>>>>>
>>>>>>> I installed C* in virtualbox via vagrant. Both 9160 and 9042 ports
>>>>>>> are forwarded from guest to host. I can telnet to those two ports from 
>>>>>>> host
>>>>>>> to guest. But from my host, I can't connect to C* using cassandra-cli or
>>>>>>> cqlsh. My host is Windows 7 64bit and guest is CentOS 6.5.
>>>>>>>
>>>>>>> Is there anything special about connecting to a C* instance inside
>>>>>>> virtualbox?
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>
>




Re: Connect to C* instance inside virtualbox

2014-12-22 Thread Ryan Svihla
It totally depends on how the implementation is handled in VirtualBox. I'm
assuming you're connecting to an IP that makes sense on the guest (i.e.
nodetool -h 192.168.1.100 and cqlsh 192.168.1.100, replacing that IP with
whatever you expect)?

On Mon, Dec 22, 2014 at 7:58 PM, Kai Wang  wrote:

> Ryan,
>
> Actually after I made the change, I was able to connect to C* from host
> but not from guest anymore. Is this expected?
>
> On Mon, Dec 22, 2014 at 8:53 PM, Kai Wang  wrote:
>
>> Ryan,
>>
>> it works! I saw this new config mentioned in Cassandra summit 2014 but
>> didn't realize it applied in my case.
>>
>> Thanks.
>>
>> On Mon, Dec 22, 2014 at 4:43 PM, Ryan Svihla 
>> wrote:
>>
>>> what is rpc_address set to in cassandra.yaml? my gut is localhost, set
>>> it to the interface that communicates between host and guest.
>>>
>>> On Mon, Dec 22, 2014 at 3:38 PM, Kai Wang  wrote:
>>>
>>>> I installed C* in virtualbox via vagrant. Both 9160 and 9042 ports are
>>>> forwarded from guest to host. I can telnet to those two ports from host to
>>>> guest. But from my host, I can't connect to C* using cassandra-cli or
>>>> cqlsh. My host is Windows 7 64bit and guest is CentOS 6.5.
>>>>
>>>> Is there anything special about connecting to a C* instance inside
>>>> virtualbox?
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>




Re: Connect to C* instance inside virtualbox

2014-12-22 Thread Ryan Svihla
What is rpc_address set to in cassandra.yaml? My gut says localhost; set it
to the interface that communicates between host and guest.
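
For example, in cassandra.yaml (the address below is only a placeholder for
the VM's host-facing interface):

# cassandra.yaml (sketch, not a complete config)
rpc_address: 192.168.1.100    # the interface cqlsh and drivers connect to
# on 2.1+ you can name the interface instead, but set only one of the two:
# rpc_interface: eth1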

On Mon, Dec 22, 2014 at 3:38 PM, Kai Wang  wrote:

> I installed C* in virtualbox via vagrant. Both 9160 and 9042 ports are
> forwarded from guest to host. I can telnet to those two ports from host to
> guest. But from my host, I can't connect to C* using cassandra-cli or
> cqlsh. My host is Windows 7 64bit and guest is CentOS 6.5.
>
> Is there anything special about connecting to a C* instance inside
> virtualbox?
>





Re: CF performance suddenly degraded

2014-12-22 Thread Ryan Svihla
There can be many root causes. We would need a lot more information, such as
node hardware specs, cfhistograms on the table, tpstats, GC settings (max
heap, parnew size, JVM version) and logs, specifically any ERROR, WARN, or
GCInspector messages.

As a start, a simple trace of the query in question is probably good enough.
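A minimal cqlsh sketch of such a trace (the day value is just an example):

TRACING ON;
SELECT unitbuckets FROM timesliceunitstate WHERE day = '2014-12-22';
-- the trace printed after the row shows where the ~20 seconds go:
-- sstables touched, tombstones scanned, and which replicas were waited on
TRACING OFF;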
On Dec 22, 2014 10:59 AM, "Paolo Crosato" 
wrote:

> Hi,
>
> I declared this CF:
>
> CREATE TABLE timesliceunitstate (
>   day timestamp,
>   unitbuckets text,
>   PRIMARY KEY (day)
> );
>
> unitbuckets is a text column holding a fairly big amount of data, around
> 30 MB of json text per row.
>
> The table holds 30 rows. I'm running Cassandra 2.0.8 on a 3-node
> cluster with a replication factor of 3; read consistency is QUORUM, so 2
> out of 3 nodes.
>
> The table has a write hit about every 20 minutes, which updates only one
> row, the most recent.
>
> I had no problem with read queries (I query the table one row at a time)
> until this morning, when read latency jumped from around 300ms to 20
> seconds for each query.
>
> I tried repairing the table on all 3 nodes using range repair, without
> success. The *Data.db file on disk is around 30 MB, so a bit less than
> 1 MB for each row.
>
> I'm using the latest version of the DataStax driver, 2.1. I haven't changed
> anything at the application level for days, so it's not something related to
> the application or the driver.
>
> Is there any way I can troubleshoot the issue and discover what's making
> the table so slow?
>
> Thanks for any advice,
>
> Paolo
>
> --
> Paolo Crosato
> Software engineer/Custom Solutions
> e-mail: paolo.cros...@targaubiest.com
>
>


Re: Multi DC informations (sync)

2014-12-22 Thread Ryan Svihla
In effect you're saying "I require data centers to be consistent at write
time except when they can't be." Basically you've gotten the worst of both
worlds: bad performance during healthy times and less than desired
consistency during unhealthy times.

I believe you may have some misconceptions about availability. If you're
doing fallback, you may as well just use that fallback level; the point of
a consistency level is to specify what CL you need to have your app be
happy. In practice you will, in effect, be using LOCAL_QUORUM.

If you can consider an app "good" at LOCAL_QUORUM in a fallback scenario,
you may as well _always_ use LOCAL_QUORUM, and then whatever mode or
tradeoff you're making in that disconnected state will be what you want to
use anyway.

Dropped mutations and HH are as close as you get to what you're used to as
"replication lag", but Cassandra is not Oracle RAC; there is no background
replication service copying data between data centers. There are writes,
retries of failed writes, and at most "repairs" of inconsistent datasets.

To answer your question about Netflix, as they talk about their usage in
public: I'm certain they monitor dropped mutations and HH, and a number of
other handy things like heap usage, node health, and load, among other
cluster health indicators.

Any and all applications have to be designed with some idea of the tradeoff
between data-center-level consistency and global consistency; global
consistency is going to be more expensive, and in effect less available,
but there are CERTAINLY use cases that will not require it.

For example, I want to log a click of a link on a webpage. Does that need
EACH_QUORUM? Probably not. But let's say I want to change a password; does
that require EACH_QUORUM? If it's a security issue, yes, almost certainly.
Should I accept that change if all data centers aren't up? Probably not.
Would I want to fall back? I don't know, you tell me.

May I suggest you design any application with the same thought process I
discuss above, come to grips with monitoring your cluster for health, and
then design your application to behave in an expected fashion during both
healthy times and bad.
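
To make that concrete, a cqlsh-level sketch of the two ends of the tradeoff
(tables and values are invented for illustration):

CONSISTENCY LOCAL_QUORUM;   -- succeeds while the local DC has a quorum of replicas up
INSERT INTO click_log (page_id, clicked_at, user_id)
  VALUES ('home', '2014-12-22 10:00:00', 42);

CONSISTENCY EACH_QUORUM;    -- waits for a quorum in every DC; fails if any DC is unreachable
UPDATE user_credentials SET password_hash = 'abc123' WHERE user_id = 42;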

On Mon, Dec 22, 2014 at 10:14 AM, Alain RODRIGUEZ 
wrote:
>
> @Jonathan. I read my previous message. GC grace period is 10 days
> (default) not 10 sec, my bad. Repairs are run every 7 days. I should be
> fine regarding this.
>
> @Ryan
>
> Indeed I might want to use EACH_QUORUM with a customised fallback to
> LOCAL_QUORUM plus alerting in case of partition (like a whole cluster down).
> Our writes are not blocking, and we might use this to allow EACH_QUORUM.
>
> I am going to discuss this internally and think about it.
>
> Though, I am still intrigued about how big companies like Netflix or Apple
> use and monitor their multi-DC environments. I can imagine that EACH_QUORUM
> is often not acceptable. In this case, I am curious to know how to make sure
> you're always synced (maybe alerting on dropped messages or HH, indeed).
>
> Thanks for the information and for your patience :).
>
> See you around.
>
> 2014-12-19 20:35 GMT+01:00 Jonathan Haddad :
>
>> Your gc grace should be longer than your repair schedule.  You're likely
>>  going to have deleted data resurface.
>>
>>
>> On Fri Dec 19 2014 at 8:31:13 AM Alain RODRIGUEZ 
>> wrote:
>>
>>> All that you said match the idea I had of how it works except this part:
>>>
>>> "The request blocks however until all CL is satisfied" --> Does this
>>> mean that the client will see an error if the local DC write the data
>>> correctly (i.e. CL reached) but the remote DC fails ? This is not the idea
>>> I had of something asynchronous...
>>>
>>> If it doesn't fail on the client side (truly asynchronous), is there a way
>>> to make sure the remote DC has indeed received the information? I mean, if
>>> the cross-region throughput is too small, the write will fail and so will
>>> the HH, potentially. How do we detect that we are lacking cross-DC
>>> throughput, for example?
>>>
>>> Repairs are indeed a good thing (we run them as a weekly routine, GC
>>> grace period 10 sec), but having inconsistency for a week without knowing
>>> it is quite an issue.
>>>
>>> Thanks for this detailed information Ryan, I hope I am clear enough
>>> while expressing my doubts.
>>>
>>> C*heers
>>>
>>> Alain
>>>
>>> 2014-12-19 15:43 GMT+01:00 Ryan Svihla :
>>>>
>>>> More accurately,the write path of Cassandra in a multi dc sense is
>>>> kinda like the following
&g

Re: installing cassandra

2014-12-22 Thread Ryan Svihla
TLDR
Can I suggest as a good middle road to at least use something like csshx
https://code.google.com/p/csshx/ ?

Details

Having worked with a huge variety of skill sets and cluster sizes, I'd
argue "it depends" a lot on team skills, especially when problems occur.

Point being, even with small clusters, when you're figuring out how to set up
and tune nodes, people frequently get it wrong the first few times (almost
invariably by not following the docs), and then are trying to fix problems in
prod, under pressure, while not making another mistake in the process. This
is a wonderful combination for screwing up even more.

I lean towards tools to automate things (classic bash scripting in my case)
but I was synchronizing configuration 15 years ago, and not doing things by
hand even then.

On Mon, Dec 22, 2014 at 9:44 AM, Eric Stevens  wrote:
>
> If you're just trying to get your feet wet with distributed software, and
> your node count is going to be reasonably low and won't grow any time soon,
> it's probably easier to just install it yourself rather than trying to also
> learn how to use software deployment technologies like puppet or chef.
> Those aren't easier until you are operating at a scale where you need to be
> able to automate adding new nodes.
>
>
> On Sun, Dec 21, 2014, 8:05 AM Ryan Svihla  wrote:
>
>> Puppet, Chef, Ansible and I'm sure many others. I've personally worked
>> with a number of people on all three, a quick google for "Puppet Cassandra"
>> will give you a large number of examples and modules just for Puppet and
>> Cassandra.
>>
>> On Sat, Dec 20, 2014 at 2:01 PM, Adaryl "Bob" Wakefield, MBA <
>> adaryl.wakefi...@hotmail.com> wrote:
>>>
>>>   I have a three node cluster that I’m using to learn how to work with
>>> distributed software. There is this thing called Puppet that helps you with
>>> deploying software. Can/should I use Puppet to install Cassandra on my
>>> cluster or is there some sort of built in network wide deployment in the
>>> install process already?
>>>
>>> B.
>>>
>>
>>
>>
>>



Re: Store counter with non-counter column in the same column family?

2014-12-22 Thread Ryan Svihla
You can cheat it by using the non-counter column as part of your primary
key (a clustering column specifically), but the cases where this could work
are limited and the places where this is a good idea are even more rare.

As for using counters in batches: batches are already not a well regarded
concept, and counter batches have a number of troubling behaviors; as already
stated, increments aren't idempotent and batch implies retry.

As for DSE Search, it's doing something drastically different internally and
the type of counting it's doing is many orders of magnitude faster (think
bitmask-style matching plus proper async 2i to minimize fanout cost).

Generally speaking, counting accurately while being highly available creates
an interesting set of logical tradeoffs. For example, what do you do if you're
not able to communicate between two data centers, but both are up and
serving "likes" quite happily? Is your counting down? Do you keep counting
but serve up different answers? More to the point, since problems are rarely
data center to data center but more frequently between replicas, how much
availability are you willing to give up in exchange for a globally accurate
count?
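For completeness, the "cheat" mentioned above looks roughly like this
(made-up names; as noted, it only fits rare cases, because the non-counter
value can never be updated in place, only written as part of a new row):

-- the non-counter value rides in the clustering key, so the only regular column is the counter
CREATE TABLE item_likes_by_name (
    item_id uuid,
    item_name text,
    likes counter,
    PRIMARY KEY (item_id, item_name)
);

UPDATE item_likes_by_name SET likes = likes + 1
  WHERE item_id = 62c36092-82a1-3a00-93d1-46196ee77204 AND item_name = 'blue widget';
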
On Dec 22, 2014 6:00 AM, "DuyHai Doan"  wrote:

> It's not possible to mix counter and non-counter columns because currently
> the semantics of counters are only increment/decrement (thus NOT idempotent)
> and they require some special handling compared to other C* columns.
>
> On Mon, Dec 22, 2014 at 11:33 AM, ziju feng  wrote:
>
>> I was wondering if there is a plan to allow creating counter columns and
>> standard columns in the same table.
>>
>> Here is my use case:
>> I want to use counter to count how many users like a given item in my
>> application. The like count needs to be returned along with details of item
>> in query. To support querying items in different ways, I use both
>> application-maintained denormalized index tables and DSE search for
>> indexing. (DSE search is also used for text searching)
>>
>> Since current counter implementation doesn't allow having counter columns
>> and non-counter columns in the same table, I have to propagate the current
>> count from counter table to the main item table and index tables, so that
>> like counts can be returned by those index tables without sending extra
>> requests to counter table and DSE search is able to build index on like
>> count column in the main item table to support like count related queries
>> (such as sorting by like count).
>>
>> IMHO, the only way to sync data between counter table and normal table
>> within a reasonable time (sub-seconds) currently is to read the current
>> value from counter table right after the update. However it suffers from
>> several issues:
>> 1. Read-after-write may not return the correct count when replication
>> factor > 1 unless consistency level ALL/LOCAL_ALL is used
>> 2. There are two extra non-parallelizable round-trips between the
>> application server and cassandra, which can have great impact on
>> performance.
>>
>> If it is possible to store counter in standard column family, only one
>> write will be needed to update like count in the main table. Counter value
>> will also be eventually synced between replicas so that there is no need
>> for application to use extra mechanism like scheduled task to get the
>> correct counts.
>>
>> A related issue is lifting the limitation of not allowing updates to
>> counter columns and normal columns in one batch, since it is quite common
>> to not only have a counter for statistics but also store the details, such
>> as storing the relation of which user likes which items in my use case.
>>
>> Any idea?
>>
>>
>


Re: Replacing nodes disks

2014-12-21 Thread Ryan Svihla
Cassandra is designed to rebuild a node from other nodes; whether a node is
dead by your hand because you killed it, or by fate, is irrelevant. The
process is the same, and a "new node" can have the same hostname and IP or it
can have totally different ones.
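
For reference, the dead-node replacement linked below is driven by a JVM flag
set on the rebuilt node before its first start; a sketch for a package
install (the path and IP are placeholders):

# /etc/cassandra/cassandra-env.sh on the node being rebuilt, data directories wiped
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.11.12.13"   # IP of the node being replaced
# start Cassandra; it streams the dead node's ranges from the other replicas
# remove the flag once the node has finished joining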

On Sun, Dec 21, 2014 at 6:01 AM, Or Sher  wrote:
>
> If I use the replace_address parameter with the same IP address, would
> that do the job?
>
> On Sun, Dec 21, 2014 at 11:20 AM, Or Sher  wrote:
>
>> What I want to do is kind of replacing a dead node -
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
>> But replacing it with a clean node with the same IP and hostname.
>>
>> On Sun, Dec 21, 2014 at 9:53 AM, Or Sher  wrote:
>>
>>> Thanks guys.
>>> I have to replace all data disks, so I don't have another local disk large
>>> enough to move the data to.
>>> If I have no choice, I will back up the data first on some other node
>>> or something, but I'd like to avoid it.
>>> I would really love letting Cassandra do its thing and rebuild itself.
>>> Has anybody handled such cases that way (letting Cassandra rebuild its
>>> data)?
>>> Although there is no documented procedure for it, it should be possible,
>>> right?
>>>
>>> On Fri, Dec 19, 2014 at 8:41 AM, Jan Kesten 
>>> wrote:
>>>
>>>> Hi Or,
>>>>
>>>> I did some sort of this a while ago. If your machines do have a free
>>>> disk slot - just put another disk there and use it as another
>>>> data_file_directory.
>>>>
>>>> If not - as in my case:
>>>>
>>>> - grab an usb dock for disks
>>>> - put the new one in there, plug in, format, mount to /mnt etc.
>>>> - I did an online rsync from /var/lib/cassandra/data to /mnt
>>>> - after that, bring cassandra down
>>>> - do another rsync from /var/lib/cassandra/data to /mnt (should be
>>>> faster, as sstables do not change, minimizes downtime)
>>>> - if you need adjust /etc/fstab if needed
>>>> - shutdown the node
>>>> - swap disks
>>>> - power on the node
>>>> - everything should be fine ;-)
>>>>
>>>> Of course you will need a replication factor > 1 for this to work ;-)
>>>>
>>>> Just my 2 cents,
>>>> Jan
>>>>
>>>> rsync the full contents there,
>>>>
>>>> Am 18.12.2014 um 16:17 schrieb Or Sher:
>>>>
>>>>  Hi all,
>>>>>
>>>>> We have a situation where some of our nodes have smaller disks and we
>>>>> would like to align all nodes by replacing the smaller disks with bigger
>>>>> ones, without replacing nodes.
>>>>> We don't have enough space to put data on / disk and copy it back to
>>>>> the bigger disks so we would like to rebuild the nodes data from other
>>>>> replicas.
>>>>>
>>>>> What do you think should be the procedure here?
>>>>>
>>>>> I'm guessing it should be something like this but I'm pretty sure it's
>>>>> not enough.
>>>>> 1. shutdown C* node and server.
>>>>> 2. replace disks + create the same vg lv etc.
>>>>> 3. start C* (Normally?)
>>>>> 4. nodetool repair/rebuild?
>>>>> *I think I might get some consistency issues for use cases relying on
>>>>> Quorum reads and writes for strong consistency.
>>>>> What do you say?
>>>>>
>>>>> Another question is (and I know it depends on many factors but I'd
>>>>> like to hear an experienced estimation): how much time would it take to
>>>>> rebuild a node with 250 GB of data?
>>>>>
>>>>> Thanks in advance,
>>>>> Or.
>>>>>
>>>>> --
>>>>> Or Sher
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Or Sher
>>>
>>
>>
>>
>> --
>> Or Sher
>>
>
>
>
> --
> Or Sher
>




Re: installing cassandra

2014-12-21 Thread Ryan Svihla
Puppet, Chef, Ansible and I'm sure many others. I've personally worked with
a number of people on all three, a quick google for "Puppet Cassandra" will
give you a large number of examples and modules just for Puppet and
Cassandra.

On Sat, Dec 20, 2014 at 2:01 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefi...@hotmail.com> wrote:
>
>   I have a three node cluster that I’m using to learn how to work with
> distributed software. There is this thing called Puppet that helps you with
> deploying software. Can/should I use Puppet to install Cassandra on my
> cluster or is there some sort of built in network wide deployment in the
> install process already?
>
> B.
>



