LOCAL_QUORUM vs EACH_QUORUM

2012-11-01 Thread Yang
The following comment in the code describes them very clearly:

 *  LOCAL_QUORUM: Returns the record with the most recent timestamp once a
    majority of replicas within the local datacenter have replied.
 *  EACH_QUORUM:  Returns the record with the most recent timestamp once a
    majority of replicas within each datacenter have replied.


But it seems that my intended use case is not solved by either policy:
I have 2 colos. Mostly I want to run my application in a primary-backup
(or hot/warm) mode, where everything is automated and a human switchover is
not needed in case of one colo failure. I want all writes/reads to get a
quorum from the local colo, and then at least make sure that one copy of each
write has propagated to the other colo.

So I do not necessarily need a quorum from the remote colo, but I need at
least one copy of each write to arrive there.

Does that sound like a common use case? Within the current code, is there a
way to achieve that? If not, creating a new policy does not seem too
difficult either.
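
To make the intent concrete, here is a rough sketch of the acknowledgment rule being asked for, purely as illustration (this is not an existing Cassandra ConsistencyLevel or API; all names are made up):

import java.util.HashMap;
import java.util.Map;

// Illustrative only: "local quorum, plus at least one remote replica".
// Not an existing Cassandra consistency level; class and method names are
// hypothetical.
public final class LocalQuorumPlusOneRemote {

    // Quorum size as Cassandra computes it: (replication factor / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    static boolean isSatisfied(Map<String, Integer> acksByDc,   // acks received so far, per DC
                               Map<String, Integer> rfByDc,     // replication factor per DC
                               String localDc) {                // coordinator's DC
        Integer localAcks = acksByDc.get(localDc);
        boolean localQuorumMet =
                localAcks != null && localAcks >= quorum(rfByDc.get(localDc));

        boolean oneRemoteAck = false;
        for (Map.Entry<String, Integer> e : acksByDc.entrySet()) {
            if (!e.getKey().equals(localDc) && e.getValue() > 0) {
                oneRemoteAck = true;
            }
        }
        return localQuorumMet && oneRemoteAck;
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = new HashMap<String, Integer>();
        rf.put("colo1", 3);
        rf.put("colo2", 3);

        Map<String, Integer> acks = new HashMap<String, Integer>();
        acks.put("colo1", 2);  // a majority of the 3 local replicas replied
        acks.put("colo2", 1);  // one remote replica has the write

        System.out.println(isSatisfied(acks, rf, "colo1"));  // true
    }
}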

Thanks
Yang


Re: Data migration between clusters

2012-11-01 Thread 張 睿

Hi Rob,

Thank you for your reply.
Our scenario is like this: we have 3 clusters, each with 1 or 2 keyspaces in
it, and each cluster has 3 nodes.
Now we're considering consolidating these 3 clusters (9 nodes in total) into a
single cluster of 9 nodes.
This new cluster will contain all the keyspaces and data the former 3 clusters
have.
The replication factor, which is 3 now, will not be changed during this
migration.
We tried using sstableloader, which didn't work well; maybe we did it the
wrong way.
It looks like the migration approach you suggested would solve our problem,
so we'll try it out by referring to the link you gave in your mail.

Thanks a lot again for your precious information,
Ray

(12/11/01 2:43), Rob Coli wrote:

On Tue, Oct 30, 2012 at 4:18 AM, 張 睿 chou...@cyberagent.co.jp wrote:

Does anyone here know if there is an efficient way to migrate multiple
Cassandra clusters' data to a single Cassandra cluster without any data loss?

Yes.

1) create a schema which is a superset of all column families and all keyspaces
2) if all source clusters were the same fixed number of nodes, create
a new cluster with the same fixed number of nodes
3) nodetool drain and shut down all nodes on all participating clusters
4) copy sstables from the old clusters, maintaining that data from source
node [x] ends up on target node [x]
5) start Cassandra

However without more details as to your old clusters, new clusters,
and availability requirements, I can't give you a more useful answer.

Here's some background on bulk loading, including copy-the-sstables.

http://palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob



-- Ray Zhang Cyberagent.co



Re: idea drive layout - 4 drives + RAID question

2012-11-01 Thread Ran User
Thanks.  Yep, I think OS + CL (2 drive RAID1) will provide the best balance
of reduced headaches / performance.  I'll also be pondering 1 drive OS, 1
drive CL as well.
On Wed, Oct 31, 2012 at 9:27 PM, aaron morton aa...@thelastpickle.com wrote:

 Good question.

 There is a comment on the DS blog or docs somewhere that says on EC2 running
 the commit log on the raid-0 ephemeral is preferred. I think the
 recommendation was specifically about how the disks are set up on EC2.

 While the commit log would be competing with logs and everything else on
 the OS volume, on the data volume it would be competing with C* reads,
 Memtable flushing, compaction and repairs.

 The only way to be sure is to test both setups.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 31/10/2012, at 1:11 PM, Ran User ranuse...@gmail.com wrote:

 Is there a concern of a large falloff in commit log write performance
 (sequential) when sharing 2 drives (RAID 1) with the OS (os and services
 writing their own logs, etc)?  Do you expect the hit to be marginal?


 On Tue, Oct 30, 2012 at 7:58 PM, aaron morton aa...@thelastpickle.com wrote:

 We also have 4-disk nodes, and we use the following layout:
 2 x OS + Commit in RAID 1
 2 x Data disk in RAID 0

 +1

 You are replicating data at the application level and want the fastest
 possible IO performance per node.

  You can already distribute the
 individual Cassandra column families on different drives by just
 setting up symlinks to the individual folders.

 There are some features coming in 1.2 that make using a JBOD setup
 easier.

 Cheers

  -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 30/10/2012, at 9:23 PM, Pieter Callewaert 
 pieter.callewa...@be-mobile.be wrote:

 We also have 4-disk nodes, and we use the following layout:
 2 x OS + Commit in RAID 1
 2 x Data disk in RAID 0

 This gives us the advantage we never have to reinstall the node when a
 drive crashes.

 Kind regards,
 Pieter


 *From:* Ran User [mailto:ranuse...@gmail.com]
 *Sent:* dinsdag 30 oktober 2012 4:33
 *To:* user@cassandra.apache.org
 *Subject:* Re: idea drive layout - 4 drives + RAID question

 Have you considered running RAID 10 for the data drives to improve MTBF?

 On one hand Cassandra is handling redundancy issues; on the other hand,
 reducing the frequency of dealing with failed nodes is attractive if it is
 cheap (switching RAID levels to 10).

 We have no experience with software RAID (we have always used hardware RAID
 with BBU).  I'm assuming software RAID 1 or 10 (the mirroring part) is
 inherently reliable (perhaps minus some edge cases).
 On Tue, Oct 30, 2012 at 1:07 AM, Tupshin Harper tups...@tupshin.com
 wrote:

 I would generally recommend 1 drive for OS and commit log and 3 drive
 raid 0 for data. The raid does give you good performance benefit, and it
 can be convenient to have the OS on a side drive for configuration ease and
 better MTBF.

 -Tupshin
 On Oct 29, 2012 8:56 PM, Ran User ranuse...@gmail.com wrote:
 I was hoping to achieve approx. 2x IO (write and read) performance via
 RAID 0 (by accepting a lower MTBF).

 Do you believe the performance gains of RAID 0 are much lower and/or not
 worth it vs. the increased server failure rate?

 From my understanding, RAID 10 would achieve the read performance
 benefits of RAID 0, but not the write benefits.  I'm also considering RAID
 10 to maximize server IO performance.

 Currently, we're working with 1 CF.

 Thank you
 On Mon, Oct 29, 2012 at 11:51 PM, Timmy Turner timm.t...@gmail.com
 wrote:
 I'm not sure whether the raid 0 gets you anything other than headaches
 should one of the drives fail. You can already distribute the
 individual Cassandra column families on different drives by just
 setting up symlinks to the individual folders.

 2012/10/30 Ran User ranuse...@gmail.com:
  For a server with 4 drive slots only, I'm thinking:
 
  either:
 
  - OS (1 drive)
  - Commit Log (1 drive)
  - Data (2 drives, software raid 0)
 
  vs
 
  - OS  + Data (3 drives, software raid 0)
  - Commit Log (1 drive)
 
  or something else?
 
  also, if I can spare the wasted storage, would RAID 10 for cassandra
 data
  improve read performance and have no effect on write performance?
 
  Thank you!







Re: Cassandra upgrade issues...

2012-11-01 Thread Sylvain Lebresne
The first thing I would check is whether nodetool is using the right jar. It
sounds a lot like the server has been correctly updated but
nodetool hasn't and is still using the old classes.
Check the nodetool executable (it's a shell script), try echoing
the CLASSPATH in there, and check that it correctly points to what it should.

--
Sylvain

On Thu, Nov 1, 2012 at 9:10 AM, Brian Fleming bigbrianflem...@gmail.com wrote:
 Hi,



 I was testing upgrading from Cassandra v.1.0.7 to v.1.1.5 yesterday on a
 single node dev cluster with ~6.5GB of data and it went smoothly in that no
 errors were thrown, the data was migrated to the new directory structure, I
 can still read/write data as expected, etc.  However nodetool commands are
 behaving strangely – full details below.



 I couldn’t find anything relevant online relating to these exceptions – any
 help/pointers would be greatly appreciated.



 Thanks & Regards,



 Brian









 ‘nodetool cleanup’ runs successfully



 ‘nodetool info’ produces :



 Token: 82358484304664259547357526550084691083

 Gossip active: true

 Load : 7.69 GB

 Generation No: 1351697611

 Uptime (seconds) : 58387

 Heap Memory (MB) : 936.91 / 1928.00

 Exception in thread main java.lang.ClassCastException: java.lang.String
 cannot be cast to org.apache.cassandra.dht.Token

 at
 org.apache.cassandra.tools.NodeProbe.getEndpoint(NodeProbe.java:546)

 at
 org.apache.cassandra.tools.NodeProbe.getDataCenter(NodeProbe.java:559)

 at org.apache.cassandra.tools.NodeCmd.printInfo(NodeCmd.java:313)

 at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:651)



 ‘nodetool repair’ produces :

 Exception in thread main java.lang.reflect.UndeclaredThrowableException

 at $Proxy0.forceTableRepair(Unknown Source)

 at
 org.apache.cassandra.tools.NodeProbe.forceTableRepair(NodeProbe.java:203)

 at
 org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:880)

 at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:719)

 Caused by: javax.management.ReflectionException: Signature mismatch for
 operation forceTableRepair: (java.lang.String, [Ljava.lang.String;) should
 be (java.lang.String, boolean, [Ljava.lang.String;)

 at
 com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:152)

 at
 com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:117)

 at
 com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)

 at
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)

 at
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)

 at
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)

 at
 javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)

 at
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)

 at
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)

 at
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at
 sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)

 at sun.rmi.transport.Transport$1.run(Transport.java:159)

 at java.security.AccessController.doPrivileged(Native Method)

 at sun.rmi.transport.Transport.serviceCall(Transport.java:155)

 at
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)

 at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)

 at
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

 at java.lang.Thread.run(Thread.java:662)

 at
 sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)

 at
 sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)

 at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:142)

 at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)

 at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown
 Source)

 at
 javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)

 at
 

Re: Cassandra upgrade issues...

2012-11-01 Thread Brian Fleming
Hi Sylvain,

Simple as that!!!  Using the 1.1.5 nodetool version works as expected.  My
mistake.

Many thanks,

Brian



On Thu, Nov 1, 2012 at 8:24 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 The first thing I would check is whether nodetool is using the right jar. It
 sounds a lot like the server has been correctly updated but
 nodetool hasn't and is still using the old classes.
 Check the nodetool executable (it's a shell script), try echoing
 the CLASSPATH in there, and check that it correctly points to what it should.

 --
 Sylvain

 On Thu, Nov 1, 2012 at 9:10 AM, Brian Fleming bigbrianflem...@gmail.com
 wrote:
  [original message and stack traces snipped]

Re: Benefits by adding nodes to the cluster

2012-11-01 Thread aaron morton
I've not run it myself, but upgrading is part of the design. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/11/2012, at 10:43 AM, Wei Zhu wz1...@yahoo.com wrote:

 I heard about virtual nodes. But it doesn't come out until 1.2. Is it easy to 
 convert the existing installation to use virtual nodes?
 
 Thanks.
 -Wei 
 
 From: aaron morton aa...@thelastpickle.com
 To: user@cassandra.apache.org 
 Sent: Wednesday, October 31, 2012 2:23 PM
 Subject: Re: Benifits by adding nodes to the cluster
 
 I have been told that it's much easier to scale the cluster by doubling the
 number of nodes, since no token changes are needed on the existing nodes.
 Yup.
 
 But if the number of nodes is substantial, it's not realistic to double it
 every time.
 See the keynote from Jonathan Ellis or the talk on Virtual Nodes from Sam 
 here 
 http://www.datastax.com/events/cassandrasummit2012/presentations
 
 virtual nodes make this sort of thing faster and easier
 
 
 How easy is it to add, let's say, 3 additional nodes to the existing
 10 nodes?
 In that scenario you would need to move every node.
 But if you have 10 nodes you probably don't want to scale up by 3, I would 
 guess 5 or 10. Scaling is not something you want to do every day. 
 
 How easy the process is depends on the level of automation in your 
 environment. For example Ops Centre can automate rebalancing nodes. 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 31/10/2012, at 7:14 AM, weiz wz1...@yahoo.com wrote:
 
 One follow-up question.
 I have been told that it's much easier to scale the cluster by doubling the
 number of nodes, since no token changes are needed on the existing nodes.
 But if the number of nodes is substantial, it's not realistic to double it
 every time. How easy is it to add, let's say, 3 additional nodes to the existing
 10 nodes? I understand the process of moving around data and deleting unused
 data. I just want to understand, from the operational point of view, how
 difficult that is. We are in the process of evaluating NoSQL
 solutions; one important consideration is the operational cost. Any real-world
 experience is very much appreciated.
 
 Thanks.
 -Wei 
 
 
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Benifits-by-adding-nodes-to-the-cluster-tp7583437p7583466.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.
 
 
 



Re: Multiple counters value after restart

2012-11-01 Thread aaron morton
 What CL are you using ?
 
 I think this can be what causes the issue. I'm writing and reading at CL ONE.
 I didn't drain before stopping Cassandra and this may have produced a failure in
 the current counters (those which were being written when I stopped a server).
My first thought is to use QUORUM. But with only two nodes it's hard to get
strong consistency using QUORUM.
Can you try it though, or run a repair?

 But isn't Cassandra supposed to handle a server crash? When a server crashes
 I guess it doesn't drain beforehand...

I was asking to understand how you did the upgrade. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/11/2012, at 11:39 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 What version of cassandra are you using ?
 
 1.1.2
 
 Can you explain this further?
 
 I had an unexplained amount of reads (up to 1800 r/s and 90 MB/s) on one
 server; the other was doing about 200 r/s and 5 MB/s max. I fixed it by
 rebooting the server. This server is dedicated to Cassandra. I can't tell you
 more about it 'cause I don't get it... But a simple Cassandra restart wasn't
 enough.
 
 Was something writing to the cluster ?
 
 Yes we are having some activity and perform about 600 w/s.
 
 Did you drain for the upgrade ?
 
 We upgraded a long time ago, to 1.1.2. This warning is about version 1.1.6.
 
 What changes did you make ?
 
 In the cassandra.yaml I just changed the compaction_throughput_mb_per_sec
 property to slow down my compaction a bit. I don't think the problem comes
 from here.
 
 Are you saying that a particular counter column is giving different values 
 for different reads ?
 
 Yes, this is exactly what I was saying. Sorry if something is wrong with my 
 English, it's not my mother tongue.
 
 What CL are you using ?
 
 I think this can be what causes the issue. I'm writing and reading at CL ONE.
 I didn't drain before stopping Cassandra and this may have produced a failure in
 the current counters (those which were being written when I stopped a server).
 
 But isn't Cassandra supposed to handle a server crash? When a server crashes
 I guess it doesn't drain beforehand...
 
 Thank you for your time Aaron, once again.
 
 Alain
 
 
 
 2012/10/31 aaron morton aa...@thelastpickle.com
 What version of cassandra are you using ?
 
 I finally restarted Cassandra. It didn't solve the problem so I stopped
 Cassandra again on that node and restarted my ec2 server. This solved the
 issue (1800 r/s to 100 r/s).
 Can you explain this further?
 Was something writing to the cluster ?
 Did you drain for the upgrade ? 
 https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt#L17
 
 Today I changed my cassandra.yaml and restarted this same server to apply my
 conf.
 
 What changes did you make ?
 
 I just noticed that my homepage (which uses a Cassandra counter and 
 refreshes every sec) shows me 4 different values. 2 of them repeatedly (5000 
 and 4000) and the 2 other some rare times (5500 and 3800)
 Are you saying that a particular counter column is giving different values 
 for different reads ? 
 What CL are you using ?
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 31/10/2012, at 3:39 AM, Jason Wee peich...@gmail.com wrote:
 
 maybe enable the debug in log4j-server.properties and going through the log 
 to see what actually happen?
 
 On Tue, Oct 30, 2012 at 7:31 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 Hi, 
 
 I have an issue with counters: yesterday I had a lot of inexplicable
 reads/sec on one server. I finally restarted Cassandra. It didn't solve the
 problem so I stopped Cassandra again on that node and restarted my ec2 server.
 This solved the issue (1800 r/s to 100 r/s).
 
 Today I changed my cassandra.yaml and restarted this same server to apply my
 conf.
 
 I just noticed that my homepage (which uses a Cassandra counter and 
 refreshes every sec) shows me 4 different values. 2 of them repeatedly (5000 
 and 4000) and the 2 other some rare times (5500 and 3800)
 
 Only the counters made today and yesterday are concerned.
 
 I performed a repair without success. These data are the heart of our 
 business so if someone had any clue on it, I would be really grateful...
 
 The sooner the better, I am in production with these random counters.
 
 Alain
 
 INFO:
 
 My environment is 2 nodes (EC2 large), RF 2, CL.ONE (R & W), Random
 Partitioner.
 
 xxx.xxx.xxx.241eu-west 1b  Up Normal  151.95 GB   
 50.00%  0
 xxx.xxx.xxx.109eu-west 1b  Up Normal  117.71 GB   
 50.00%  85070591730234615865843651857942052864
 
 Here is my conf: http://pastebin.com/5cMuBKDt
 
 
 
 
 



Re: repair, compaction, and tombstone rows

2012-11-01 Thread Sylvain Lebresne
 Is this a feature or a bug?

Neither really. Repair doesn't do any gcable tombstone collection and
it would be really hard to change that (besides, it's not its job). So
if, when you run repair, there are sstables with tombstones that could
be collected but are not yet, then yes, they will be streamed. Now the
theory is that compaction will run often enough that gcable tombstones
will be collected in a reasonably timely fashion, and so you will never
have lots of such tombstones in general (making the fact that repair
streams them largely irrelevant). That being said, in practice, I don't
doubt that there are a few scenarios like your own where this can still
lead to doing too much useless work.

I believe the main problem is that size-tiered compaction has a
tendency to not compact the largest sstables very often, meaning that
you could have large sstables with mostly gcable tombstones sitting
around. In the upcoming Cassandra 1.2,
https://issues.apache.org/jira/browse/CASSANDRA-3442 will fix that.
Until then, if you are not afraid of a little bit of scripting, one
option could be, before running a repair, to run a small script that
checks the creation time of your sstables. If an sstable is old
enough (for some value of that, which depends on the TTL you use
on all your columns), you may want to force a compaction (using the
JMX call forceUserDefinedCompaction()) of that sstable. The goal is
to get rid of a maximum of outdated tombstones before running the
repair (you could also alternatively run a major compaction prior to
the repair, but major compactions have a lot of nasty effects so I
wouldn't recommend that a priori).
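
To illustrate the kind of script being suggested, here is a minimal JMX client that triggers forceUserDefinedCompaction() for one SSTable that a wrapper script has decided is old enough. The MBean name is the standard Cassandra CompactionManager bean; note, as an assumption, that the two-argument (keyspace, comma-separated data files) signature shown here is what 1.x-era builds expose, so double-check it against your version (e.g. with jconsole) before relying on it:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Minimal sketch: invoke forceUserDefinedCompaction over JMX for a single
// SSTable chosen by an external "is it old enough?" check.
public class ForceCompactOldSSTable {
    public static void main(String[] args) throws Exception {
        String host = args[0];      // e.g. 127.0.0.1
        String keyspace = args[1];  // e.g. MyKeyspace
        String dataFile = args[2];  // e.g. MyCF-hd-1234-Data.db

        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName compactionManager =
                    new ObjectName("org.apache.cassandra.db:type=CompactionManager");
            // Assumed 1.x-era signature: forceUserDefinedCompaction(keyspace, dataFiles)
            mbs.invoke(compactionManager,
                       "forceUserDefinedCompaction",
                       new Object[] { keyspace, dataFile },
                       new String[] { "java.lang.String", "java.lang.String" });
        } finally {
            connector.close();
        }
    }
}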

--
Sylvain


Re: Multiple counters value after restart

2012-11-01 Thread Alain RODRIGUEZ
Can you try it though, or run a repair?

Repairing didn't help.

My first thought is to use QUORUM

This fixes the problem. However, my data is probably still inconsistent, even
if I now always read the same value. The point is that I can't handle a
crash with CL.QUORUM; I can't even restart a node...

I will add a third server.
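
For reference, the quorum arithmetic behind that decision, shown as a trivial check (QUORUM is computed as floor(RF / 2) + 1):

// Quorum size as Cassandra computes it.
public class QuorumSize {
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(quorum(2)); // 2: both replicas must answer, so one node down blocks QUORUM
        System.out.println(quorum(3)); // 2: one replica may be down and QUORUM still succeeds
    }
}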

  But isn't Cassandra supposed to handle a server crash? When a server
crashes I guess it doesn't drain beforehand...

I was asking to understand how you did the upgrade.

Ok. On my side I am just concerned about the possibility of using counters
with CL.ONE and correctly handling a crash or restart without a drain.

Alain



2012/11/1 aaron morton aa...@thelastpickle.com

  [quoted earlier messages in this thread snipped]

logging servers? any interest in one for cassandra?

2012-11-01 Thread Hiller, Dean
2 questions

 1.  What are people using for logging servers for their web tier logging?
 2.  Would anyone be interested in a new logging server (any programming
language) for the web tier to log to your existing Cassandra? (It uses up disk space
in proportion to the number of web servers and just has a rolling window of logs
along with a window of threshold dumps.)

Context for the second question: I like fewer systems since that means less
maintenance/operations cost, so yesterday I quickly wrote up some logback
appenders (which support the SLF4J/log4j/JDK/commons logging libraries) that send the logs
from our client tier into Cassandra.  It is simply a rolling window of logs, so
the space used in Cassandra is proportional to the number of web servers I
have (currently, I have 4 web servers).  I am also thinking about adding warning-level
logging such that on a warning, the last N log entries of INFO and above are flushed
along with the warning, so basically two rolling windows.  Then in the GUI, it
simply shows the logs, and if you click on a session, it switches to a view with
all the logs for that session (no matter which server, since in our cluster the
session switches servers on every request because we are stateless… our session
id is in the cookie).
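
For anyone curious what such an appender might look like, here is a bare-bones logback sketch; the Cassandra write itself is left as a stub, since the storage layout, client library and rolling-window/TTL behaviour are exactly the parts being proposed here:

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;

// Bare-bones sketch of an appender that ships log events to Cassandra.
// The actual write is stubbed out on purpose.
public class CassandraAppender extends AppenderBase<ILoggingEvent> {

    @Override
    protected void append(ILoggingEvent event) {
        writeToCassandra(event.getTimeStamp(),
                         event.getLevel().toString(),
                         event.getLoggerName(),
                         event.getFormattedMessage());
    }

    // Hypothetical stub: insert (timestamp, level, logger, message) into a
    // "logs" column family, e.g. keyed by web server, with a TTL so the
    // window rolls over on its own.
    private void writeToCassandra(long ts, String level, String logger, String msg) {
        // use your client of choice here (Hector, Astyanax, Thrift, ...)
    }
}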

Well, let me know if anyone is interested and would actually use such a thing 
and if so, we might create a server around it.

Thanks,
Dean


Re: High bandwidth usage between datacenters for cluster

2012-11-01 Thread B. Todd Burruss
Bryce, did you resolve this?  I'm interested in the outcome.

When you write, does it help to use CL = LOCAL_QUORUM?

On Mon, Oct 29, 2012 at 12:52 AM, aaron morton aa...@thelastpickle.com wrote:
 Outbound messages for other DCs are grouped and a single instance is sent
 to a single node in the remote DC. The remote node then forwards the message
 on to the other recipients in its DC. All remote DC nodes will, however,
 reply directly to the coordinator.
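
 As a rough sanity check of the numbers quoted below (illustrative arithmetic only, assuming RF 3 in the remote DC): with that forwarding in place, roughly one copy of the write volume should cross the WAN, whereas one copy per remote replica would explain a 3x multiple:

// Back-of-the-envelope estimate for ~1 MB/s of writes and RF 3 in the remote DC.
public class WanTrafficEstimate {
    public static void main(String[] args) {
        double writeRateMBps = 1.0;  // application write volume
        int remoteReplicas = 3;      // replication factor in the DR datacenter

        // Forwarding works: one copy crosses the WAN, the remote node fans out locally.
        double withForwarding = writeRateMBps;

        // No forwarding: the coordinator sends each remote replica its own copy.
        double withoutForwarding = writeRateMBps * remoteReplicas;

        System.out.printf("with forwarding: ~%.1f MB/s, without: ~%.1f MB/s%n",
                          withForwarding, withoutForwarding);
    }
}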

 Normally this isn’t an issue for us, but at times we are writing
 approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic
 across the WAN to all the Cassandra DR servers.

 Can you break the traffic down by port and direction ?

 Cheers



 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 28/10/2012, at 12:18 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote:

 Network topology with the topology file filled out is already the
 configuration we are using.

 From: sankalp kohli [mailto:kohlisank...@gmail.com]
 Sent: Thursday, October 25, 2012 11:55 AM
 To: user@cassandra.apache.org
 Subject: Re: High bandwidth usage between datacenters for cluster

 Use placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy' and also fill the
 topology.properties file. This will tell cassandra that you have two DCs.
 You can verify that by looking at output of the ring command.

 If your DCs are set up properly, only one request will go over the WAN, though
 the responses from all nodes in the other DC will go over the WAN.

 On Thu, Oct 25, 2012 at 10:44 AM, Bryce Godfrey bryce.godf...@azaleos.com
 wrote:

 We have a 5 node cluster, with a matching 5 nodes for DR in another data
 center.   With a replication factor of 3, does the node I send a write to
 attempt to send it to the 3 servers in the DR also?  Or does it send it to 1
 and let it replicate locally in the DR environment to save bandwidth across
 the WAN?
 Normally this isn’t an issue for us, but at times we are writing
 approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic
 across the WAN to all the Cassandra DR servers.

 If my assumptions are right, is this configurable somehow for writing to one
 node and letting it do local replication?  We are on 1.1.5

 Thanks




Diagnosing a row caching problem

2012-11-01 Thread Bryan

Having a problem diagnosing an issue with row caching. It seems like row 
caching is not working (very few items stored), despite it being enabled, using 
JNA, and the key cache being super hot.  I assume I'm missing something 
obvious, but I would expect to have more items stored in the row cache, even 
after updates. Is there something in the logs I should look for? Are writes 
invalidating the cache?

Here are the stats:

keys: 100 bytes, values: 200 - 400 bytes, with read and write pattern

Single node, Cassandra 1.1.5, test node

CFSTATS

Keyspace: omitted
Read Count: 15967248
Read Latency: 0.4956699828924809 ms.
Write Count: 15967240
Write Latency: 0.027375880803445052 ms.
Pending Tasks: 0
Column Family:  omitted
SSTable count: 75
Space used (live): 705364536
Space used (total): 705364536
Number of Keys (estimate): 2591104
Memtable Columns Count: 267840
Memtable Data Size: 129949276
Memtable Switch Count: 192
Read Count: 15967248
Read Latency: NaN ms.
Write Count: 15967240
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Postives: 281
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 3986944
Compacted row minimum size: 311
Compacted row maximum size: 642
Compacted row mean size: 419


 INFO

Load : 676.5 MB
Heap Memory (MB) : 1476.10 / 2925.00
Key Cache: size 104857584 (bytes), capacity 104857584 (bytes), 6146639 
hits, 14680951 requests, 1.000 recent hit rate, 14400 save period in seconds
Row Cache  : size 0 (bytes), capacity 209715200 (bytes), 47 hits, 14400100 
requests, NaN recent hit rate, 87000 save period in seconds



Hello

2012-11-01 Thread davukovi
 Hello,

 My name is Davor Vuković. I am a Student on a Specialist Professional
 Graduate Study of Information Science and Technology in Business Systems
 in Croatia. I was wondering if you could help me a bit regarding Database
 Management in Cassandra? I would be very happy if you could explain to me
 these terms regarding the Cassandra DBMS (how is it done in Cassandra):

  - renewal procedures of database
  - optimization of database in the DBMS
  - optimization of the DBMS
  - using Codd's rules in the DBMS
  - views, triggers, stored procedures
  - relational constraints within the schema and between database schemas

 Thank you so much and have a nice day!

 Davor Vuković





Re: cassandra 1.0.10 : Bootstrapping 7 node cluster to 14 nodes

2012-11-01 Thread Brennan Saeta
The other nodes all have copies of the same data. To optimize performance,
all of them stream different parts of the data, even though 102 has all the
data that 108 needs. (I think. I'm not an expert.) -Brennan


On Thu, Nov 1, 2012 at 9:31 AM, Ramesh Natarajan rames...@gmail.com wrote:

 I am trying to bootstrap cassandra 1.0.10 cluster of 7 nodes to 14 nodes.

 My seed nodes are 101, 102, 103 and 104.

 Here is my initial ring

 Address DC  RackStatus State   Load
  OwnsToken

  145835300108973627198589117470757804908
 192.168.1.101   datacenter1 rack1   Up Normal  8.16 GB
 14.29%  0
 192.168.1.102   datacenter1 rack1   Up Normal  8.68 GB
 14.29%  24305883351495604533098186245126300818
 192.168.1.103   datacenter1 rack1   Up Normal  8.45 GB
 14.29%  48611766702991209066196372490252601636
 192.168.1.104   datacenter1 rack1   Up Normal  8.16 GB
 14.29%  72917650054486813599294558735378902454
 192.168.1.105   datacenter1 rack1   Up Normal  8.33 GB
 14.29%  97223533405982418132392744980505203272
 192.168.1.106   datacenter1 rack1   Up Normal  8.71 GB
 14.29%  121529416757478022665490931225631504090
 192.168.1.107   datacenter1 rack1   Up Normal  8.41 GB
 14.29%  145835300108973627198589117470757804908

 I add a new node 108 with the initial_token between 101 and 102.  After I
 start bootstrapping, I see the node is placed in the ring in the correct place

 Address DC  RackStatus State   Load
  OwnsToken

  145835300108973627198589117470757804908
 192.168.1.101   datacenter1 rack1   Up Normal  8.16 GB
 14.29%  0
 192.168.1.108   datacenter1 rack1   Up Joining 114.61 KB
 7.14%   12152941675747802266549093122563150409
 192.168.1.102   datacenter1 rack1   Up Normal  8.68 GB
 7.14%   24305883351495604533098186245126300818
 192.168.1.103   datacenter1 rack1   Up Normal  8.4 GB
  14.29%  48611766702991209066196372490252601636
 192.168.1.104   datacenter1 rack1   Up Normal  8.15 GB
 14.29%  72917650054486813599294558735378902454
 192.168.1.105   datacenter1 rack1   Up Normal  8.33 GB
 14.29%  97223533405982418132392744980505203272
 192.168.1.106   datacenter1 rack1   Up Normal  8.71 GB
 14.29%  121529416757478022665490931225631504090
 192.168.1.107   datacenter1 rack1   Up Normal  8.41 GB
 14.29%  145835300108973627198589117470757804908

 What puzzles me is that when I look at netstats I see nodes 107, 104 and 103
 are streaming data to 108.   Can someone explain why this happens?  I was
 under the impression that only node 102 needs to split its tokens and send
 them to 108. Am I missing something?


 Streaming from: /192.168.1.107
 Streaming from: /192.168.1.104
 Streaming from: /192.168.1.103


 Thanks
 Ramesh








Re: repair, compaction, and tombstone rows

2012-11-01 Thread Rob Coli
On Thu, Nov 1, 2012 at 1:43 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 on all your columns), you may want to force a compaction (using the
 JMX call forceUserDefinedCompaction()) of that sstable. The goal being
 to get read of a maximum of outdated tombstones before running the
 repair (you could also alternatively run a major compaction prior to
 the repair, but major compactions have a lot of nasty effect so I
 wouldn't recommend that a priori).

If sstablesplit (reverse compaction) existed, major compaction would
be a simple solution to this case. You'd major compact and then split
your One Giant SSTable With No Tombstones into a number of smaller
ones. :)

https://issues.apache.org/jira/browse/CASSANDRA-4766

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: repair, compaction, and tombstone rows

2012-11-01 Thread Bryan Talbot
It seems like CASSANDRA-3442 might be an effective fix for this issue,
assuming that I'm reading it correctly.  It sounds like the intent is to
automatically compact SSTables when a certain percentage of the columns are
gcable (deleted or with expired tombstones).  Is my understanding
correct?

Would such tables be compacted individually (1-1) or are several eligible
tables selected and compacted using the STCS compaction threshold bounds?

-Bryan


On Thu, Nov 1, 2012 at 9:43 AM, Rob Coli rc...@palominodb.com wrote:

 On Thu, Nov 1, 2012 at 1:43 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:
  on all your columns), you may want to force a compaction (using the
  JMX call forceUserDefinedCompaction()) of that sstable. The goal being
  to get read of a maximum of outdated tombstones before running the
  repair (you could also alternatively run a major compaction prior to
  the repair, but major compactions have a lot of nasty effect so I
  wouldn't recommend that a priori).

 If sstablesplit (reverse compaction) existed, major compaction would
 be a simple solution to this case. You'd major compact and then split
 your One Giant SSTable With No Tombstones into a number of smaller
 ones. :)

 https://issues.apache.org/jira/browse/CASSANDRA-4766

 =Rob

 --
 =Robert Coli
 AIMGTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb



Re: Cassandra upgrade issues...

2012-11-01 Thread Bryan Talbot
Note that 1.0.7 came out before 1.1 and I know there were
some compatibility issues that were fixed in later 1.0.x releases which
could affect your upgrade.  I think it would be best to first upgrade to
the latest 1.0.x release, and then upgrade to 1.1.x from there.

-Bryan



On Thu, Nov 1, 2012 at 1:27 AM, Brian Fleming bigbrianflem...@gmail.com wrote:

 Hi Sylvain,

 Simple as that!!!  Using the 1.1.5 nodetool version works as expected.  My
 mistake.

 Many thanks,

 Brian




 On Thu, Nov 1, 2012 at 8:24 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 The first thing I would check is whether nodetool is using the right jar. It
 sounds a lot like the server has been correctly updated but
 nodetool hasn't and is still using the old classes.
 Check the nodetool executable (it's a shell script), try echoing
 the CLASSPATH in there, and check that it correctly points to what it should.

 --
 Sylvain

  On Thu, Nov 1, 2012 at 9:10 AM, Brian Fleming bigbrianflem...@gmail.com
  wrote:
   [original message and stack traces snipped]

Re: Is it bad putting columns with composite or integer name in CF with ByteType comparator validator ?

2012-11-01 Thread Ertio Lew
Thoughts, please ?


On Thu, Nov 1, 2012 at 7:12 PM, Ertio Lew ertio...@gmail.com wrote:

 Would that do any harm, or are there any downsides, if I store columns with
 composite names or integer names in a column family with a BytesType
 comparator & validator? I have observed that the BytesType comparator
 also sorts integer-named columns in a similar fashion to the
 IntegerType comparator, so why should I lock my CF to storing only
 integer or composite named columns? It would be good if I could just mix
 different datatypes in the same column family, no?


Re: distribution of token ranges with virtual nodes

2012-11-01 Thread Manu Zhang
 it will migrate you to virtual nodes by splitting the existing partition
 256 ways.


Out of curiosity, is it for the purpose of avoiding streaming?

 the former would require you to perform a shuffle to achieve that.


Is there a nodetool option or are there other ways shuffle could be done
automatically?


On Thu, Nov 1, 2012 at 2:17 AM, Eric Evans eev...@acunu.com wrote:

 On Wed, Oct 31, 2012 at 11:38 AM, John Sanda john.sa...@gmail.com wrote:
   Can/should I assume that I will get even range distribution, or close to
   it, with random token selection?

 The short answer is: If you're using virtual nodes, random token
 selection will give you even range distribution.

 The somewhat longer answer is that this is really a function of the
 total number of tokens.  The more randomly generated tokens a cluster
 has, the more the distribution will even out.
 for virtual nodes where it has not for the older 1-token-per-node
 model is because (assuming a reasonable num_tokens value), virtual
 nodes gives you a much higher token count for a given number of nodes.

 That wiki page you cite wasn't really intended to be documentation
 (expect some of that soon though), but what that section was trying to
 convey was that while random distribution is quite good, it may not be
 100% perfect, especially when the number of nodes is low (remember,
 the number of tokens scales with the number of nodes).  I think this
 is (or may be) a problem for some.  If you're forced to manually
 calculate tokens then you are quite naturally going to calculate a
 perfect distribution, and if you've grown accustomed to this, seeing
 the ownership values off by a few percent could really bring out your
 inner OCD. :)

  For the sake of discussion, what is a reasonable default to start
  with for num_tokens assuming nodes are homogenous? That wiki page
 mentions a
  default of 256 which I see commented out in cassandra.yaml; however,
  Config.num_tokens is set to 1.

 The (unconfigured) default is 1.  That is to say that virtual nodes are
 not enabled.  The current recommendation when setting this
 (documented in the config) is 256.

  Maybe I missed where the default of 256 is
  used. From some initial testing though, it looks like 1 token per node is
  being used. Using defaults in cassandra.yaml, I see this in my logs,

 Right.  And it's worth noting that if you uncomment num_tokens *after*
 starting a node with it commented (i.e. num_tokens: 1), then it will
 migrate you to virtual nodes by splitting the existing partition 256
 ways.  This is *not* the equivalent of starting a node with num_tokens
 = 256 for the first time.  The latter would leave you with randomized
 placement, the former would require you to perform a shuffle to
 achieve that.



 --
 Eric Evans
 Acunu | http://www.acunu.com | @acunu



Re: distribution of token ranges with virtual nodes

2012-11-01 Thread Brandon Williams
On Thu, Nov 1, 2012 at 10:05 PM, Manu Zhang owenzhang1...@gmail.com wrote:

 it will migrate you to virtual nodes by splitting the existing partition
 256 ways.


 Out of curiosity, is it for the purpose of avoiding streaming?

It splits into a contiguous range, because truly upgrading to vnode
functionality is another step.


  the former would require you to perform a shuffle to achieve that.


 Is there a nodetool option or are there other ways shuffle could be done
 automatically?

There is a shuffle command in bin/ that was recently committed; we'll
document this process in NEWS.txt shortly.

-Brandon


Re: distribution of token ranges with virtual nodes

2012-11-01 Thread Manu Zhang

 It splits into a contiguous range, because truly upgrading to vnode
 functionality is another step.

That confuses me. As I understand it, there is no point in having 256
tokens on the same node if I don't perform the shuffle.


On Fri, Nov 2, 2012 at 11:10 AM, Brandon Williams dri...@gmail.com wrote:

 On Thu, Nov 1, 2012 at 10:05 PM, Manu Zhang owenzhang1...@gmail.com
 wrote:
 
  it will migrate you to virtual nodes by splitting the existing partition
  256 ways.
 
 
  Out of curiosity, is it for the purpose of avoiding streaming?

 It splits into a contiguous range, because truly upgrading to vnode
 functionality is another step.

 
   the former would require you to perform a shuffle to achieve that.
 
 
  Is there a nodetool option or are there other ways shuffle could be
 done
  automatically?

  There is a shuffle command in bin/ that was recently committed; we'll
  document this process in NEWS.txt shortly.

 -Brandon